Title 趨近一般化資料倉儲與資料探勘之效能評估模型
Toward a More Generalized Benchmark Workload Model for Data Warehouse and Data Mining
Author 邱士涵
Chiu, Shih-Han
Advisors 諶家蘭; 季延平
Seng, J.L.; Chi, Y.P.
Keywords data warehouse
data mining
performance evaluation
benchmark
workload model
Date 2006
Uploaded 14-Sep-2009 09:13:29 (UTC+8)
Abstract With the growth of the Internet and the maturing of database technology, data has become much easier to obtain than before. Many Internet applications are, in effect, automated data-collection tools, so the volume of data keeps growing. A data warehouse is a database that stores large amounts of historical data and provides aggregated or statistical information to support decision making. Data mining extracts rules useful for decisions from large volumes of data, helping managers make decisions and create new business opportunities. The performance of a data warehouse strongly affects its users' work efficiency. Workload models, generally called benchmarks, have therefore emerged to evaluate and predict the performance and efficiency of data warehouses. However, the data warehouse benchmark specifications published so far are built around typical specifications for particular domains, and their performance evaluation relies on synthetic workloads; none is oriented toward user requirements. When the domain of a user's data warehouse application differs greatly from the domain assumed by the benchmark, the performance the user observes may differ substantially from the benchmark result. In data mining, the accuracy of the mining results matters more to the business than the time spent on mining. Yet there is neither an effective tool for evaluating the accuracy of mining results nor a standard set of performance criteria for data mining. We design a user-requirement-oriented workload model to evaluate the performance of data warehouses and the precision of data mining tools.
Description Master's thesis
National Chengchi University
Graduate Institute of Information Management
93356038
95
Source http://thesis.lib.nccu.edu.tw/record/#G0093356038
Type thesis
URI https://nccur.lib.nccu.edu.tw/handle/140.119/31084
Table of contents
     Chapter 1 Introduction
          1.1 Research Motivation
          1.2 Research Problem
          1.3 Research Objective
          1.4 Research Limitation
          1.5 Research Flow
          1.6 Organization of Thesis
     Chapter 2 Literature Review
          2.1 Data Warehouse and Data Mining
          2.2 Data Warehouse Benchmarks
               2.2.1 TPC-H and TPC-R
               2.2.2 TPC-DS
               2.2.3 Data Warehouse Benchmark Comparison
          2.3 Data Mining Benchmarks
               2.3.1 Microsoft SQL Server 2000 Data Mining Algorithms
               2.3.2 Precision Model
               2.3.3 Data Mining Benchmark Comparison
     Chapter 3 Research Method
          3.1 Research Structure
          3.2 Data Warehouse Data Model
          3.3 Data Warehouse Operation Model
          3.4 Data Mining Data Model
          3.5 Data Mining Computation Model
          3.6 Control Model
          3.7 Performance Metrics
          3.8 Precision Metrics
     Chapter 4 Prototype Development
          4.1 Prototype Platform and Structure
          4.2 Prototype System Design
          4.3 Prototype System Implementation
               4.3.1 Data Generator
               4.3.2 Operation Selector
               4.3.3 Computation Selector
               4.3.4 Scheduler
               4.3.5 Result Collector
     Chapter 5 Research Experiment
          5.1 Experiment Design
          5.2 TPC-H Benchmark Experiment
               5.2.1 Experiment Specification
               5.2.2 Experiment Results
          5.3 Microsoft SQL Server 2000 Data Mining Benchmark Experiment
               5.3.1 Experiment Specification
               5.3.2 Experiment Results
     Chapter 6 Research Discussion
          6.1 Managerial Findings
          6.2 Technical Findings
     Chapter 7 Conclusions and Future Research Directions
          7.1 Conclusions
          7.2 Suggestions for Future Research
     References
Language en_US