Data quality research of industry and commerce census (工商及服務業普查資料品質之研究)

Title	工商及服務業普查資料品質之研究 / Data quality research of industry and commerce census
Author	邱詠翔
Contributors	鄭宇庭, 蔡紋琦 (advisors); 邱詠翔 (author)
Keywords	Data Quality; Post-Stratified Sampling; Industrial Innovation Survey; Industry and Commerce Census; Data Cleaning and Consolidation
Date	2010
Uploaded	29-Sep-2011 16:46:18 (UTC+8)
Abstract	Data quality affects both the quality of decisions and the results of the actions based on them, and it has received growing attention in recent years. This study uses two databases: the Industrial Innovation Survey database and the 2006 (ROC year 95) Industry and Commerce Census database. Data quality is a central concern for any database. Databases often contain erroneous records, and such errors bias the results of analysis, so data cleaning and consolidation is a necessary preprocessing step. The population and sample distributions show that, before cleaning and consolidation, the mean number of employees was 92.08 in the innovation survey data and 135.54 in the census data. After cleaning and consolidation, we compared the employee counts in the two databases in terms of correlation, similarity, and distance. The results show very high consistency between the two databases: the mean employee counts became 39.01 (innovation survey) and 42.12 (census), both closer to the population mean of 7.05. This illustrates the importance of data cleaning.
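The abstract does not say which correlation, similarity, or distance measures were used. As an illustration only, the sketch below compares paired employee counts for the same firms using Pearson correlation, cosine similarity, and Euclidean distance. These three measures, the function name `compare_employee_counts`, and the assumption that records from the two databases are already matched firm by firm are choices made here, not details taken from the thesis.

```python
import math
from typing import Dict, Sequence


def compare_employee_counts(survey: Sequence[float], census: Sequence[float]) -> Dict[str, float]:
    """Compare employee counts for the same firms as recorded in two databases.

    Hypothetical helper. Assumes survey[i] and census[i] describe the same
    enterprise (records already matched). The measures are illustrative choices.
    """
    n = len(survey)
    if n != len(census) or n < 2:
        raise ValueError("need two equal-length sequences covering at least 2 firms")

    mean_s = sum(survey) / n
    mean_c = sum(census) / n

    # Pearson correlation: strength of the linear relationship between the two sources
    cov = sum((s - mean_s) * (c - mean_c) for s, c in zip(survey, census))
    var_s = sum((s - mean_s) ** 2 for s in survey)
    var_c = sum((c - mean_c) ** 2 for c in census)
    if var_s == 0 or var_c == 0:
        raise ValueError("correlation is undefined when one source is constant")
    pearson = cov / math.sqrt(var_s * var_c)

    # Cosine similarity: angle between the two count vectors (1.0 = same direction)
    dot = sum(s * c for s, c in zip(survey, census))
    norm_s = math.sqrt(sum(s * s for s in survey))
    norm_c = math.sqrt(sum(c * c for c in census))
    cosine = dot / (norm_s * norm_c)

    # Euclidean distance: overall size of firm-by-firm discrepancies (0.0 = identical)
    euclidean = math.sqrt(sum((s - c) ** 2 for s, c in zip(survey, census)))

    return {
        "mean_survey": mean_s,
        "mean_census": mean_c,
        "pearson": pearson,
        "cosine": cosine,
        "euclidean": euclidean,
    }
```

For made-up inputs such as `compare_employee_counts([12, 30, 250], [12, 28, 240])`, the correlation and cosine similarity come out close to 1, which is the kind of pattern a "high consistency" finding would correspond to. Correlation and cosine similarity ignore a uniform scale difference between sources, while the distance does not, so reporting both kinds of measure is more informative than either alone.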
The study uses post-stratified sampling. Its main objective is to use the Industrial Innovation Survey sample to estimate population figures for the 2006 Industry and Commerce Census and thereby assess the accuracy of the census data. Estimates from the innovation survey sample overstated both the number of employees and the operating revenue of the population. We attribute this overestimation to the different sampling frames: the innovation survey frame is the Top 5000 Enterprises directory published by China Credit Information Service, whereas the census frame covers enterprises in general. We therefore validated the results against the census records corresponding to the innovation survey sample and found very high consistency between the 2006 census sample and the innovation survey sample.
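The abstract names post-stratified sampling as the estimation method but does not state the stratification variables. The sketch below shows the standard post-stratified estimator of a population total, total = Σ_h N_h · ȳ_h, where N_h is the known population size of stratum h and ȳ_h is the sample mean in that stratum. It assumes the N_h come from the census frame and that each sampled firm is assigned to a stratum (for example, an industry code) after sampling; the function name, stratum labels, and data layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def post_stratified_estimate(
    sample: Iterable[Tuple[str, float]],
    population_sizes: Dict[str, int],
) -> Tuple[float, float]:
    """Post-stratified estimates of a population total and mean (hypothetical helper).

    sample: (stratum, y) pairs, one per sampled firm, where the stratum is
        assigned after sampling (e.g. a hypothetical industry code) and y is
        the variable of interest (e.g. employees or operating revenue).
    population_sizes: N_h, the number of population firms in each stratum,
        assumed here to come from the census frame.
    Returns (total, mean) with total = sum_h N_h * ybar_h and mean = total / N.
    """
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for stratum, y in sample:
        sums[stratum] += y
        counts[stratum] += 1

    unknown = set(counts) - set(population_sizes)
    if unknown:
        raise ValueError(f"sample strata missing from population sizes: {sorted(unknown)}")
    empty = [h for h in population_sizes if counts.get(h, 0) == 0]
    if empty:
        # A post-stratum with no sampled firms cannot be estimated; in practice
        # such strata are collapsed with neighbours, which this sketch does not do.
        raise ValueError(f"no sampled firms in strata: {empty}")

    # Weight each stratum's sample mean ybar_h by its known population size N_h.
    total = sum(n_h * sums[h] / counts[h] for h, n_h in population_sizes.items())
    n_pop = sum(population_sizes.values())
    return total, total / n_pop
```

For example, `post_stratified_estimate([("A", 10), ("A", 20), ("B", 100)], {"A": 90, "B": 10})` has stratum means 15 and 100, so it returns `(2350.0, 23.5)`. Because each ȳ_h is computed only from sampled firms, a sample drawn from a frame of large enterprises pushes every ȳ_h upward and the estimate inherits that upward bias. This is consistent with the overestimation the abstract attributes to the Top 5000 Enterprises frame.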
Description	Master's thesis, Graduate Institute of Statistics, National Chengchi University; 98354020; 99
Source	http://thesis.lib.nccu.edu.tw/record/#G0098354020
Type	thesis

dc.contributor.advisor	鄭宇庭; 蔡紋琦	zh_TW
dc.contributor.author (Authors)	邱詠翔	zh_TW
dc.creator (Author)	邱詠翔	zh_TW
dc.date (Date)	2010	en_US
dc.date.accessioned	29-Sep-2011 16:46:18 (UTC+8)	-
dc.date.available	29-Sep-2011 16:46:18 (UTC+8)	-
dc.date.issued (Upload time)	29-Sep-2011 16:46:18 (UTC+8)	-
dc.identifier (Other Identifiers)	G0098354020	en_US
dc.identifier.uri (URI)	http://nccur.lib.nccu.edu.tw/handle/140.119/50810	-
dc.description (Description)	碩士 (Master's)	zh_TW
dc.description (Description)	國立政治大學 (National Chengchi University)	zh_TW
dc.description (Description)	統計研究所 (Graduate Institute of Statistics)	zh_TW
dc.description (Description)	98354020	zh_TW
dc.description (Description)	99	zh_TW
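dc.description.tableofcontents	Chapter 1 Introduction: 1.1 Research background and motivation; 1.2 Research objectives; 1.3 Research subjects; 1.4 Research framework. Chapter 2 Literature Review: 2.1 Overview of the 2006 Industry and Commerce Census; 2.2 Industrial Innovation Survey; 2.3 Data quality; 2.4 Review of methods in related literature. Chapter 3 Research Methods: 3.1 Research procedure; 3.2 Data analysis methods. Chapter 4 Results: 4.1 Population data distribution; 4.2 Sample data distribution; 4.3 Data analysis; 4.4 Estimating the population by post-stratified sampling. Chapter 5 Conclusions and Recommendations: 5.1 Conclusions; 5.2 Recommendations and directions for future research. References. Appendix.	zh_TW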
dc.language.iso	en_US	-
dc.source.uri (Source)	http://thesis.lib.nccu.edu.tw/record/#G0098354020	en_US
dc.subject (Keywords)	資料品質	zh_TW
dc.subject (Keywords)	事後分層抽樣	zh_TW
dc.subject (Keywords)	產業創新調查	zh_TW
dc.subject (Keywords)	工商及服務業普查	zh_TW
dc.subject (Keywords)	資料清理與整理	zh_TW
dc.subject (Keywords)	Data Quality	en_US
dc.subject (Keywords)	Post-Stratified Sampling	en_US
dc.subject (Keywords)	Industrial Innovation Survey	en_US
dc.subject (Keywords)	Industry and Commerce Census	en_US
dc.subject (Keywords)	Data Cleaning and Consolidation	en_US
dc.title (Title)	工商及服務業普查資料品質之研究	zh_TW
dc.title (Title)	Data quality research of industry and commerce census	en_US
dc.type (Type)	thesis	en
