Title: Data quality research of industry and commerce census
(Chinese title: 工商及服務業普查資料品質之研究)
Author: 邱詠翔
Contributors: 鄭宇庭, 蔡紋琦 (advisors); 邱詠翔
Keywords: Data Quality
Post-Stratified Sampling
Industrial Innovation Survey
Industry and Commerce Census
Data Cleaning and Consolidation
Date: 2010
Uploaded: 29-Sep-2011 16:46:18 (UTC+8)
Abstract: Data quality affects the quality of decisions and the outcomes of the actions based on them, so it has received increasing attention in recent years. This study uses two databases: the Industrial Innovation Survey database and the 2006 (ROC year 95) Industry and Commerce Census database. Data quality is a critical issue for any database: databases often contain erroneous records, and such errors bias the results of analysis. Data cleaning and consolidation is therefore a necessary preprocessing step before any analysis.
     
We examined the population and sample data distributions. Before data cleaning and consolidation, the average number of employees per enterprise was 92.08 in the innovation survey data and 135.54 in the census data. After cleaning and consolidation, we compared the employee counts in the two databases in terms of correlation, similarity, and distance. The results show very high consistency between the two databases: the averages fell to 39.01 (innovation survey) and 42.12 (census), closer to the population average of 7.05 employees. This also demonstrates the importance of data cleaning.
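
The comparison of employee counts between the two databases can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's procedure: the abstract does not name the exact similarity and distance measures, so Pearson correlation, cosine similarity and Euclidean distance are assumed here. The cleaning rule (keep only enterprises present in both databases with a positive, non-missing employee count), the shared enterprise ID used for matching, and all function names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical record layout: enterprise ID -> employee count (None = missing).
Records = Dict[str, Optional[float]]


def clean_and_match(innovation: Records, census: Records) -> List[Tuple[float, float]]:
    """Assumed cleaning rule: keep enterprises found in both databases
    whose employee count is present and positive in each."""
    pairs = []
    for firm_id in sorted(innovation.keys() & census.keys()):
        a, b = innovation[firm_id], census[firm_id]
        if a is not None and b is not None and a > 0 and b > 0:
            pairs.append((a, b))
    return pairs


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cosine_similarity(x: List[float], y: List[float]) -> float:
    """Cosine of the angle between the two count vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


def euclidean_distance(x: List[float], y: List[float]) -> float:
    """Euclidean distance between the two count vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def compare_databases(innovation: Records, census: Records) -> Dict[str, float]:
    """Clean and match the records, then summarise agreement between sources."""
    pairs = clean_and_match(innovation, census)
    if len(pairs) < 2:
        raise ValueError("need at least two matched enterprises")
    x = [a for a, _ in pairs]
    y = [b for _, b in pairs]
    return {
        "n": len(pairs),
        "mean_innovation": sum(x) / len(x),
        "mean_census": sum(y) / len(y),
        "pearson": pearson(x, y),
        "cosine": cosine_similarity(x, y),
        "euclidean": euclidean_distance(x, y),
    }


# Toy data for illustration only (not from the thesis).
innovation = {"A": 10, "B": 25, "C": None, "D": 40, "E": -1}
census = {"A": 12, "B": 24, "C": 30, "D": 41, "F": 5}
print(compare_databases(innovation, census))
```

On the toy data, enterprise C is dropped for its missing count and E and F for appearing in only one source, leaving three matched pairs. A Euclidean distance of √6 and a correlation close to 1 indicate close agreement. The Pearson helper assumes neither list is constant, since a zero variance would make the coefficient undefined.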
     
The method used in this study is post-stratified sampling. The main objective is to use the Industrial Innovation Survey sample to estimate population values of the 2006 Industry and Commerce Census and thereby assess the accuracy of the census data. Estimates of both population employment and population operating revenue derived from the Industrial Innovation Survey sample were too high. We believe the reason is that the Industrial Innovation Survey's sampling frame is the directory of the top 5,000 enterprises published by China Credit Information Service, whereas the census frame covers enterprises in general. We therefore validated the results against the census records that correspond to the Industrial Innovation Survey sample, and found very high consistency between the 2006 Industry and Commerce Census sample and the Industrial Innovation Survey sample.
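
A post-stratified estimator weights each post-stratum's sample mean by that stratum's known population size: the estimated total is Ŷ = Σ_h N_h ȳ_h, and the estimated population mean is Ŷ / N. The sketch below is a minimal illustration, assuming hypothetical post-strata (for example, industry by size class) whose sizes N_h are taken from the census frame. The abstract does not state the thesis's actual stratification variables, and the numbers are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def post_stratified_estimate(
    sample: Iterable[Tuple[str, float]],
    stratum_sizes: Dict[str, int],
) -> Tuple[float, float]:
    """Return (estimated population total, estimated population mean).

    sample: one (post-stratum label, observed value) pair per sampled unit.
    stratum_sizes: known population count N_h for every post-stratum.
    """
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for stratum, value in sample:
        if stratum not in stratum_sizes:
            raise ValueError(f"unknown post-stratum: {stratum}")
        sums[stratum] += value
        counts[stratum] += 1

    # Every post-stratum needs at least one sampled unit; otherwise its
    # mean is undefined (in practice such strata are collapsed).
    empty = [h for h in stratum_sizes if counts[h] == 0]
    if empty:
        raise ValueError(f"no sampled units in post-strata: {empty}")

    # Y_hat = sum over h of N_h * ybar_h
    total = sum(n_h * sums[h] / counts[h] for h, n_h in stratum_sizes.items())
    population = sum(stratum_sizes.values())
    return total, total / population


# Toy example (made-up numbers): the sample over-represents large firms.
sample = [("small", 3), ("small", 5), ("large", 200), ("large", 300)]
stratum_sizes = {"small": 900, "large": 100}
total, mean = post_stratified_estimate(sample, stratum_sizes)
naive_mean = sum(v for _, v in sample) / len(sample)
print(total, mean, naive_mean)
```

In this toy case, the unweighted sample mean is 127 employees, while the post-stratified mean is 28.6 (total 28,600). Reweighting corrects an imbalance *across* strata only. If the sampling frame yields systematically larger enterprises *within* each stratum, as the abstract suggests for a top-5,000 directory, the estimate stays biased upward.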
Description: Master's thesis
National Chengchi University
Graduate Institute of Statistics
98354020
99
Source: http://thesis.lib.nccu.edu.tw/record/#G0098354020
Type: thesis
URI: http://nccur.lib.nccu.edu.tw/handle/140.119/50810
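Table of contents:
Chapter 1 Introduction
     Section 1 Research Background and Motivation
     Section 2 Research Objectives
     Section 3 Research Subjects
     Section 4 Research Framework
Chapter 2 Literature Review
     Section 1 Overview of the 2006 Industry and Commerce Census
     Section 2 Industrial Innovation Survey
     Section 3 Data Quality
     Section 4 Review of Methods in Related Literature
Chapter 3 Research Methods
     Section 1 Research Procedure
     Section 2 Data Analysis Methods
Chapter 4 Results
     Section 1 Population Data Distribution
     Section 2 Sample Data Distribution
     Section 3 Data Analysis
     Section 4 Estimating the Population by Post-Stratified Sampling
Chapter 5 Conclusions and Recommendations
     Section 1 Conclusions
     Section 2 Recommendations and Directions for Future Research
References
Appendix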
Language: en_US