dc.contributor.advisor | 鄭宇庭<br>蔡紋琦 | zh_TW |
dc.contributor.author (Authors) | 邱詠翔 | zh_TW |
dc.creator (作者) | 邱詠翔 | zh_TW |
dc.date (日期) | 2010 | en_US |
dc.date.accessioned | 29-Sep-2011 16:46:18 (UTC+8) | - |
dc.date.available | 29-Sep-2011 16:46:18 (UTC+8) | - |
dc.date.issued (上傳時間) | 29-Sep-2011 16:46:18 (UTC+8) | - |
dc.identifier (Other Identifiers) | G0098354020 | en_US |
dc.identifier.uri (URI) | http://nccur.lib.nccu.edu.tw/handle/140.119/50810 | - |
dc.description (描述) | 碩士 | zh_TW |
dc.description (描述) | 國立政治大學 | zh_TW |
dc.description (描述) | 統計研究所 | zh_TW |
dc.description (描述) | 98354020 | zh_TW |
dc.description (描述) | 99 | zh_TW |
dc.description.abstract (摘要) | 資料品質的好壞會影響決策品質以及各種行動的執行成果,所以資料品質在近年來越來越受到重視。本研究包含了兩個資料庫,一個是產業創新調查資料庫,一個是95年工商及服務業普查資料庫,資料品質的好壞對一個資料庫來說也是一個相當重要的議題,資料庫中往往都含有錯誤的資料,錯誤的資料會導致分析結果出現偏差的狀況,所以在進行資料分析之前,資料清理與整理是必要的事前處理工作。 我們從母體資料分佈與樣本資料分佈得知,在清理與整理資料之前,平均創新員工人數為92.08,平均工商員工人數為135.54;在清理與整理資料之後,我們比較兩個資料庫員工人數的相關性、相似性、距離等性質,結果顯示兩個資料庫的資料一致性極高,平均創新員工人數與平均工商員工人數分別為39.01與42.12,跟母體平均員工人數7.05較為接近,也顯示出資料清理的重要性。 本研究使用的方法為事後分層抽樣,主要研究目的是要利用產業創新調查樣本來推估95年工商及服務業普查母體資料的準確性。產業創新調查樣本在推估母體從業員工人數與母體營業收入方面皆出現高估的狀況,推測出現高估的原因是產業創新調查母體為前中華徵信所出版的五千大企業名冊為母體底冊,而工商及服務業普查企業資料為一般企業母體底冊。因此,我們利用和產業創新調查樣本所相對應的工商普查樣本做驗證,發現95年工商及服務業普查樣本與產業創新調查樣本的資料一致性極高。 | zh_TW |
dc.description.abstract (摘要) | Data quality affects the quality of decisions and the outcomes of the actions based on them, so it has received growing attention in recent years. This study uses two databases: the Industrial Innovation Survey database and the Industry and Commerce Census database for ROC year 95 (2006). Data quality is an important issue for any database, because databases often contain erroneous records, and erroneous records bias the results of an analysis. Data cleaning and consolidation are therefore necessary before any analysis is carried out. The population and sample data distributions show that, before cleaning and consolidation, the average number of employees is 92.08 in the innovation survey data and 135.54 in the census data. After cleaning and consolidation, we compare the correlation, similarity, and distance of the employee counts in the two databases. The results show that the two databases are highly consistent: the average number of employees is 39.01 in the innovation survey data and 42.12 in the census data, both closer to the population average of 7.05 employees than before cleaning, which also shows the importance of data cleaning. The study uses post-stratified sampling, and its main objective is to assess how accurately the Industrial Innovation Survey sample estimates the population data of the ROC year 95 (2006) Industry and Commerce Census. The Industrial Innovation Survey sample overestimates both the number of employees and the operating revenue of the census population. We conjecture that the overestimation arises because the sampling frame of the Industrial Innovation Survey is the directory of the top five thousand enterprises published by China Credit Information Service (中華徵信所), whereas the frame of the Industry and Commerce Census covers enterprises in general. We therefore use the census records corresponding to the Industrial Innovation Survey sample for validation, and find that the ROC year 95 (2006) census sample and the Industrial Innovation Survey sample are highly consistent. | en_US |
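dc.description.tableofcontents | Chapter 1 Introduction<br>Section 1 Research Background and Motivation<br>Section 2 Research Objectives<br>Section 3 Research Subjects<br>Section 4 Research Framework<br>Chapter 2 Literature Review<br>Section 1 Overview of the ROC Year 95 (2006) Industry and Commerce Census<br>Section 2 Industrial Innovation Survey<br>Section 3 Data Quality<br>Section 4 Review of Related Methods<br>Chapter 3 Research Methods<br>Section 1 Research Procedure<br>Section 2 Data Analysis Methods<br>Chapter 4 Results<br>Section 1 Population Data Distribution<br>Section 2 Sample Data Distribution<br>Section 3 Data Analysis<br>Section 4 Estimating the Population by Post-Stratified Sampling<br>Chapter 5 Conclusions and Recommendations<br>Section 1 Conclusions<br>Section 2 Recommendations and Future Research Directions<br>References<br>Appendix | en_US |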
dc.language.iso | en_US | - |
dc.source.uri (資料來源) | http://thesis.lib.nccu.edu.tw/record/#G0098354020 | en_US |
dc.subject (關鍵詞) | 資料品質 | zh_TW |
dc.subject (關鍵詞) | 事後分層抽樣 | zh_TW |
dc.subject (關鍵詞) | 產業創新調查 | zh_TW |
dc.subject (關鍵詞) | 工商及服務業普查 | zh_TW |
dc.subject (關鍵詞) | 資料清理與整理 | zh_TW |
dc.subject (關鍵詞) | Data Quality | en_US |
dc.subject (關鍵詞) | Post-Stratified Sampling | en_US |
dc.subject (關鍵詞) | Industrial Innovation Survey | en_US |
dc.subject (關鍵詞) | Industry and Commerce Census | en_US |
dc.subject (關鍵詞) | Data Cleaning and Consolidation | en_US |
dc.title (題名) | 工商及服務業普查資料品質之研究 | zh_TW |
dc.title (題名) | Data quality research of industry and commerce census | en_US |
dc.type (資料類型) | thesis | en |
dc.relation.reference (參考文獻) | Chinese-language references | zh_TW |
dc.relation.reference (參考文獻) | 中華市場研究協會,2009,行政院主計處委託研究:工商及服務業普查抽樣方法效能之研究。 | zh_TW |
dc.relation.reference (參考文獻) | 行政院國家科學委員會補助專題研究計畫:台灣地區第二次產業創新活動調查研究期末報告,2009。 | zh_TW |
dc.relation.reference (參考文獻) | 呂朝賢,2005,由資料品質談家庭收支調查在社福議題的運用,社區發展季刊第111期。 | zh_TW |
dc.relation.reference (參考文獻) | 余清祥、胡玉蕙,1999,從美國經驗探討抽樣在普查之新角色,主計月刊第522期:60-66。 | zh_TW |
dc.relation.reference (參考文獻) | 李念秋,2002,資料品質改善之研究:錯誤資料偵測技術之發展與評估,國立中山大學資訊管理研究所碩士論文。 | zh_TW |
dc.relation.reference (參考文獻) | 李盼,2010,政府統計數據質量的實證檢驗分析,江蘇大學財經學院。 | zh_TW |
dc.relation.reference (參考文獻) | 吳聲和,2010,美國工商業母體資料庫及經濟普查報告,行政院主計處。 | zh_TW |
dc.relation.reference (參考文獻) | 郭志懋、周傲英,2002,數據質量和數據清洗研究綜述,軟件學報第13期。 | zh_TW |
dc.relation.reference (參考文獻) | 黃于玲、周元暉,2005,荷蘭2001年虛擬普查簡介,中國統計通訊第16期:2-8。 | zh_TW |
dc.relation.reference (參考文獻) | 鄭雍瑋,2006,中文資訊擷取結果之錯誤偵測,國立政治大學資訊科學研究所碩士論文。 | zh_TW |
dc.relation.reference (參考文獻) | 顏貝珊,2010,2010年各國人口普查制度之研究,人口學刊第40期:203-229。 | zh_TW |
dc.relation.reference (參考文獻) | English-language references | zh_TW |
dc.relation.reference (參考文獻) | Chapman, A. D., 2005, “Principles of Data Quality”, version 1.0. Report for the Global Biodiversity Information Facility, Copenhagen. | zh_TW |
dc.relation.reference (參考文獻) | Dalcin, E. C., 2004, “Data Quality Concepts and Techniques Applied to Taxonomic Databases”, Technical Report, School of Biological Sciences, Faculty of Medicine, Health and Life Sciences, University of Southampton, pp.266. | zh_TW |
dc.relation.reference (參考文獻) | English, L. P., 1999, “Improving Data Warehouse and Business Information Quality: Methods for Reducing Costs and Increasing Profits”, John Wiley & Sons, New York, pp.518. | zh_TW |
dc.relation.reference (參考文獻) | Freedman, D. A. and K. W. Wachter, 2003, “On the Likelihood of Improving the Accuracy of the Census Through Statistical Adjustment”, Science and Statistics: A Festschrift for Terry Speed, 40, pp.197-230. | zh_TW |
dc.relation.reference (參考文獻) | Galhardas, H., D. Florescu, D. Shasha and E. Simon, 1999, “An Extensible Framework for Data Cleaning”, INRIA Technical Report. | zh_TW |
dc.relation.reference (參考文獻) | Herman, E., 2008, “The American Community Survey: An introduction to the basics”, Government Information Quarterly, 25, pp.504-519. | zh_TW |
dc.relation.reference (參考文獻) | Hogan, H., 1993, “The 1990 Post-Enumeration Survey: An Overview”, The American Statistician, Vol. 46, No. 4, pp.261-269. | zh_TW |
dc.relation.reference (參考文獻) | Kaufman, L. and P. J. Rousseeuw, 1990, “Finding Groups in Data: An Introduction to Cluster Analysis”, John Wiley & Sons, New York. | zh_TW |
dc.relation.reference (參考文獻) | Maletic, J. I. and A. Marcus, 2000, “Data Cleaning: Beyond Integrity Analysis”, The University of Memphis, Division of Computer Science, pp.200-209. | zh_TW |
dc.relation.reference (參考文獻) | Oman, R. C. and T. B. Ayers, 1988, “Improving Data Quality”, Journal of Systems Management, pp.31-35. | zh_TW |
dc.relation.reference (參考文獻) | Raman, V. and J. M. Hellerstein, 2000, “An Interactive Framework for Data Cleaning”, UC Berkeley Computer Science Division Report. | zh_TW |
dc.relation.reference (參考文獻) | Redman, T. C., 1996, “Data Quality for the Information Age”, 1st, Artech House, Inc. | zh_TW |
dc.relation.reference (參考文獻) | Redman, T. C., 2001, “Data Quality: The Field Guide”, Butterworth-Heinemann. | zh_TW |
dc.relation.reference (參考文獻) | Tayi, G. K. and D. P. Ballou, 1998, “Examining Data Quality”, Communications of the ACM, pp.54-57. | zh_TW |
dc.relation.reference (參考文獻) | Wang, R. Y., 1998, “A Product Perspective on Total Data Quality Management”, Communications of the ACM, pp.58-65. | zh_TW |
dc.relation.reference (參考文獻) | Related websites: | zh_TW |
dc.relation.reference (參考文獻) | 中華民國統計資訊網,URL: http://www.stat.gov.tw | zh_TW |