Title: 勞工職位特質分析-多元尺度法於大資料分析之應用
The occupational characteristics analysis -the application of large data multidimensional scaling method
Authors: 陳烽威
Chen, Fong Wei
Contributors: 曾正男
Tzeng, Jengnan
Chen, Fong Wei
Keywords: multidimensional scaling
occupational characteristic
Date: 2012
Issue Date: 2012-10-30 16:37:35 (UTC+8)
Abstract: This thesis draws on as many as 100,000 labor records obtained from the United States Census Bureau. With data of this size, the curse of dimensionality rules out traditional data mining methods, and traditional descriptive statistics do not point to a clear direction for analysis. We therefore analyze the data with Split-and-combine Multidimensional Scaling (SC-MDS), proposed by Tzeng et al. (2008). Multidimensional scaling (MDS) has two main purposes: first, to place the data in a spatial configuration in which the distance between each pair of points expresses their similarity; second, to reduce the dimension of the data and so avoid the curse of dimensionality. SC-MDS indicates that associations in this large data set should be analyzed in the priority order age, education, and sex. Following this order, we combine the data with the Occupational Information Network database to examine how workers in different categories differ in the occupational characteristics of their jobs. We find that, first, education level affects the differences in occupational characteristics between the sexes, and the number of these differences increases with age; second, education level produces large differences in occupational characteristics in every age group; third, young and prime-age workers resemble each other in occupational characteristics more than prime-age and middle-aged workers do. Finally, we offer explanations for these similarities and differences.
Reference: Bellman, Richard Ernest (1957), Dynamic Programming, Princeton: Princeton University Press.

Chalmers, M. (1996), "A linear iteration time layout algorithm for visualising high-dimensional data", IEEE Visualization, 127-132.

Cox, Trevor F. and Cox, Michael A. A. (2001), Multidimensional scaling, London: Chapman & Hall, 2nd edition.

Dasgupta, S. and Gupta, A. (1999), "An elementary proof of the Johnson-Lindenstrauss lemma", Technical Report TR-99-006, International Computer Science Institute, Berkeley, California, USA.

Dempster, Arthur, Laird, Nan, and Rubin, Donald (1977), "Maximum likelihood from incomplete data via the EM algorithm", Journal of the Royal Statistical Society, Series B, 39(1), 1-38.

Dwyer, Tim and Gallagher, David R. (2004), "Visualising changes in fund manager holdings in two and a half dimensions", Information Visualization, 3, 227-258.

Frawley, W., Piatetsky-Shapiro, G., and Matheus, C. (1992), "Knowledge discovery in databases: An overview", AI Magazine, 213-228.

Groenen, Patrick J.F. and Franses, Philip Hans (2000), "Visualizing time-varying correlations across stock markets", Journal of Empirical Finance, 7, 155-172.

Johnson, W.B. and Lindenstrauss, J. (1984), "Extensions of Lipschitz mappings into a Hilbert space", In Conference in modern analysis and probability, volume 26 of Contemporary Mathematics, 189-206, Amer. Math. Soc.

Knuth, Donald E. (1973), The art of computer programming, Boston, Mass.: Addison-Wesley.

Kruskal, J.B. (1964), "Nonmetric multidimensional scaling: a numerical method", Psychometrika, 29, 115-129.

Lloyd, S.P. (1957), "Least square quantization in PCM", Bell Telephone Laboratories Paper.

Lloyd, S.P. (1982), "Least squares quantization in PCM", IEEE Transactions on Information Theory, 28(2), 129-137.

Morrison, Alistair, Ross, Greg, and Chalmers, Matthew (2003), "Fast multidimensional scaling through sampling, springs and interpolation", Information Visualization, 2, 68-77.

Pearson, K. (1901), "On lines and planes of closest fit to systems of points in space", Philosophical Magazine, 2(6), 559-572.

Shepard, R.N. (1962), "The analysis of proximities: Multidimensional scaling with an unknown distance function", Psychometrika, 27(2), 125-140.

Torgerson, Warren (1952), "Multidimensional scaling: I. Theory and method", Psychometrika, 17, 401-419.

Tzeng, Jengnan, Lu, Henry Horng-Shing, and Li, Wen-Hsiung (2008), "Multidimensional scaling for large genomic data sets", BMC Bioinformatics, 9, 1-17.

White, Tom (2009), Hadoop: The Definitive Guide, O'Reilly Media, 1st edition.

Young, G. and Householder, A.S. (1938), "Discussion of a set of points in terms of their mutual distances", Psychometrika, 3, 19-22.
Description: Master's degree
Source URI:
Data Type: thesis
Appears in Collections: [Department of Economics] Theses

Files in This Item:

File: 801401.pdf
Size: 1794Kb
Format: Adobe PDF

All items in 學術集成 are protected by copyright, with all rights reserved.
