Title 勞工職位特質分析-多元尺度法於大資料分析之應用
The occupational characteristics analysis - the application of large data multidimensional scaling method
Author Chen, Fong Wei (陳烽威)
Contributors Tzeng, Jengnan (曾正男)
Chen, Fong Wei (陳烽威)
Keywords multidimensional scaling
labor
occupational characteristic
Date 2012
Upload time 30-Oct-2012 16:37:35 (UTC+8)
Abstract This thesis analyzes as many as 100,000 labor records obtained from the United States Census Bureau. Because of the curse of dimensionality in a data set of this size, traditional data mining methods cannot be applied to it, and traditional descriptive statistics do not suggest a good direction for analysis. We therefore analyze the data with Split-and-combine Multidimensional Scaling (SC-MDS), proposed by Tzeng et al. (2008). Multidimensional scaling has two main purposes: first, to represent the data as points in space, so that the distance between each pair of points expresses how closely they are related; second, to reduce the dimension of the data and thereby avoid the curse of dimensionality. SC-MDS indicates that the order of priority for analyzing associations in this large data set is age, education, and sex. We combine this ordering with the Occupational Information Network database to analyze how the occupational characteristics of workers' jobs differ across the resulting categories. We find that education level affects the differences in occupational characteristics between the sexes, and that the number of these differences increases with age; that education level produces large differences in occupational characteristics in every age group; and that young and prime-age workers resemble each other in occupational characteristics more than prime-age and middle-aged workers do. Finally, we propose explanations for these similarities and differences.
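The two purposes of multidimensional scaling named in the abstract (placing data in space so that distances express relatedness, and reducing dimension) can be illustrated with a minimal sketch of classical (Torgerson) MDS. This is not the SC-MDS procedure used in the thesis, which splits a large data set into overlapping groups and combines the group embeddings; the function names, the use of NumPy, and the toy data below are all assumptions made for illustration.

```python
import numpy as np


def pairwise_distances(points):
    """Return the (n, n) matrix of Euclidean distances between rows of points."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def classical_mds(dist, k):
    """Embed n points in k dimensions from an (n, n) matrix of pairwise distances."""
    n = dist.shape[0]
    # Double-centre the squared distances: B = -1/2 * J D^2 J, with J = I - 11^T / n.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    # eigh handles symmetric matrices and returns eigenvalues in ascending order;
    # keep the k largest.
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:k]
    # Clip tiny negative eigenvalues caused by rounding before taking the root.
    scales = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scales


# Toy data (hypothetical): 50 points lying on a 2-D plane inside 10-D space.
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 2))
high_dim = latent @ rng.normal(size=(2, 10))

dist = pairwise_distances(high_dim)
coords = classical_mds(dist, k=2)  # 2-D coordinates with the same pairwise distances
```

Because the toy points truly lie on a plane, two dimensions are enough to preserve every pairwise distance. Real survey data would not embed exactly, and the full eigendecomposition used here costs on the order of n^3 operations, which is the kind of cost a split-and-combine approach is meant to avoid on very large data sets.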
References Bellman, Richard Ernest (1957), Dynamic Programming, Princeton: Princeton University Press.

Chalmers, M. (1996), "A linear iteration time layout algorithm for visualising high-dimensional data", IEEE Visualization, 127-132.

Cox, Trevor F. and Cox, Michael A. A. (2001), Multidimensional Scaling, London: Chapman & Hall, 2nd edition.

Dasgupta, S. and Gupta, A. (1999), "An elementary proof of the Johnson-Lindenstrauss lemma", Technical Report TR-99-006, International Computer Science Institute, Berkeley, California, USA.

Dempster, Arthur, Laird, Nan, and Rubin, Donald (1977), "Maximum likelihood from incomplete data via the EM algorithm", Journal of the Royal Statistical Society, Series B, 39(1), 1-38.

Dwyer, Tim and Gallagher, David R. (2004), "Visualising changes in fund manager holdings in two and a half dimensions", Information Visualization, 3, 227-258.

Frawley, W., Piatetsky-Shapiro, G., and Matheus, C. (1992), "Knowledge discovery in databases: An overview", AI Magazine, 213-228.

Groenen, Patrick J.F. and Franses, Philip Hans (2000), "Visualizing time-varying correlations across stock markets", Journal of Empirical Finance, 7, 155-172.

Johnson, W.B. and Lindenstrauss, J. (1984), "Extensions of Lipschitz mappings into a Hilbert space", in Conference in Modern Analysis and Probability, volume 26 of Contemporary Mathematics, 189-206, Amer. Math. Soc.

Knuth, Donald E. (1973), The Art of Computer Programming, Boston, Mass.: Addison-Wesley.

Kruskal, J.B. (1964), "Nonmetric multidimensional scaling: A numerical method", Psychometrika, 29, 115-129.

Lloyd, S.P. (1957), "Least square quantization in PCM", Bell Telephone Laboratories Paper.

Lloyd, S.P. (1982), "Least squares quantization in PCM", IEEE Transactions on Information Theory, 28(2), 129-137.

Morrison, Alistair, Ross, Greg, and Chalmers, Matthew (2003), "Fast multidimensional scaling through sampling, springs and interpolation", Information Visualization, 2, 68-77.

Pearson, K. (1901), "On lines and planes of closest fit to systems of points in space", Philosophical Magazine, 2(6), 559-572.

Shepard, R.N. (1962), "The analysis of proximities: Multidimensional scaling with an unknown distance function", Psychometrika, 27(2), 125-140.

Torgerson, Warren (1952), "Multidimensional scaling: I. Theory and method", Psychometrika, 17, 401-419.

Tzeng, Jengnan, Lu, Henry Horng-Shing, and Li, Wen-Hsiung (2008), "Multidimensional scaling for large genomic data sets", BMC Bioinformatics, 9, 1-17.

White, Tom (2009), Hadoop: The Definitive Guide, O'Reilly Media, 1st edition.

Young, G. and Householder, A.S. (1938), "Discussion of a set of points in terms of their mutual distances", Psychometrika, 3, 19-22.
Description Master's
National Chengchi University
Department of Economics
99258014
101
Source http://thesis.lib.nccu.edu.tw/record/#G0099258014
Type thesis
dc.contributor.advisor Tzeng, Jengnan (曾正男)
dc.contributor.author (Authors) Chen, Fong Wei (陳烽威)
dc.creator (Author) Chen, Fong Wei (陳烽威)
dc.date (Date) 2012
dc.date.accessioned 30-Oct-2012 16:37:35 (UTC+8)
dc.date.available 30-Oct-2012 16:37:35 (UTC+8)
dc.date.issued (Upload time) 30-Oct-2012 16:37:35 (UTC+8)
dc.identifier (Other Identifiers) G0099258014
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/55096
dc.description (Description) Master's
dc.description (Description) National Chengchi University
dc.description (Description) Department of Economics
dc.description (Description) 99258014
dc.description (Description) 101
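dc.description.tableofcontents 1 Introduction
1.1 Research Motivation
1.2 Literature Review
1.3 Organization of the Thesis
2 Multidimensional Scaling for Large Data and Nearest-Neighbor Search Clustering
2.1 The Curse of Dimensionality
2.1.1 The Meaning of Multidimensional Scaling
2.1.2 The Theoretical Framework of Multidimensional Scaling
2.2 Nearest-Neighbor Search
3 Multidimensional Scaling Analysis of the U.S. Current Population Survey
3.1 Data Collection and Preparation
3.2 Three-Dimensional View from SC-MDS
4 Incorporating Occupational Characteristics Data
4.1 Data Collection and Preparation
4.2 Three-Dimensional SC-MDS View of Occupation-Level Data
4.3 Analyzing Differences in Worker Characteristics with SC-MDS Dendrograms
5 Conclusion
References
Appendix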
dc.language.iso en_US
dc.source.uri (Source) http://thesis.lib.nccu.edu.tw/record/#G0099258014
dc.subject (Keywords) multidimensional scaling
dc.subject (Keywords) labor
dc.subject (Keywords) occupational characteristic
dc.title (Title) 勞工職位特質分析-多元尺度法於大資料分析之應用
dc.title (Title) The occupational characteristics analysis - the application of large data multidimensional scaling method
dc.type (Type) thesis