Title 應用kNN文字探勘技術於分析新聞評論 影響股價漲跌趨勢之研究
The Study of Analyzing Comments of News for Influence of Stock Price Trends Prediction by Using Knn Text Mining
Author 詹智勝
Chan, Chih Sheng
Contributors 楊建民
Yang, Chien Ming
詹智勝
Chan, Chih Sheng
Keywords Internet Word-of-Mouth
The Stock Trend Prediction
Text Mining
kNN
Cluster Analysis
Date 2013
Upload time 1-Jul-2014 12:06:17 (UTC+8)
Abstract With the rapid development of the Internet, large numbers of users now get their knowledge and news online rather than from traditional media. The messages that users leave behind when they interact online, known as Internet word-of-mouth, are drawing growing attention. Meanwhile, as the economy develops, people on fixed salaries struggle to afford high housing and consumer prices, so investing to build personal wealth has become very common, and stock market investment is the route the public values most.

Online news is published with the immediacy of the Internet. The comments that readers leave after reading and digesting a story should therefore carry more information than the news text itself, and investors can mine them for the market news and information they contain.

To help users uncover the meaning behind this large volume of data and to support investment forecasts, this study collects a total of 1,068 online news articles and reader comments and splits them into training data and test data. The documents are preprocessed with text mining and related techniques, and the similarity between training documents is then computed so that kNN clustering can group the large volume of unlabeled data by similarity. Historical stock price information is used to characterize and interpret the resulting clusters and to build a prediction model. Finally, the model's clustering results are evaluated on the test data and used to predict stock price trends. In that evaluation, with k = 15 and a similarity threshold of 0.05, the F-measure for the rising cluster reaches up to 56%. The value of k and the similarity threshold can be adjusted to obtain the most favorable results from the model.
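The pipeline the abstract outlines (document similarity, kNN-based grouping with a similarity threshold, F-measure evaluation) can be illustrated with a small sketch. This is not the thesis's implementation: the record does not spell out the exact clustering rule, so the code below assumes TF-IDF weighting, cosine similarity, and clusters formed as connected components of a k-nearest-neighbor graph. The function names and the toy documents are hypothetical, and the documents are assumed to be already segmented into words.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn tokenized documents into sparse TF-IDF dictionaries (assumed weighting)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # The +1 keeps terms that occur in every document from vanishing.
        vectors.append({t: tf[t] * (math.log(n / df[t]) + 1.0) for t in tf})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_clusters(vectors, k=15, threshold=0.05):
    """Link each document to its k most similar neighbors whose similarity
    is at least `threshold`, then return the connected components.
    This grouping rule is an assumption made for illustration."""
    n = len(vectors)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        sims = sorted(
            ((cosine(vectors[i], vectors[j]), j) for j in range(n) if j != i),
            reverse=True,
        )
        for sim, j in sims[:k]:
            if sim >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (balanced F-measure)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical, already-segmented toy documents.
    toy_docs = [
        ["stock", "rise", "profit"],
        ["stock", "rise", "gain"],
        ["rain", "weather", "cold"],
        ["weather", "cold", "wind"],
    ]
    print(knn_clusters(tfidf_vectors(toy_docs), k=1, threshold=0.05))
```

The defaults k = 15 and threshold = 0.05 mirror the setting reported in the abstract; on a real corpus each cluster would then be labeled with the historical price movement of its member articles before the F-measure is computed per cluster.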
Description Master's thesis
National Chengchi University
Graduate Institute of Information Management (資訊管理研究所)
100356044
102
Source http://thesis.lib.nccu.edu.tw/record/#G0100356044
Type thesis
dc.contributor.advisor 楊建民 (zh_TW)
dc.contributor.advisor Yang, Chien Ming (en_US)
dc.identifier (Other Identifiers) G0100356044 (en_US)
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/67096
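dc.description.tableofcontents Chapter 1. Introduction
Section 1. Research Background and Motivation
Section 2. Research Objectives
Section 3. Research Steps and Process
Chapter 2. Literature Review
Section 1. Using News Data for Prediction, and Word-of-Mouth
2.1.1. Related Research on News Data (News-Driven Factors) in Stock Price Prediction
2.1.2. What Is Word-of-Mouth
2.1.3. Why Is Word-of-Mouth So Powerful?
2.1.4. What Is Internet Word-of-Mouth
2.1.5. Differences Between Traditional and Internet Word-of-Mouth
Section 2. Text Mining and Related Techniques
2.2.1. Definition of Text Mining
2.2.2. Architecture of Text Mining
2.2.3. Chinese Word Segmentation
2.2.4. The Academia Sinica CKIP Word Segmentation System
2.2.5. Document Feature Selection
2.2.6. Application of the Vector Space Model
2.2.7. Document Similarity Calculation
Section 3. Cluster Analysis
2.3.1. k-Nearest Neighbor Algorithm (kNN)
2.3.2. Clustering Performance Evaluation
Section 4. Summary of the Literature Review
Chapter 3. Research Method and Design
Section 1. Research Framework
Section 2. Data Sources and Processing
3.2.1. Data Collection
3.2.2. Data Processing Module
Section 3. Cluster Analysis
3.3.1. Document Similarity Calculation
3.3.2. kNN Clustering
Section 4. Clustering and Classification Performance Evaluation
3.4.1. Analysis Module
Section 5. Research Process and Expected Results
3.5.1. Research Process
3.5.2. Expected Results
Chapter 4. Research Results
Section 1. Building the Prediction Model
Section 2. Prediction Model Results 1
Section 3. Cumulative Return of the Prediction Model
Section 4. Prediction Model Results 2
Section 5. Prediction Model Results 3
Chapter 5. Conclusions and Future Research Directions
Section 1. Conclusions and Suggestions
Section 2. Future Research Directions
References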
dc.format.extent 558014 bytes
dc.format.mimetype application/pdf
dc.language.iso en_US