DSpace Collection:
https://ah.lib.nccu.edu.tw/handle/140.119/2160
2024-03-29T07:30:22Z
在臺灣新聞資料下透過貪婪演算法預測股票報酬
https://ah.lib.nccu.edu.tw/handle/140.119/146908
Title: 在臺灣新聞資料下透過貪婪演算法預測股票報酬; Predicting Stock Returns via Greedy Algorithm with Taiwanese News Data
Authors: 程長磊; Cheng, Chang-Lei
Abstract: With advances in big data and natural language processing, unstructured data, and textual data in particular, have become highly valuable for academic research. Many studies examine how textual information affects asset returns, making this an important research topic in finance. Textual data, however, are high-dimensional (the number of distinct words in news articles far exceeds the number of articles), so correctly analyzing the relationship between text and returns is a central issue in this line of research. News articles are the textual data investors encounter most often when trading; unlike financial statements, they provide no quantitative information on which to base investment decisions. This study therefore applies two high-dimensional model selection methods, the Orthogonal Greedy Algorithm (OGA) of Ing and Lai (2011) and the Chebyshev Greedy Algorithm (CGA) as refined by Chen, Dai, Ing, and Lai (2019), to select frequently used words in news articles and quantify the sentiment scores of the articles. After controlling for firm return factors, we estimate the relationship between the news sentiment factor and firm returns, and compare the returns of portfolios built from news sentiment scores under linear and nonlinear assumptions for the dependent variable (returns).
Under the linear assumption, with returns treated as a continuous variable, we use OGA and extend it to an OGA Predict model; under the nonlinear assumption we use CGA and extend it to a CGA Predict model, a novel application of both selection methods to financial text analysis. We find that the CGA Predict model earns higher excess returns than the OGA Predict model. Performance evaluation further shows that the effect of news sentiment in Taiwan's retail-investor-dominated market differs significantly from that in the institutional-investor-dominated US market, a result consistent with economic intuition about the Taiwanese stock market.
Description: Master's; National Chengchi University; Department of Statistics; 110354030
2023-09-01T06:58:16Z
P2P借貸中借款人特徵與貸款表現關係之實證研究:以Lending Club和機器學習方法為例
https://ah.lib.nccu.edu.tw/handle/140.119/146907
Title: P2P借貸中借款人特徵與貸款表現關係之實證研究:以Lending Club和機器學習方法為例; An Empirical Study of the Relationship between Borrower Characteristics and Loan Performance in Peer-to-Peer Lending: Evidence from Lending Club and Machine Learning Techniques
Authors: 陳槐廷; Chen, Huai-Ting
Abstract: This study uses data from a P2P lending platform. Whereas previous literature has discussed only small business loans, this study examines loan performance across all loan purposes, including debt consolidation, small business, and credit card loans, from both the borrower and the investor perspectives, and uses machine learning methods to build loan status (default) and funded ratio models. Key borrower- and investor-side variables on the platform include loan amount, years of employment, annual income, debt-to-income ratio, and revolving credit balance. Improving borrowers' credit characteristics and reducing investors' risk perception can raise borrowers' loan approval rates and increase investors' trust in borrowers. For certain loan purposes, such as education or weddings, loan amounts may be lower because these purposes do not generate income, which may heighten investors' risk awareness. Finally, XGBoost performs best in predicting both the funded ratio and default status.
Description: Master's; National Chengchi University; Department of Statistics; 110354029
2023-09-01T06:57:57Z
應用象徵性資料分析法於電影推薦系統之研究
https://ah.lib.nccu.edu.tw/handle/140.119/146906
Title: 應用象徵性資料分析法於電影推薦系統之研究; The application of symbolic data analysis to movie recommendation systems
Authors: 張順益; CHANG, SHUN-YI
Abstract: Recommendation systems are now widely used in business marketing, covering products such as movies, music, news, books, restaurants, 3C products, and financial services. They give users accurate personalized recommendations and thereby increase merchants' profits. Collaborative filtering (Resnick) is the most common recommendation algorithm: it filters items based on users' ratings of products to find suitable items to recommend, on the premise that users with similar consumption behavior tend to prefer similar items. However, collaborative filtering suffers from the new-user cold-start problem (also known as the new-item problem) and from matrix sparsity. For a movie recommendation system, this study groups users by their characteristics and converts each group's movie ratings, by movie genre, into multi-valued modal symbolic data. The transformation accounts for the fact that a movie may belong to several genres, and aims to overcome the new-user cold-start problem and reduce the sparsity caused by missing values. We validated the proposed method with simulation experiments and an analysis of real movie rating data. The results show that symbolic data analysis not only improves recommendation performance but also opens a new line of thinking and methodology for developing recommendation systems.
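The genre-based transformation can be sketched as follows. This is an illustration under assumptions, not the thesis's implementation: ratings are (user, movie, rating) triples on a 1 to 5 scale, each user belongs to a group defined by some characteristic (for example, an age band), and all function names are hypothetical. `to_modal` turns each group's ratings into a distribution over rating levels for every genre, counting a multi-genre movie once per genre; `cold_start_score` uses that distribution to score a movie for a new user known only by group; `cf_predict` is plain user-based collaborative filtering with cosine similarity, which has nothing to work with for a user who has no ratings.

```python
import math
from collections import defaultdict


def to_modal(ratings, user_group, movie_genres, levels=(1, 2, 3, 4, 5)):
    """Multi-valued modal data: {group: {genre: {rating level: proportion}}}.

    A movie with several genres contributes its rating to each of its genres.
    """
    counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for user, movie, rating in ratings:
        for genre in movie_genres[movie]:
            counts[user_group[user]][genre][rating] += 1
    modal = {}
    for group, by_genre in counts.items():
        modal[group] = {}
        for genre, c in by_genre.items():
            total = sum(c.values())
            modal[group][genre] = {lv: c.get(lv, 0) / total for lv in levels}
    return modal


def cold_start_score(modal, group, movie, movie_genres):
    """Expected rating of a movie for a new user known only by group."""
    table = modal.get(group, {})
    means = [sum(lv * prob for lv, prob in table[genre].items())
             for genre in movie_genres[movie] if genre in table]
    return sum(means) / len(means) if means else None


def cf_predict(ratings, user, movie):
    """User-based collaborative filtering with cosine similarity on co-rated movies."""
    by_user = defaultdict(dict)
    for u, m, r in ratings:
        by_user[u][m] = r
    mine = by_user.get(user, {})
    num = den = 0.0
    for other, theirs in by_user.items():
        if other == user or movie not in theirs:
            continue
        common = set(mine) & set(theirs)
        if not common:  # sparse matrix: no overlap, so no similarity
            continue
        dot = sum(mine[m] * theirs[m] for m in common)
        norm = (math.sqrt(sum(mine[m] ** 2 for m in common))
                * math.sqrt(sum(theirs[m] ** 2 for m in common)))
        sim = dot / norm
        num += sim * theirs[movie]
        den += abs(sim)
    return num / den if den else None  # None: e.g. a cold-start user
```

Storing each group-genre cell as a distribution rather than a single mean keeps the spread of opinions within the group, which is what makes the data modal rather than a plain averaged matrix.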
Description: Master's; National Chengchi University; Department of Statistics; 110354026
2023-09-01T06:57:45Z
Lee-Carter模型於小區域人口的探討
https://ah.lib.nccu.edu.tw/handle/140.119/146905
Title: Lee-Carter模型於小區域人口的探討; A Study of Lee-Carter Model in Small Areas
Authors: 張君瑋; Jhang, Jyun-Wei
Abstract: The life expectancy of Taiwanese residents has risen year after year with no clear sign of slowing, so the elderly population is expected to grow substantially and forecasting future life expectancy has become a matter of wide concern. Mortality models are commonly used for such forecasts. The Lee-Carter model is popular worldwide because past studies found its estimates to be relatively stable and fairly accurate for large populations, such as whole countries. When it is applied to areas with small populations, however, its parameter estimates are noticeably biased; many scholars have proposed corrections, but these still have considerable limitations in use. The Lee-Carter model also has three parameter estimation methods: singular value decomposition (SVD), an approximation method, and maximum likelihood estimation (MLE). Each has its own strengths, yet no previous study has compared them for small populations.
This study therefore compares the three estimation methods of the Lee-Carter model through computer simulation, focusing on small populations and assessing when each method is most appropriate. Besides examining the relationship between population size and estimation bias, we propose possible adjustments and compare them with other common correction models (e.g., the Li-Lee model). We also study ways to improve the use of the Lee-Carter model in small areas, including reducing the influence of observed death counts on the estimates and smoothing small-area mortality rates with a reference area, and we evaluate how these improvements affect parameter estimates and forecasts. The results show that the bias of every estimation method grows as the population shrinks. Among the three methods, MLE gives the most accurate mortality estimates, but it becomes unstable when the population is small, and replacing zero death counts with a small value (e.g., 10^-6) does not resolve this; although MLE estimates better than the other methods, its forecasts are the least stable, especially for small populations. The approximation method is more stable and becomes more accurate when combined with Partial SMR, giving the best results for both parameter estimation and forecasting.
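The first two estimation methods can be sketched in pure Python for the model ln m(x,t) = a_x + b_x k_t + e(x,t), with the usual constraints sum_x b_x = 1 and sum_t k_t = 0. `lee_carter_svd` takes the leading singular triple of the age-centered log-rate matrix, found here by power iteration rather than a library SVD routine. `lee_carter_approx` uses the closed form k_t = sum_x (ln m(x,t) - a_x), with b_x the slope from regressing each centered age row on k_t; that this is what the abstract's "approximation method" denotes is our assumption. `log_rates` replaces zero death counts with a small value (10^-6 by default), the device the abstract mentions. MLE is not shown, and all function names are hypothetical.

```python
import math


def log_rates(deaths, exposure, eps=1e-6):
    """Log central death rates ln m(x,t); zero death counts are replaced by eps."""
    return [[math.log((d if d > 0 else eps) / e) for d, e in zip(drow, erow)]
            for drow, erow in zip(deaths, exposure)]


def _center(logm):
    # a_x = time average of ln m(x,t); Z = ln m - a (rows: ages, columns: years)
    a = [sum(row) / len(row) for row in logm]
    z = [[v - ax for v in row] for row, ax in zip(logm, a)]
    return a, z


def lee_carter_svd(logm, iters=500):
    """SVD method: leading singular triple of Z, found by power iteration."""
    a, z = _center(logm)
    ages, years = len(z), len(z[0])
    # start away from the all-ones vector, which lies in the null space of Z
    v = [float(t + 1) for t in range(years)]
    for _ in range(iters):
        zv = [sum(z[x][t] * v[t] for t in range(years)) for x in range(ages)]
        w = [sum(z[x][t] * zv[x] for x in range(ages)) for t in range(years)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0:
            break
        v = [c / norm for c in w]
    zv = [sum(z[x][t] * v[t] for t in range(years)) for x in range(ages)]
    s = math.sqrt(sum(c * c for c in zv))
    if s == 0:
        raise ValueError("log rates show no variation over time")
    u = [c / s for c in zv]
    su = sum(u)
    b = [ux / su for ux in u]        # rescale so that sum_x b_x = 1
    k = [s * vt * su for vt in v]    # b_x * k_t still equals s * u_x * v_t
    return a, b, k


def lee_carter_approx(logm):
    """Closed-form approximation: k_t = sum_x Z(x,t), b_x = slope of Z(x,.) on k."""
    a, z = _center(logm)
    years = len(z[0])
    k = [sum(row[t] for row in z) for t in range(years)]
    kk = sum(c * c for c in k)
    b = [sum(row[t] * k[t] for t in range(years)) / kk for row in z]
    return a, b, k
```

The approximation satisfies both constraints by construction: centering each age row makes the k_t sum to zero, and the regression slopes then sum to one. On exactly rank-one log rates both functions recover the same a_x, b_x, and k_t; they differ once noise, such as the sampling noise of small populations, enters.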
Description: Master's; National Chengchi University; Department of Statistics; 110354022
2023-09-01T06:57:30Z