NCCU Academic Hub Community: TIGP

https://ah.lib.nccu.edu.tw/handle/140.119/118517
TIGP
2025-12-15T03:41:25Z

Exploring High-Order Relations for Recommender Systems
https://ah.lib.nccu.edu.tw/handle/140.119/141872
Title: 探討推薦系統之高階關係影響; Exploring High-Order Relations for Recommender Systems
Authors: 陳志明; Chen, Chih-Ming
Abstract: Recommender systems are everywhere in enterprise applications nowadays, which suggests that applicable research in this area can have a significant impact on the real world. In light of this, we developed a recommendation framework named SMORe. It is not only a toolkit but also a framework designed to keep pace with cutting-edge research. Models implemented on this framework achieve efficient and accurate predictions that are competitive with, and in some cases better than, other well-known existing frameworks.
For the proposed models, we focus on using high-order relations in recommendation algorithms. Specifically, we present a series of four collaborative filtering models: 1) HPE, 2) Hop-Rec, 3) CSE, and 4) IPR. Their common feature is to "utilize high-order relation modeling to improve recommendation algorithms". Note that they are not independent works; the theoretical analysis provided for each helps readers understand the rationale behind our designs. In brief, a recommendation dataset contains user-to-item edges, and high-order relations are the connections that were never recorded. High-order information modeling is an attempt to make use of these unobserved edges.
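The notion of reaching unobserved user-to-item edges through the graph can be illustrated with a small random-walk sketch. This is not the SMORe implementation or any of the four models; the function names `build_bipartite` and `high_order_items`, the toy interactions, and the walk parameters are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def build_bipartite(interactions):
    """Build adjacency lists for an undirected user-item graph."""
    adj = defaultdict(list)
    for user, item in interactions:
        adj[("u", user)].append(("i", item))
        adj[("i", item)].append(("u", user))
    return adj


def high_order_items(adj, user, walks=200, steps=3, seed=0):
    """Count unobserved items reached by short random walks from a user.

    With an odd number of steps, a walk that starts at a user ends at an item.
    Items the user already interacted with are skipped, so the counts only
    cover candidate high-order (unrecorded) user-item pairs.
    """
    rng = random.Random(seed)
    start = ("u", user)
    observed = set(adj[start])
    counts = Counter()
    for _ in range(walks):
        node = start
        for _ in range(steps):
            node = rng.choice(adj[node])
        if node[0] == "i" and node not in observed:
            counts[node[1]] += 1
    return counts


# Hypothetical toy data: alice and bob share item "b"; carol is disconnected.
interactions = [
    ("alice", "a"), ("alice", "b"),
    ("bob", "b"), ("bob", "c"),
    ("carol", "d"),
]
adj = build_bipartite(interactions)
print(high_order_items(adj, "alice"))
```

In this toy graph, item "c" is reachable from alice only through the shared item "b" and the neighbor bob, while item "d" is never reached. The visit counts could serve as a rough signal of how strongly an unobserved pair is connected.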
In short, HPE applies random walks to retrieve high-order neighbors and thereby better fuse users' heterogeneous preferences. Hop-Rec uses random walks to determine the strength of each high-order user-to-item pair and reshapes the corresponding loss function. CSE shows a delicate way to cluster users and items using high-order information while maintaining and improving recommendation quality. IPR, as the culmination of this series, recasts conventional entity-level CF modeling as interaction-level CF modeling using high-order relations, and provides an intuitive explanation of why high-order information benefits recommendation.
Description: Doctoral; National Chengchi University; International Doctoral Program in Social Networks and Human-Centered Computing (TIGP); 104761501
2022-09-02T07:53:48Z

Understanding Social Interaction Using Images and Deep Learning
https://ah.lib.nccu.edu.tw/handle/140.119/134227
Title: 使用圖像和深度學習了解社交互動; Understanding Social Interaction Using Images and Deep Learning
Authors: 艾費瑪; Fatma Said Abousaleh Abdeo
Abstract: Human beings generally interact with each other easily and without obvious effort, and social signals are the natural result of this effective communication. Giving computers an equivalent capability, so that they can analyze and understand social interactions and properly represent human social signals, remains one of the greatest scientific challenges in the field of social signal processing (SSP). Social interactions can take place in two different ways: face-to-face or online. In face-to-face interactions, people commonly use observable nonverbal behavioral cues (e.g., gestures, facial expressions, vocalizations, postures, and interpersonal distance) to understand the social signals and behavior of others and to interact with them. The problem of recognizing social interactions from face images has recently received significant attention from the research community, because facial images carry a variety of facial traits that can convey information about an individual's age, gender, emotions, and physical health.
These types of information are known to play a key role both in describing individuals and in social communication. In particular, age is one of the most fundamental attributes affecting our daily social interactions. Automatic age estimation from face images has therefore become an important task in numerous applications of artificial intelligence. Despite major advances in recent years, it remains a challenging problem because facial appearance varies widely with factors such as genetic traits, lifestyle, facial expressions, and aging. Online interactions, on the other hand, concern how users interact with each other through social media websites such as Facebook, Twitter, Instagram, and Flickr. Most social networks allow users to create and share content and to interact with other users' content in different forms (e.g., by viewing, liking, or commenting). This results in massive amounts of social content that carry information about users' interests, opinions, daily activities, and interactions. Because of the explosive growth of social media content and of interactive online behavior, only a small fraction of content attracts a great deal of user attention and becomes popular, while the vast majority is ignored. Among the different types of user-generated content on social media, images have become an important medium for communication between users, and they vary widely in the number of views they receive, that is, in their social popularity. This phenomenon has attracted researchers from the computer vision and multimedia domains to explore why certain photos become popular and how to predict their popularity automatically.
However, it is still difficult to measure, predict, or even define image popularity on social media, because it depends on users' individual preferences and on many other factors, such as their interaction history on social media websites. To address these challenges, this dissertation proposes a framework for understanding social interaction in both the real and the online world. First, this dissertation addresses the problem of automatic age estimation from facial images. Conventional methods for facial age estimation determine the age of a person directly from his or her facial image by analyzing facial information (e.g., nose, mouth, and eyes). This means that only the input image is used to estimate the person's age. However, telling someone's precise age at a glance without any reference information is a challenging task even for humans. To address this problem, and inspired by human cognitive processes, this dissertation proposes a comparative deep learning framework that estimates age from a facial image by comparing the input image with a set of selected reference images (labeled baseline samples) to determine whether the input face is younger or older than each baseline sample. A specific deep learning architecture, namely a region-convolutional neural network (R-CNN), is used to extract facial information from both the input image and the baseline samples. Then, an energy function aggregates the information extracted from the fully connected layer in order to estimate the age comparisons. This results in a set of hints, where each hint represents a comparative relationship (younger or older). Finally, the estimation stage collects all the hints and tallies the votes they give to each age label. The person's age is estimated as the label that received the most votes.
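The final voting step described above can be sketched as follows. This is only an illustration under stated assumptions, not the dissertation's implementation: the hints are given directly rather than produced by an R-CNN and an energy function, each hint is assumed to vote for every label consistent with it, and ties are assumed to be broken by taking the middle tied label. The name `estimate_age` is hypothetical.

```python
def estimate_age(hints, labels):
    """Pick the age label that receives the most votes from comparative hints.

    hints:  list of (baseline_age, relation) pairs, where relation is
            "younger" or "older" (the input face relative to the baseline).
    labels: iterable of candidate age labels.
    """
    votes = {label: 0 for label in labels}
    for baseline_age, relation in hints:
        for label in votes:
            # Assumption: a hint votes for every label it is consistent with.
            if relation == "younger" and label < baseline_age:
                votes[label] += 1
            elif relation == "older" and label > baseline_age:
                votes[label] += 1
    best = max(votes.values())
    winners = sorted(label for label, count in votes.items() if count == best)
    # Assumption: break ties by returning the middle of the tied labels.
    return winners[len(winners) // 2]


# Hypothetical hints: older than the 20- and 25-year-old baselines,
# younger than the 35- and 40-year-old baselines.
hints = [(20, "older"), (25, "older"), (35, "younger"), (40, "younger")]
print(estimate_age(hints, range(0, 101)))  # prints 30
```

Here the labels 26 through 34 each satisfy all four hints, and the tie-break returns the middle one, 30. More baseline samples narrow the range of tied labels.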
The experimental results on the FG-NET, MORPH, and IoG databases demonstrate that the proposed model outperforms state-of-the-art methods, with relative improvements of 13.24% on FG-NET and 23.20% on MORPH in terms of mean absolute error, and 4.74% on IoG in terms of age-group classification accuracy. Second, this dissertation addresses the problem of image popularity prediction on social media websites. With the rise of social networks such as Flickr and Facebook, users often interact with each other by sharing photos of their daily lives. Although billions of images are uploaded to the internet every minute, only a few of them receive millions of views and become popular, while the others are completely ignored. Even different images posted by the same user receive different numbers of views. This raises the problem of image popularity prediction, which has become a key challenge in social media analytics because it offers a way to reveal individual preferences and public attention. However, identifying the crucial factors that influence image popularity, and building a model that predicts image popularity on social media, remain open problems. To this end, this dissertation proposes a multimodal deep learning model that predicts the popularity of images on social media using various types of visual and social features associated with image popularity. The proposed model uses two dedicated CNNs to learn high-level representations separately from the input features and then merges them into a unified network for popularity prediction. The performance of the model was evaluated through a series of experiments on a real-world dataset from Flickr.
The evaluation results show that the proposed prediction model outperforms four traditional machine learning schemes, two CNN-based models, and other state-of-the-art methods, with relative performance improvements of more than 2.33%, 7.59%, and 14.16% in terms of the Spearman rank correlation coefficient, mean absolute error, and mean squared error, respectively.
Description: Doctoral; National Chengchi University; International Doctoral Program in Social Networks and Human-Centered Computing (TIGP); 103761506
2021-03-02T07:02:02Z

Analysis of the robustness of recurrent and self-attentive neural networks
https://ah.lib.nccu.edu.tw/handle/140.119/131976
Title: 遞歸及自注意力類神經網路之強健性分析; Analysis of the robustness of recurrent and self-attentive neural networks
Authors: 謝育倫; Hsieh, Yu-Lun
Abstract: In this work, we investigate the effectiveness of current deep learning methods, that is, machine learning models built on neural networks, in the field of natural language processing. Additionally, we conduct a robustness analysis of various neural model architectures. We evaluate each network's resistance to adversarial input perturbations, which in essence replace input words so that the model produces incorrect results or predictions. We compare different network architectures, including the Transformer network, which is based on the self-attention mechanism, and the commonly employed recurrent neural networks built from long short-term memory (LSTM) cells.
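The word-replacement idea behind adversarial input perturbations can be sketched with a toy example. Everything here is an assumption for illustration: a tiny additive word-score classifier stands in for the Transformer and LSTM models, the `WEIGHTS` and `SUBSTITUTES` tables are invented, and the greedy search is a generic strategy rather than any of the attack methods used in the thesis.

```python
# Hypothetical per-word sentiment weights for a toy additive classifier.
WEIGHTS = {"great": 2.0, "good": 1.0, "fine": 0.2, "okay": 0.1, "boring": -1.5}
# Hypothetical table of allowed replacement words.
SUBSTITUTES = {"great": ["fine", "okay"], "good": ["fine", "okay"]}


def score(words):
    """Sum the weights of known words; unknown words contribute nothing."""
    return sum(WEIGHTS.get(word, 0.0) for word in words)


def predict(words):
    return "positive" if score(words) > 0 else "negative"


def greedy_attack(words, max_swaps=2):
    """Greedily swap words to flip the toy classifier's prediction.

    Returns the perturbed word list and whether the prediction flipped.
    """
    original = predict(words)
    sign = 1 if original == "positive" else -1
    adversarial = list(words)
    for _ in range(max_swaps):
        if predict(adversarial) != original:
            break
        best = None  # (gain, position, substitute)
        for pos, word in enumerate(adversarial):
            for sub in SUBSTITUTES.get(word, []):
                # Gain: how far the swap moves the score against the label.
                gain = sign * (WEIGHTS.get(word, 0.0) - WEIGHTS.get(sub, 0.0))
                if best is None or gain > best[0]:
                    best = (gain, pos, sub)
        if best is None or best[0] <= 0:
            break
        adversarial[best[1]] = best[2]
    return adversarial, predict(adversarial) != original


sentence = ["great", "acting", "boring", "plot"]
print(predict(sentence))        # prints positive
print(greedy_attack(sentence))  # prints (['okay', 'acting', 'boring', 'plot'], True)
```

A single swap of "great" for "okay" is enough to flip this toy model. Measuring how many such swaps a real model tolerates is one simple way to compare robustness across architectures.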
We conduct extensive experiments covering the most common tasks in natural language processing: sentence classification, word segmentation and part-of-speech tagging, sentiment classification, entailment analysis, abstractive document summarization, and machine translation. In the process, we evaluate the effectiveness of these architectures against other state-of-the-art approaches, and find that the self-attention-based Transformer architecture performs better on the vast majority of tasks. We then estimate the robustness of the different models against adversarial examples using five attack methods. Most importantly, we propose a series of innovative methods for generating adversarial input perturbations, and derive a theoretical analysis from our observations. Finally, we attempt to explain the differences in robustness between neural network architectures.
Description: Doctoral; National Chengchi University; International Doctoral Program in Social Networks and Human-Centered Computing (TIGP); 103761503
2020-09-02T05:22:20Z

Realistic data synthesis using enhanced generative adversarial networks
https://ah.lib.nccu.edu.tw/handle/140.119/123696
Title: 以進階生成對抗網路合成擬真資料; Realistic data synthesis using enhanced generative adversarial networks
Authors: 包諾克; Mrinal Kanti Baowaly
Abstract: In many situations, real data are unavailable or too expensive to obtain in terms of both time and money.
This is because the data may raise privacy and confidentiality concerns. In these situations, synthetic data are a good alternative. The primary objective of this study is to generate realistic synthetic electronic health records (EHRs) that people can use freely to advance research in healthcare and related fields. We propose two synthetic data generation models, medical Wasserstein GAN with gradient penalty (medWGAN) and medical boundary-seeking GAN (medBGAN), and compare their performance with that of an existing method, medical GAN (medGAN). The proposed models are based on two enhanced variants of generative adversarial networks (GANs), namely Wasserstein GAN with gradient penalty (WGAN-GP) and boundary-seeking GAN (BGAN). We perform data synthesis on three aggregated EHR datasets with discrete features (e.g., binary and count) in the medical domain: MIMIC-III, extended MIMIC-III, and the National Health Insurance Research Database (NHIRD) of Taiwan. First, we train the models and use the trained models to generate synthetic EHR data. We then analyze and compare the models' performance using statistical methods (dimension-wise average and the Kolmogorov–Smirnov test) and two machine learning tasks (association rule mining and prediction). The comprehensive analysis shows that the proposed models are more effective than medGAN at generating realistic synthetic EHR data. Our models can be applied to generate realistic synthetic data of any kind, not only in the medical domain. To demonstrate their generality, we also investigate an aggregated crime dataset from the City of Los Angeles Police Department, which confirms that our models work across a wide range of applications.
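The two statistical checks named above, the dimension-wise average and the Kolmogorov–Smirnov test, can be sketched in a few lines. This is a minimal standard-library illustration under stated assumptions, not the study's evaluation code: the toy `real` and `synthetic` tables are invented, the function names are hypothetical, and only the two-sample KS statistic is computed, without a p-value.

```python
from bisect import bisect_right


def dimension_wise_average(rows):
    """Mean of each feature (column) across all records (rows)."""
    count = len(rows)
    return [sum(column) / count for column in zip(*rows)]


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    gaps = []
    for value in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, value) / len(a)
        cdf_b = bisect_right(b, value) / len(b)
        gaps.append(abs(cdf_a - cdf_b))
    return max(gaps)


# Hypothetical records: two binary features followed by one count feature.
real = [[1, 0, 2], [0, 0, 1], [1, 1, 0], [1, 0, 3]]
synthetic = [[1, 0, 1], [0, 1, 1], [1, 0, 0], [1, 0, 2]]

print(dimension_wise_average(real))       # prints [0.75, 0.25, 1.5]
print(dimension_wise_average(synthetic))  # prints [0.75, 0.25, 1.0]

# Compare the distribution of the count feature (third column).
real_counts = [row[2] for row in real]
synthetic_counts = [row[2] for row in synthetic]
print(ks_statistic(real_counts, synthetic_counts))  # prints 0.25
```

Per-feature averages that lie close together, and a small KS statistic, suggest that the synthetic table tracks the real one on that feature. A library routine such as SciPy's `ks_2samp` would additionally report a p-value.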
We demonstrate that the proposed models are suitable for producing high-quality synthetic data with discrete features that are statistically sound and good enough for machine learning tasks. We believe the proposed models will be useful in both industry and academic research as a better means of generating realistic synthetic data. This study will help to remove barriers such as limited access to confidential data, and thus accelerate the development of medical informatics, healthcare, and related fields.
Description: Doctoral; National Chengchi University; International Doctoral Program in Social Networks and Human-Centered Computing (TIGP); 104761507
2019-06-03T05:08:37Z