NCCU Library — Publications (Theses)
Title: Evaluating the Performance of PCA-Based Weight Compression for LSTM Siamese Neural Networks (應用主成分分析於 LSTM 孿生神經網路權重壓縮之效能評估)
Author: Yueh, Yi-Chen (樂沂晨)
Advisor: Chou, Elizabeth P. (周珮婷)
Keywords: Binary Classification; Text Analysis; Model Compression; Siamese Neural Network; Long Short-Term Memory (LSTM); Principal Component Analysis (PCA)
Date: 2025
Uploaded: 1-Jul-2025 15:03:17 (UTC+8)

Abstract: A Siamese neural network is a supervised neural network that computes the similarity of an input pair through twin subnetworks with shared weights, and is widely applied to tasks such as semantic similarity judgment, face recognition, and medical image matching. Although deep neural networks achieve strong performance, practical deployment often entails large parameter counts and heavy computational cost; on resource-constrained devices in particular, they easily hit performance bottlenecks without proper optimization. Moreover, while neural networks are highly expressive, their performance depends heavily on the number of neurons in the hidden layers: too many neurons can cause overfitting and waste computational resources, while too few can limit the model's capacity to learn features. This study therefore proposes a method based on Principal Component Analysis (PCA) that evaluates the redundancy of neuron outputs to guide the choice of an appropriate neuron count. Experimental results show that the method effectively simplifies the model structure while maintaining performance, providing useful guidance for neuron configuration. We build a Siamese neural network based on long short-term memory (LSTM) models that performs binary classification of textual semantic similarity. To balance model performance, reduce redundant information, and improve runtime efficiency, PCA is applied to compress each LSTM gate; we examine how different degrees of dimensionality reduction affect accuracy and semantic retention, evaluating whether the computational burden can be reduced while maintaining semantic-matching performance, thereby improving the model's practical feasibility and inference efficiency.
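The abstract describes an LSTM-based Siamese network that scores the semantic similarity of text pairs. The following numpy sketch illustrates the general architecture only: a shared-weight LSTM encoder for each branch, compared with the exp-of-negative-Manhattan-distance similarity of Mueller and Thyagarajan (2016). The layer sizes, random weights, and toy inputs are placeholders, not the thesis's configuration.

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One LSTM step; gates stacked as [input, forget, candidate, output]."""
    n = h.shape[0]
    z = W @ x + U @ h + b                      # (4n,) pre-activations
    i = 1.0 / (1.0 + np.exp(-z[0*n:1*n]))      # input gate
    f = 1.0 / (1.0 + np.exp(-z[1*n:2*n]))      # forget gate
    g = np.tanh(z[2*n:3*n])                    # candidate cell state
    o = 1.0 / (1.0 + np.exp(-z[3*n:4*n]))      # output gate
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c

def encode(seq, W, U, b, n_hidden):
    """Run the shared-weight LSTM encoder over a sequence of embeddings."""
    h = np.zeros(n_hidden)
    c = np.zeros(n_hidden)
    for x in seq:
        h, c = lstm_step(x, h, c, W, U, b)
    return h

def similarity(h1, h2):
    """Manhattan-distance similarity in (0, 1], as in Mueller & Thyagarajan (2016)."""
    return np.exp(-np.abs(h1 - h2).sum())

rng = np.random.default_rng(0)
d, n = 8, 16                                   # toy embedding dim, hidden units
W = rng.normal(0.0, 0.1, (4 * n, d))           # one weight set, shared by both branches
U = rng.normal(0.0, 0.1, (4 * n, n))
b = np.zeros(4 * n)

seq_a = rng.normal(size=(5, d))                # two toy "sentences" of 5 tokens each
seq_b = rng.normal(size=(5, d))
h_a = encode(seq_a, W, U, b, n)
h_b = encode(seq_b, W, U, b, n)
print(similarity(h_a, h_a))                    # identical inputs -> 1.0
print(similarity(h_a, h_b))                    # a value in (0, 1)
```

For binary classification, as in the thesis's task, this similarity score (or the pair of encoded vectors) would feed a sigmoid output trained with binary cross-entropy.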
Abstract (English): Siamese neural networks, supervised networks built from shared-weight twin subnetworks, excel at tasks such as semantic similarity and recognition. Despite their power, deep networks' high parameter counts and computational costs pose challenges, especially on resource-limited devices. Performance also hinges on the hidden-layer neuron count: too many neurons cause overfitting, too few limit learning. This study introduces a Principal Component Analysis (PCA)-based method that assesses the redundancy of neuron outputs to guide the selection of an appropriate neuron count. Experiments show that the approach simplifies models while preserving performance, offering practical guidance for neuron configuration. We build an LSTM-based Siamese network for binary semantic-similarity classification. To balance performance, reduce redundancy, and improve efficiency, we apply PCA to compress the LSTM gates, analyze how varying degrees of dimensionality reduction affect accuracy and semantic retention, and evaluate whether the reduced computation can maintain semantic-matching performance, thereby improving the model's practical applicability and inference efficiency.

Description: Master's thesis, Department of Statistics, National Chengchi University (Student ID 112354003)
Source: http://thesis.lib.nccu.edu.tw/record/#G0112354003
URI: https://nccur.lib.nccu.edu.tw/handle/140.119/157808
Type: thesis

Table of Contents
Chapter 1  Introduction
  1.1  Research Background and Motivation
  1.2  Research Objectives
Chapter 2  Literature Review
  2.1  Long Short-Term Memory Models (Recurrent Neural Networks; LSTM)
  2.2  Siamese Neural Networks
  2.3  Neural Network Pruning
  2.4  Principal Component Analysis
Chapter 3  Methodology
  3.1  Training Pipeline Diagram
  3.2  Siamese Neural Network
  3.3  Paired Data
  3.4  Word Embedding Layer
  3.5  Feature Extraction Architecture Design
  3.6  Applying PCA to the Siamese Neural Network (PCA Dimensionality Reduction; Number of Principal Components per Gate)
  3.7  Binary Cross-Entropy Loss
Chapter 4  Results
  4.1  Datasets (Disaster Tweets; AG News Classification; Sentiment Tweet; Suicide and Depression Detection)
  4.2  Experimental Results (per dataset, as above)
Chapter 5  Conclusion
  5.1  Summary
  5.2  Research Limitations and Future Outlook
References
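The thesis compresses each LSTM gate's weights with PCA. The sketch below illustrates the general idea in numpy, not the thesis's exact procedure: approximate a gate weight matrix by its top-k principal components, so that two small factor matrices stand in for the full matrix, and report how much variance the truncation retains. The synthetic low-rank matrix and the choice k = 8 are assumptions for demonstration.

```python
import numpy as np

def pca_compress(W, k):
    """Approximate weight matrix W with its top-k principal components.

    Returns the rank-k reconstruction, the two factor matrices whose product
    (plus the mean) replaces W, and the fraction of variance retained.
    Storage drops from rows*cols to roughly rows*k + k*cols parameters."""
    mu = W.mean(axis=0, keepdims=True)
    Wc = W - mu
    # SVD of the centered matrix; rows of Vt are principal directions
    _, S, Vt = np.linalg.svd(Wc, full_matrices=False)
    P = Vt[:k].T                         # (cols, k) projection basis
    scores = Wc @ P                      # (rows, k) component scores
    W_hat = scores @ P.T + mu            # rank-k approximation of W
    retained = (S[:k] ** 2).sum() / (S ** 2).sum()
    return W_hat, scores, P, retained

rng = np.random.default_rng(1)
n = 64
# A synthetic "gate" weight matrix with low effective rank plus small noise,
# standing in for a trained LSTM gate's weights
W = rng.normal(size=(n, 8)) @ rng.normal(size=(8, n)) + 0.01 * rng.normal(size=(n, n))

W_hat, scores, P, retained = pca_compress(W, k=8)
rel_err = np.linalg.norm(W - W_hat) / np.linalg.norm(W)
print(f"retained variance: {retained:.4f}, relative error: {rel_err:.4f}")
print(n * n, "->", scores.size + P.size)   # parameter count before vs after
```

In the thesis's setting, the retained-variance curve is the kind of signal used to judge how many components (and hence how many effective neurons) each gate needs before accuracy degrades.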
