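Title: 針對KGW風格水印技術在大型語言模型中的輕量化增強方法
A Lightweight Enhancement for KGW-Style Watermarking in Large Language Models
Author: Chen, Yen-Pang (陳彥邦)
Contributors: Yu, Fang (郁方) and Hong, Chih-Duo (洪智鐸), advisors; Chen, Yen-Pang (陳彥邦), author
Keywords: NCCU (National Chengchi University); LLM Watermarking; Generative AI; Machine-Generated Text Detection
Date: 2025
Upload time: 4-Aug-2025 14:26:58 (UTC+8)
Abstract (Chinese, translated): As large language models become ever better at generating fluent, natural text, concern is growing over their misuse for misinformation, impersonation, and academic dishonesty. Soft watermarking was developed to mark such AI-generated content: it slightly biases generation toward particular tokens so that machine-generated text is easier to identify later. However, existing watermarking methods perform poorly on low-variation content (such as code, formatted writing, and repetitive phrasing), mainly because few candidate tokens are available, which leaves the detection signal weak. In addition, watermark strength is usually set low to preserve naturalness, which further reduces detection performance. This study proposes a simple and effective improvement: it collects red tokens or n-grams that appear frequently during generation and excludes them at detection time, removing high-probability segments that contribute little to detection and carry insufficient statistical evidence, thereby strengthening the watermark signal. The method is computationally cheap and applies to any KGW-style watermarking scheme. Multiple experiments show that, even at low watermark strength, the method maintains a high detection rate and effectively suppresses false positives.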
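The filtering idea in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration, not the thesis's implementation: `is_green` is a toy hash-based stand-in for the KGW green/red partition, `GAMMA` and the key value are arbitrary, and `collect_signatures` simply takes the most frequent red n-grams. The record does not describe the thesis's actual collection or optimization procedures, so none of that is reproduced.

```python
import hashlib
import math
from collections import Counter

GAMMA = 0.5  # assumed green-list fraction (hypothetical value)


def is_green(prev_token, token, key=42, gamma=GAMMA):
    """Toy stand-in for a KGW green-list test: hash (key, prev, token) into [0, 1)."""
    digest = hashlib.sha256(f"{key}:{prev_token}:{token}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < gamma


def ngram_at(tokens, i, n):
    """The n-gram ending at position i (shorter near the start of the text)."""
    return tuple(tokens[max(0, i - n + 1): i + 1])


def collect_signatures(samples, n=2, top_k=10, gamma=GAMMA):
    """Return the top_k most frequent n-grams that end in a red token."""
    counts = Counter()
    for tokens in samples:
        for i in range(1, len(tokens)):
            if not is_green(tokens[i - 1], tokens[i], gamma=gamma):
                counts[ngram_at(tokens, i, n)] += 1
    return frozenset(ngram for ngram, _ in counts.most_common(top_k))


def z_score(tokens, signatures=frozenset(), n=2, gamma=GAMMA):
    """One-proportion z-score of the green count, skipping signature n-grams."""
    green = scored = 0
    for i in range(1, len(tokens)):
        if ngram_at(tokens, i, n) in signatures:
            continue  # filtered: counted neither as green nor in the total
        scored += 1
        green += is_green(tokens[i - 1], tokens[i], gamma=gamma)
    if scored == 0:
        return 0.0
    return (green - gamma * scored) / math.sqrt(scored * gamma * (1 - gamma))


# Usage: a repetitive token sequence standing in for low-variation text.
sample = [(i * 7) % 13 for i in range(60)]
signatures = collect_signatures([sample], n=2, top_k=5)
print("unfiltered z:", z_score(sample))
print("filtered z:  ", z_score(sample, signatures))
```

In this sketch each filtered position held a red token, so skipping it leaves the green count unchanged while shrinking the number of scored tokens, and the z-score cannot decrease. Whether that also keeps false positives low on unwatermarked text is an empirical question that the sketch does not address.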
Abstract (English): The growing capability of large language models (LLMs) has raised concerns over misuse in misinformation, impersonation, and academic dishonesty. Soft watermarking marks AI-generated content by subtly biasing token selection, enabling downstream detection. However, existing methods struggle on low-variation text or under low watermark strength, where detection signals are weak. We propose a lightweight enhancement that filters frequently sampled red tokens or n-grams during detection to amplify the watermark signal. Our method significantly improves detection accuracy under low watermark strength, while maintaining a low false positive rate and remaining compatible with any KGW-style watermarking scheme.
Description: Master's thesis
National Chengchi University
Department of Management Information Systems
112356028
Source: http://thesis.lib.nccu.edu.tw/record/#G0112356028
Type: thesis
dc.contributor.advisor: Yu, Fang (郁方); Hong, Chih-Duo (洪智鐸)
dc.identifier (Other Identifiers): G0112356028
dc.identifier.uri (URI): https://nccur.lib.nccu.edu.tw/handle/140.119/158575
dc.format.extent: 1686945 bytes
dc.format.mimetype: application/pdf
dc.description.tableofcontents:
Abstract (Chinese)
Abstract
Contents
List of Figures
List of Tables
1 Introduction
1.1 Watermarking for Large Language Models
1.1.1 KGW Watermarking Techniques
1.1.2 Entropy-Based Selective Watermarking
1.1.3 Entropy-Weighted Watermark Detection (EWD)
2 Related Work
2.1 Detectability Under Challenging Conditions
2.2 Robustness Against Watermark Removal Attacks
3 Preliminaries
4 Methodology
4.1 Signature Framework Overview
4.2 Collecting and Filtering n-gram Signatures
4.2.1 Signature Collection
4.2.2 Watermark Detection
4.3 Signature Optimization
4.3.1 Greedy Optimization
4.3.2 Simulated Annealing Optimization
5 Experiments
5.1 Experiment Framework
5.2 Experiment Settings
5.3 Empirical Analysis
5.3.1 RQ1: How effective is signature filtering in improving watermark detection performance?
5.3.2 RQ2: How does the size of the signature set affect the accuracy of watermark detection?
5.3.3 RQ3: How does the choice of n impact the effectiveness of n-gram signature filtering?
5.3.4 RQ4: How robust is signature filtering?
6 Conclusion
Reference
