針對KGW風格水印技術在大型語言模型中的輕量化增強方法 | Publication

Title	針對KGW風格水印技術在大型語言模型中的輕量化增強方法 (A Lightweight Enhancement for KGW-Style Watermarking in Large Language Models)
Author	陳彥邦 (Chen, Yen-Pang)
Contributors	Advisors: 郁方 (Yu, Fang); 洪智鐸 (Hong, Chih-Duo). Author: 陳彥邦 (Chen, Yen-Pang)
Keywords	NCCU; LLM Watermarking; Generative AI; Machine-Generated Text Detection
Date	2025
Upload time	4-Aug-2025 14:26:58 (UTC+8)
Abstract	As large language models (LLMs) become increasingly capable of generating fluent, natural text, concerns have grown over their misuse for misinformation, impersonation, and academic dishonesty. Soft watermarking marks AI-generated content by slightly biasing token selection during generation, which makes machine-generated text easier to identify afterwards. However, existing watermarking methods perform poorly on low-variation content such as code, formatted writing, and repetitive phrasing, mainly because few candidate tokens are available and the detection signal is therefore weak. In addition, watermark strength is usually kept low to preserve naturalness, which further reduces detection performance. We propose a simple, lightweight enhancement that collects the red tokens or n-grams sampled frequently during generation and excludes them at detection time, removing high-probability segments that contribute little statistical evidence and thereby amplifying the watermark signal. The method has low computational cost and is compatible with any KGW-style watermarking scheme. Experiments show that it maintains a high detection rate even under low watermark strength while keeping the false positive rate low.
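The filtering idea in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the hash-based `is_green` test is a toy stand-in for a KGW green-list partition seeded by the previous token, and the names `collect_signatures`, `z_score`, `key`, `gamma`, and `top_k` are hypothetical. The score is the usual one-proportion z-statistic, z = (green - gamma * T) / sqrt(gamma * (1 - gamma) * T), computed only over the T positions that are not filtered out.

```python
import hashlib
import math
from collections import Counter
from typing import AbstractSet, Iterable, Sequence, Set, Tuple

NGram = Tuple[int, ...]


def is_green(prev_token: int, token: int, key: int = 42, gamma: float = 0.5) -> bool:
    """Toy green-list test (assumption): hash (key, prev, token) into [0, 1)."""
    digest = hashlib.sha256(f"{key}:{prev_token}:{token}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < gamma


def ngram_at(tokens: Sequence[int], i: int, n: int) -> NGram:
    """Return the n-gram ending at position i (shorter near the start)."""
    return tuple(tokens[max(0, i - n + 1): i + 1])


def collect_signatures(samples: Iterable[Sequence[int]], n: int = 2,
                       top_k: int = 10, gamma: float = 0.5) -> Set[NGram]:
    """Collect the top_k most frequent n-grams whose last token is red."""
    counts: Counter = Counter()
    for tokens in samples:
        for i in range(1, len(tokens)):
            if not is_green(tokens[i - 1], tokens[i], gamma=gamma):
                counts[ngram_at(tokens, i, n)] += 1
    return {ngram for ngram, _ in counts.most_common(top_k)}


def z_score(tokens: Sequence[int], signatures: AbstractSet[NGram] = frozenset(),
            n: int = 2, gamma: float = 0.5) -> float:
    """KGW-style z-score that skips positions whose n-gram is a signature."""
    green = scored = 0
    for i in range(1, len(tokens)):
        if ngram_at(tokens, i, n) in signatures:
            continue  # filtered: counts toward neither green nor T
        scored += 1
        green += is_green(tokens[i - 1], tokens[i], gamma=gamma)
    if scored == 0:
        return 0.0
    return (green - gamma * scored) / math.sqrt(gamma * (1 - gamma) * scored)
```

With n = 2 and this toy partition, a bigram fully determines the colour of its last token, so every filtered position is red; the green count is unchanged while T shrinks, and the z-score cannot decrease. The sketch does not address how filtering affects scores on unwatermarked text, which the thesis evaluates through its false positive rate.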
Description	Master's thesis, National Chengchi University, Department of Management Information Systems, 112356028
Source	http://thesis.lib.nccu.edu.tw/record/#G0112356028
Type	thesis

dc.contributor.advisor	郁方; 洪智鐸	zh_TW
dc.contributor.advisor	Yu, Fang; Hong, Chih-Duo	en_US
dc.contributor.author (Authors)	陳彥邦	zh_TW
dc.contributor.author (Authors)	Chen, Yen-Pang	en_US
dc.creator (作者)	陳彥邦	zh_TW
dc.creator (作者)	Chen, Yen-Pang	en_US
dc.date (日期)	2025	en_US
dc.date.accessioned	4-Aug-2025 14:26:58 (UTC+8)	-
dc.date.available	4-Aug-2025 14:26:58 (UTC+8)	-
dc.date.issued (上傳時間)	4-Aug-2025 14:26:58 (UTC+8)	-
dc.identifier (Other Identifiers)	G0112356028	en_US
dc.identifier.uri (URI)	https://nccur.lib.nccu.edu.tw/handle/140.119/158575	-
dc.description (描述)	碩士	zh_TW
dc.description (描述)	國立政治大學	zh_TW
dc.description (描述)	資訊管理學系	zh_TW
dc.description (描述)	112356028	zh_TW
dc.description.abstract (摘要)	隨著大型語言模型生成流暢且自然文字的能力持續提升，外界對其在假資訊、身分冒用及學術不誠實等方面的濫用問題日益關注。為了標記這類由人工智慧生成的內容，軟性水印技術應運而生，透過在文字生成過程中微幅偏向特定詞彙，提高後續辨識機器生成文本的可能性。然而，現有水印方法在處理低變化性的內容（如程式碼、格式化寫作、重複語句）時效果不佳，主因是可用的詞彙有限，導致偵測訊號薄弱。此外，為保留語句自然度，水印強度通常被設為較低，進一步降低偵測效能。本研究提出一種簡單有效的改進方法，透過收集生成過程中頻繁出現的紅色詞彙或 n-gram，並於偵測時將其排除，以去除對偵測貢獻不大、統計證據不足的高機率片段，強化水印訊號。此方法運算量低，且可應用於任一類型的 KGW 式水印技術。多項實驗顯示，即使在低水印強度下，本方法仍可維持高偵測率，並有效抑制誤判。	zh_TW
dc.description.abstract (摘要)	The growing capability of large language models (LLMs) has raised concerns over misuse in misinformation, impersonation, and academic dishonesty. Soft watermarking marks AI-generated content by subtly biasing token selection, enabling downstream detection. However, existing methods struggle on low-variation text or under low watermark strength, where detection signals are weak. We propose a lightweight enhancement that filters frequently sampled red tokens or n-grams during detection to amplify the watermark signal. Our method significantly improves detection accuracy under low watermark strength, while maintaining a low false positive rate and remaining compatible with any KGW-style watermarking scheme.	en_US
dc.description.tableofcontents	摘要; Abstract; Contents; List of Figures; List of Tables; 1 Introduction; 1.1 Watermarking for Large Language Models; 1.1.1 KGW Watermarking Techniques; 1.1.2 Entropy-Based Selective Watermarking; 1.1.3 Entropy-Weighted Watermark Detection (EWD); 2 Related Work; 2.1 Detectability Under Challenging Conditions; 2.2 Robustness Against Watermark Removal Attacks; 3 Preliminaries; 4 Methodology; 4.1 Signature Framework Overview; 4.2 Collecting and Filtering n-gram Signatures; 4.2.1 Signature Collection; 4.2.2 Watermark Detection; 4.3 Signature Optimization; 4.3.1 Greedy Optimization; 4.3.2 Simulated Annealing Optimization; 5 Experiments; 5.1 Experiment Framework; 5.2 Experiment Settings; 5.3 Empirical Analysis; 5.3.1 RQ1: How effective is signature filtering in improving watermark detection performance?; 5.3.2 RQ2: How does the size of the signature set affect the accuracy of watermark detection?; 5.3.3 RQ3: How does the choice of n impact the effectiveness of n-gram signature filtering?; 5.3.4 RQ4: How robust is signature filtering?; 6 Conclusion; Reference	zh_TW
dc.format.extent	1686945 bytes	-
dc.format.mimetype	application/pdf	-
dc.source.uri (資料來源)	http://thesis.lib.nccu.edu.tw/record/#G0112356028	en_US
dc.subject (關鍵詞)	政治大學	zh_TW
dc.subject (關鍵詞)	LLM水印技術	zh_TW
dc.subject (關鍵詞)	生成式人工智慧	zh_TW
dc.subject (關鍵詞)	機器生成文本偵測	zh_TW
dc.subject (關鍵詞)	NCCU	en_US
dc.subject (關鍵詞)	LLM Watermarking	en_US
dc.subject (關鍵詞)	Generative AI	en_US
dc.subject (關鍵詞)	Machine-Generated Text Detection	en_US
dc.title (題名)	針對KGW風格水印技術在大型語言模型中的輕量化增強方法	zh_TW
dc.title (題名)	A Lightweight Enhancement for KGW-Style Watermarking in Large Language Models	en_US
dc.type (資料類型)	thesis	en_US
