Academic Output - Conference Paper
Title | Exploring the Semantic Representations of Text in Subspaces of Latent Space: A Case Study on Color |
Authors | 蕭舜文; 羅永富 (Hsiao, Shun-Wen; Lo, Yung-Fu) |
Contributor | 資管系 (Department of Management Information Systems) |
Keywords | Latent Space; Semantic Representation; Concept Subspace; Projection Optimization; NLP |
Date | 2024-12 |
Upload time | 12-Mar-2025 10:22:06 (UTC+8) |
Abstract | Language models like BERT have advanced the representation of textual semantics in high-dimensional latent spaces, enabling numerous natural language processing applications. However, their capacity to represent domain-specific concepts, such as "color," remains underexplored. This study investigated the semantic representation of text in the color concept subspace of the latent space. Using embeddings of nearly 1,000 color names from the XKCD color survey generated by BERT, we identified limitations in BERT’s ability to cluster perceptually similar colors. To address this, we proposed a supervised learning approach to project embeddings into a color-specific subspace, isolating and enhancing color semantics. Experimental results demonstrated the methodology’s effectiveness in improving semantic clustering through qualitative and quantitative evaluations. Moreover, our general approach not only explored the concept of color but also provided the possibility of exploring and disentangling semantic subspaces for other domain-specific concepts, contributing to the understanding and manipulation of latent space structures in language models. |
Relation | Proceedings of the IEEE International Conference on Big Data, IEEE, pp. 8765-8767 |
Type | conference |
DOI | https://doi.org/10.1109/BigData62323.2024.10825707 |
dc.identifier.uri (URI) | https://nccur.lib.nccu.edu.tw/handle/140.119/156149 | - |