Title A Hierarchical Topic Analysis Tool to Facilitate Digital Humanities Research
Author 陳志銘
Chen, Chih-Ming
Ho, Szu-Yu; Chang, Chung
Contributor 圖檔所 (Graduate Institute of Library, Information and Archival Studies)
Keywords Digital humanities; Topic analysis; Hierarchical topic modelling; Text mining; Information visualization; Digital humanities research platform
Date 2023-01
Upload time 23-Aug-2022 13:46:44 (UTC+8)
Abstract Purpose
     This study aims to develop a hierarchical topic analysis tool (HTAT) based on hierarchical latent Dirichlet allocation (hLDA) to support digital humanities research that requires topic exploration on the Digital Humanities Platform for Mr. Lo Chia-Lun’s Writings (DHP-LCLW). HTAT assists humanities scholars in distant reading by analyzing hierarchical text topics: it classifies time-stamped texts into multiple historical eras, conducts hierarchical topic modeling (HTM) on the texts from each era and presents the results through visualization. A comparative network diagram is also provided to help humanities scholars compare differences among the topics they wish to explore and track how the concept of a topic changes over time from a particular perspective. In addition, HTAT lets humanities scholars view the source texts; because it integrates the topic exploration functions of both distant reading and close reading, it has high potential to improve the effectiveness of topic exploration.
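
The era classification step described above can be illustrated with a minimal sketch. It assumes each text carries a date and that eras are defined by their start dates. The boundaries in `ERA_STARTS`, the labels in `ERA_NAMES` and the helper names `era_of` and `group_by_era` are hypothetical and are not taken from the study.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import date

# Hypothetical era start dates, sorted ascending (not the study's actual eras).
ERA_STARTS = [date(1900, 1, 1), date(1928, 1, 1), date(1949, 1, 1)]
ERA_NAMES = ["era_1", "era_2", "era_3"]


def era_of(d):
    """Return the label of the era whose start date is the latest one <= d."""
    idx = bisect_right(ERA_STARTS, d) - 1
    if idx < 0:
        raise ValueError(f"{d} is earlier than the first era")
    return ERA_NAMES[idx]


def group_by_era(docs):
    """Group (date, text) pairs into a dict mapping era label -> list of texts."""
    groups = defaultdict(list)
    for d, text in docs:
        groups[era_of(d)].append(text)
    return dict(groups)
```

Each per-era list could then be passed to a separate topic modeling run, so that the resulting topic trees can be compared across eras.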
     
     Design/methodology/approach
     This study adopts a counterbalanced experimental design to examine whether there are significant differences in the effectiveness of topic inquiry, the number of relevant topics inquired and the time spent on them when research participants alternately conduct text exploration using DHP-LCLW with HTAT or DHP-LCLW with a single-layer topic analysis tool (SLTAT). A technology acceptance questionnaire and semi-structured interviews were also conducted to understand the research participants’ perceptions and feelings toward using the two tools to assist topic inquiry.
     
     Findings
     The experimental results show that DHP-LCLW with HTAT, in comparison with DHP-LCLW with SLTAT, better assisted the research participants in grasping, within a short period, the topic context of the texts from two particular perspectives assigned by this study. In addition, the interview results revealed that DHP-LCLW with HTAT, in comparison with SLTAT, provided topic terms that better met research participants’ expectations and needs, and effectively guided them to the corresponding texts for close reading. The analysis of the technology acceptance and interview data shows that the research participants have a strongly positive tendency toward using DHP-LCLW with HTAT to assist topic inquiry.
     
     Research limitations/implications
     The Jieba Chinese word segmentation system was used in the Mr. Lo Chia-Lun’s Writings Database in this study to segment Mr. Lo Chia-Lun’s writings for topic modeling based on hLDA. Since Jieba is a lexicon-based word segmentation system, it cannot reliably identify new words that have not yet been collected in the lexicon. The correctness of word segmentation on the target texts therefore affects the results of hLDA topic modeling and, in turn, the effectiveness of HTAT in assisting humanities scholars with topic inquiry.
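
The lexicon limitation can be illustrated with a toy segmenter. The sketch below uses forward maximum matching, a deliberately simple lexicon-based technique that is not Jieba's actual algorithm. The function name `fmm_segment`, the tiny lexicon and the sample phrase are hypothetical. It only shows how a word missing from the lexicon breaks into single characters, which would then reach the topic model as separate tokens.

```python
def fmm_segment(text, lexicon, max_len=4):
    """Segment text by forward maximum matching against a lexicon.

    At each position the longest lexicon entry (up to max_len characters)
    is taken; if none matches, a single character is emitted.
    """
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical lexicon that does not yet contain the word "主題模型".
lexicon = {"數位", "人文", "研究"}
print(fmm_segment("主題模型研究", lexicon))
# ['主', '題', '模', '型', '研究']

# Once the new word is added to the lexicon, it is kept whole.
print(fmm_segment("主題模型研究", lexicon | {"主題模型"}))
# ['主題模型', '研究']
```

In this toy setting the remedy is to extend the lexicon with the missing terms before segmenting.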
     
     Practical implications
     An HTAT was developed in this study to support digital humanities research. With HTAT, DHP-LCLW provides humanities scholars with topic clues from different hierarchical perspectives for textual exploration, and with temporal and comparative network diagrams that help them track how the topics of specific perspectives evolve over time, so that they gain a more comprehensive understanding of the overall context of the texts.
     
     Originality/value
     In recent years, topic analysis technology that can automatically extract key topic information from a large amount of text has developed rapidly, but the topics generated by traditional topic analysis models such as latent Dirichlet allocation (LDA) make it difficult for users to understand how the topics of texts differ across hierarchical levels. Thus, this study proposes HTAT, which uses hLDA to build a hierarchical topic tree without the need to define the number of topics in advance, enabling humanities scholars to quickly grasp the concept of textual topics and use different hierarchical perspectives for further textual exploration. It also combines temporal division with a comparative network diagram to assist humanities scholars in exploring topics and their changes across different eras, which helps them discover more useful research clues or findings.
Relation Aslib Journal of Information Management, Vol. 75 No. 1, pp. 1-19.
Type article
DOI https://doi.org/10.1108/AJIM-11-2021-0325
dc.contributor 圖檔所
dc.creator (Author) 陳志銘
dc.creator (Author) Chen, Chih-Ming
dc.creator (Author) Ho, Szu-Yu; Chang, Chung
dc.date (Date) 2023-01
dc.date.accessioned 23-Aug-2022 13:46:44 (UTC+8)
dc.date.available 23-Aug-2022 13:46:44 (UTC+8)
dc.date.issued (Upload time) 23-Aug-2022 13:46:44 (UTC+8)
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/141435
dc.format.extent 105 bytes
dc.format.mimetype text/html
dc.relation (Relation) Aslib Journal of Information Management, Vol. 75 No. 1, pp. 1-19.
dc.subject (Keywords) Digital humanities; Topic analysis; Hierarchical topic modelling; Text mining; Information visualization; Digital humanities research platform
dc.title (Title) A Hierarchical Topic Analysis Tool to Facilitate Digital Humanities Research
dc.type (Type) article
dc.identifier.doi (DOI) 10.1108/AJIM-11-2021-0325
dc.doi.uri (DOI) https://doi.org/10.1108/AJIM-11-2021-0325