Academic Output - Periodical Articles

Title 臺灣客語語料之數位化
The Digitalization of Corpus Data in Taiwan Hakka Language
Authors 賴惠玲; 葉秋杏
Lai, Huei-ling; Yeh, Chiou-shing
Contributor Department of English
Keywords Taiwan Hakka corpus; Digitalization of corpus data; Authorization; Metadata; Language archive
Date 2021-11
Uploaded 28-Jun-2022 13:55:45 (UTC+8)
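Abstract This paper sets out the digitization of corpus data in the Taiwan Hakka Corpus, tracing the overall structure of the workflow and clarifying the issues of text authorization and Hakka character usage. The workflow consists of two stages carried out in sequence, “preparatory work” and “digitization and file management”. The preparatory work stage comprises two steps, “corpus inventory” and “corpus collection and authorization”; the digitization and file management stage comprises three parts: “corpus cataloging and metadata annotation”, “corpus digitization and data cleaning” (including transcription revision), and “corpus storage and management”. The Taiwan Hakka Corpus is significant as the first annotated corpus in Taiwan to contain both written and spoken data, with audio recordings accompanying the spoken data, and it systematically covers six Taiwan Hakka dialects. Drawing on the actual experience of building the Taiwan Hakka Corpus, the paper aims to let past experience inform future work, offering a reference for other scholars building corpora of Taiwan's other languages, and hopes to open new opportunities for interdisciplinary research in linguistics and information science.
This paper lays out the digitization of corpus data in the Taiwan Hakka Corpus, resolving the issues of text authorization and Hakka character usage at the same time. The main task encompasses two stages: “preprocessing operation” and “digitization of corpus data and document management”. The Taiwan Hakka Corpus, with both written and spoken varieties (audio recordings available) of the Taiwan Hakka language collected in a systematic manner, is the first part-of-speech-tagged corpus among Taiwanese native languages. Its construction sets a model for corpus construction for the other national languages of Taiwan. This paper offers a significant reference for the development of interdisciplinary research in linguistics and computer science.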
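The two-stage workflow summarized in the abstracts above can be sketched as an ordered data structure. This is a minimal, hypothetical Python illustration, not code from the paper or the corpus project: the class `Stage`, the constant `WORKFLOW`, and the function `flatten` are invented names, and the step labels are paraphrased English renderings of the stage names given in the Chinese abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One top-level stage of the corpus workflow (hypothetical model)."""
    name: str
    steps: tuple[str, ...]


# Hypothetical encoding of the workflow described in the abstract.
# Labels are paraphrased translations, not identifiers used by the project.
WORKFLOW: tuple[Stage, ...] = (
    Stage(
        name="Preparatory work",
        steps=(
            "Corpus inventory",
            "Corpus collection and authorization",
        ),
    ),
    Stage(
        name="Digitization and file management",
        steps=(
            "Corpus cataloging and metadata annotation",
            "Corpus digitization and data cleaning (incl. transcription revision)",
            "Corpus storage and management",
        ),
    ),
)


def flatten(workflow: tuple[Stage, ...]) -> list[tuple[str, str]]:
    """Return (stage name, step) pairs in the order they are carried out."""
    return [(stage.name, step) for stage in workflow for step in stage.steps]


if __name__ == "__main__":
    # Print each step, numbered in sequence and tagged with its stage.
    for i, (stage, step) in enumerate(flatten(WORKFLOW), start=1):
        print(f"{i}. [{stage}] {step}")
```

Running the script prints the five steps in order, each tagged with its stage. Holding the stages in an immutable tuple keeps the sequencing explicit, matching the abstract's description of the two stages as linked in series.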
Relation 全球客家研究 (Global Hakka Studies), No. 17, pp. 49-100
Type article
dc.date.accessioned 28-Jun-2022 13:55:45 (UTC+8)
dc.date.available 28-Jun-2022 13:55:45 (UTC+8)
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/140443
dc.format.extent 3289816 bytes
dc.format.mimetype application/pdf