Error Analysis of Thai Students Learning Mandarin: A Cross-Age Learner Corpus (II) | Publication | NCCU Academic Hub

Title	Error Analysis of Thai Students Learning Mandarin: A Cross-Age Learner Corpus (II)
Alternative title	The Error Analysis of Thai Students Learning Taiwan Mandarin: A Corpus-Driven and Cross-Sectional Study (II)
Authors	萬依萍; 郭怡君
Contributor	Graduate Institute of Linguistics
Keywords	cross-sectional observational study; Thai speakers learning Mandarin; lexical and phonological-rhythmic relationship; lexical and syntactic components; error analysis; corpus
Date	2022-10
Upload time	8-Mar-2024 14:13:14 (UTC+8)
Abstract	This project is a quantitative, cross-age study that continues to collect the categories of language errors made by native Thai speakers (aged 4 to 18) learning Mandarin. The first year continues the current methodology, focusing on lexical and phonetic errors: taking the word as the starting point, it examines the number of syllables, monosyllabic/disyllabic words, polysyllabic words, syllable structure, consonants, vowels, tones, and syllable prosody; for lexical errors, it records the Thai students' single words, two-word combinations, vocabulary size, semantic content and classification, and related word meanings. The second year examines the syntactic phrases that link words at the higher levels of the language, recording errors in syntactic structure, word-order errors in short sentences, and complex sentence patterns that require conjunctions. The project covers three areas (phonetics, semantics, and syntax) and brings deeper research to fields related to Mandarin teaching. The main participants are non-ethnic-Chinese students whose native language is Thai at the Thai-Chinese International School in Bangkok, Thailand. So far, 1,197 error tokens have been extracted after reliability and validity checks, with the following findings: phonetic error types still mainly involve a single element within the syllable, in the order consonants > tones > vowels. Phonetic errors account for 96%, and lexical/syntactic errors for only 4%. For the child and adolescent groups, phonetic/phonological errors are not entirely due to the influence of Thai as the native language, whereas all syntactic errors are constrained by the sentence structures of Thai. The project will add two experiments: a low-pass filtering test of tone marking and a discrimination experiment testing phonetic similarity effects; the detailed procedure will mainly follow Hasegawa-Johnson's (2018) computational algorithm for the probabilistic analysis of phonetic distance.
The aim of this project is to provide a detailed analysis of error patterns drawn from a cross-sectional observational study of Thai speakers (aged 4 to 18; four groups) learning Taiwan Mandarin as a foreign language at Thai-Chinese International School in Bangkok, Thailand. The novelty of this work is to collect and analyze large-scale errors through a highly reliable corpus and to provide PRAAT acoustic parameters for the assessed error distribution by looking at the frequency and the various patterns involving phonological units. The units involved in lexical errors include parts of speech, content/function words, single words, word combinations, and lexical-semantic relationships (i.e., semantic features, semantically related associates, or general taxonomies of semantic relatedness). The units involved in syntactic errors include simple sentence structure (noun phrases, verb phrases, questions, and negations) as well as the complexity of syntactic frames, which contain embedded sentences, compound sentences, serial verb constructions, pivotal constructions, subject-verb sequences, verb-object sequences, and subject-verb-object sequences, among many other conjunction structures. Errors drawn from the children (aged 4-6; 8M3F, N=1197) suggested the following: 1) phonological errors far outnumber lexical errors (96% vs. 4%); 2) consonant errors are the most common, followed by tone errors and vowel errors; 3) some phonological errors can be influenced by learners' own creative manipulations regardless of their first-language background, whereas syntactic errors, especially in word order, have so far entirely reflected the learners' syntactic knowledge of their native language. In addition to gathering the data drawn from the observational study and the E-learning platform, this two-year project proposal will add two other experiments: 1) adjusting the low-pass filter down to 500 Hz in PRAAT to eliminate semantic information as a perceptual bias on tone errors; 2) running a discrimination task that incorporates Wan's (2016) phonetic distances in the distinctive feature set of Taiwan Mandarin and Hasegawa-Johnson's (2018) computing algorithms. These algorithms are based on the Perceptual Assimilation Model (Best et al., 1988, 2009) and mainly deal with data likelihood, i.e., mismatched phonetic distances as good probabilities between errors and targets. All the probabilities and parameters will later be trained by Automatic Speech Recognition (ASR). In this research project, the PI and the research team will mainly collect the data, set up the PRAAT package, analyze the error patterns by segmenting the linguistic components in the corpus, and cover all the phonological errors and segmental-prosodic units. The Co-PI will help analyze the lexical-semantic errors as well as the syntactic errors. The associate investigator will work on the computer techniques, speech noise reduction, and/or even speech diarization. The PI and the entire research team hope to add to a growing body of knowledge and understanding from the corpus involving the frequency of error units, error patterns, and error distribution made by the Thai speakers, and the findings will be essential for research in language teaching, linguistic research, corpus linguistics, the clinical domain, and/or computational linguistics.
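The abstract states that the first added experiment low-pass filters the recordings at 500 Hz in PRAAT. As a rough illustration of what such a filter does to a signal, the following is a minimal sketch in pure Python, not the project's PRAAT procedure: it designs a Hamming-windowed sinc FIR filter and compares how a tone below the cutoff and a tone well above it survive. The 16 kHz sample rate, the 101-tap filter length, the synthetic 200 Hz and 2000 Hz test tones, and all function names are assumptions made only for this example.

```python
import math

def lowpass_fir(cutoff_hz, sample_rate, num_taps=101):
    """Design a Hamming-windowed sinc low-pass FIR filter."""
    fc = cutoff_hz / sample_rate          # cutoff in cycles per sample
    mid = (num_taps - 1) / 2              # centre of the (symmetric) filter
    taps = []
    for n in range(num_taps):
        x = n - mid
        # Ideal low-pass impulse response (sinc), with the x == 0 limit handled.
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]      # normalise to unity gain at 0 Hz

def apply_fir(signal, taps):
    """Convolve signal with taps, keeping only fully overlapping samples."""
    k = len(taps)
    return [sum(taps[j] * signal[i + k - 1 - j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def rms(samples):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def tone(freq_hz, sample_rate, n_samples):
    """Synthetic sine tone standing in for a speech component (assumption)."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n_samples)]

SAMPLE_RATE = 16000   # assumed sample rate, not stated in the abstract
CUTOFF_HZ = 500       # cutoff named in the abstract

taps = lowpass_fir(CUTOFF_HZ, SAMPLE_RATE)
low = apply_fir(tone(200, SAMPLE_RATE, 2000), taps)    # pitch-range component
high = apply_fir(tone(2000, SAMPLE_RATE, 2000), taps)  # higher-frequency component

print(f"RMS after filtering, 200 Hz tone:  {rms(low):.3f}")
print(f"RMS after filtering, 2000 Hz tone: {rms(high):.3f}")
```

In this sketch the 200 Hz tone passes almost unchanged while the 2000 Hz tone is strongly attenuated, which is the general behaviour the experiment relies on: energy in the pitch range is kept and most higher-frequency energy is removed. For real recordings, the filtering would be done with the tool named in the abstract rather than with this hand-written filter.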
Relation	Ministry of Science and Technology (科技部), project no. MOST108-2410-H004-098-MY2, research period: 108.08-110.07 (ROC calendar, i.e., 2019.08-2021.07)
Type	report

dc.contributor	Graduate Institute of Linguistics
dc.creator (Authors)	萬依萍; 郭怡君
dc.date (Date)	2022-10
dc.date.accessioned	8-Mar-2024 14:13:14 (UTC+8)
dc.date.available	8-Mar-2024 14:13:14 (UTC+8)
dc.date.issued (Upload time)	8-Mar-2024 14:13:14 (UTC+8)
dc.identifier.uri (URI)	https://nccur.lib.nccu.edu.tw/handle/140.119/150420
dc.format.extent	116 bytes
dc.format.mimetype	text/html
dc.relation (Relation)	Ministry of Science and Technology (科技部), project no. MOST108-2410-H004-098-MY2, research period: 108.08-110.07 (ROC calendar, i.e., 2019.08-2021.07)
dc.subject (Keywords)	cross-sectional observational study; Thai speakers learning Mandarin; lexical and phonological-rhythmic relationship; lexical and syntactic components; error analysis; corpus
dc.title (Title)	Error Analysis of Thai Students Learning Mandarin: A Cross-Age Learner Corpus (II)
dc.title.alternative (Alternative title)	The Error Analysis of Thai Students Learning Taiwan Mandarin: A Corpus-Driven and Cross-Sectional Study (II)
dc.type (Type)	report