Title: VODKA-Score: evaluating students’ philosophical understanding in allegorical reflections via LLM-distilled QA models
Authors: 呂欣澤; Hsu, Tiffany T. Y.; Lu, Owen H. T.
Contributor: 創國學士班
Keywords: Pedagogical issues; evaluation methodologies; data science applications in education; large-language models; VODKA-Score
Date: 2026-02
Uploaded: 1-Apr-2026 16:37:16 (UTC+8)
Abstract: Post-class reflections are a valuable tool for promoting deeper learning and metacognitive development. However, their open-ended and interpretive nature, particularly in subjects like philosophy, makes timely and consistent assessment difficult. This lack of immediate feedback may contribute to learning anxiety and reduce the effectiveness of reflection as a formative practice. In this study, we assume that when a student’s reflection is sufficiently complete, it should provide enough semantic information to answer related quiz questions. Based on this idea, we introduce the VODKA-Score, a question-answering (QA)-based metric that serves as a proxy indicator, evaluating student understanding by identifying answers to course-related queries within their reflections, along with DistilledPlatoBERT, a fine-tuned semantic model distilled from five large language models (LLMs) to generate accurate VODKA-Scores. Experiments with 107 students in a philosophy course show that the best-performing model achieved an accuracy between 0.87 and 0.99 and aligned closely with human scoring under specific conditions, reaching a Cohen’s Kappa (κ) above 0.6 and a Spearman correlation (ρ) above 0.5, both statistically significant. Findings suggest that QA-based scoring models can support efficient evaluation of students’ writing. This study was conducted in a Chinese-language setting and under a fixed reflective writing task, including a predefined prompt, word-count range, writing time, and a True/False post-test. These conditions limit the cross-linguistic generalizability of the VODKA-Score and should be examined in further studies.
Relation: Interactive Learning Environments, pp. 1-17
Type: article
DOI: https://doi.org/10.1080/10494820.2026.2623468
URI: https://ah.lib.nccu.edu.tw/item?item_id=181860