Title: VODKA-Score: evaluating students’ philosophical understanding in allegorical reflections via LLM-distilled QA models
Authors: 呂欣澤; Hsu, Tiffany T. Y.; Lu, Owen H. T.
Contributor: 創國學士班
Keywords: Pedagogical issues; evaluation methodologies; data science applications in education; large-language models; VODKA-Score
Date: 2026-02
Uploaded: 1-Apr-2026 16:37:16 (UTC+8)
Abstract: Post-class reflections are a valuable tool for promoting deeper learning and metacognitive development. However, their open-ended and interpretive nature, particularly in subjects like philosophy, makes timely and consistent assessment difficult. This lack of immediate feedback may contribute to learning anxiety and reduce the effectiveness of reflection as a formative practice. In this study, we assume that when a student’s reflection is sufficiently complete, it should provide enough semantic information to answer related quiz questions. Based on this idea, we introduce the VODKA-Score, a question-answering (QA)-based metric that serves as a proxy indicator, evaluating student understanding by identifying answers to course-related queries within their reflections, along with DistilledPlatoBERT, a fine-tuned semantic model distilled from five large language models (LLMs) to generate accurate VODKA-Scores. Experiments with 107 students in a philosophy course show that the best-performing model achieved an accuracy between 0.87 and 0.99 and aligned closely with human scoring under specific conditions, reaching a Cohen’s Kappa (κ) above 0.6 and a Spearman correlation (ρ) above 0.5, both statistically significant. Findings suggest that QA-based scoring models can support efficient evaluation of students’ writing. This study was conducted in a Chinese-language setting and under a fixed reflective writing task, including a predefined prompt, word-count range, writing time, and a True/False post-test. These conditions limit the cross-linguistic generalizability of the VODKA-Score and should be examined in further studies.
Relation: Interactive Learning Environments, pp. 1-17
Type: article
DOI: https://doi.org/10.1080/10494820.2026.2623468
URI: https://ah.lib.nccu.edu.tw/item?item_id=181860