Title 加入作答時間之試題反應模型在能力上的研究
A study of the ability after incorporating response time in the item response model
Author 曾定柏
Tseng, Ting-Po
Contributors 姜志銘; 宋傳欽
Jiang, Zhi-Ming; Song, Chwan-Chin
曾定柏
Tseng, Ting-Po
Keywords Item response theory (IRT)
Response time
Scoring rule
Date 2020
Upload time 2 September 2020 12:14:35 (UTC+8)
Abstract This study examines whether response time is a suitable factor for estimating the abilities of examinees. Starting from a scoring rule that incorporates both item response and response time, we build a new model and estimate the ability of each examinee and the difficulty of each item by maximum likelihood estimation. Using real data, we compare examinees' ability estimates based on IRT with those based on our new speed-accuracy response model (NSARM), that is, before and after response time is incorporated. Finally, we explore whether the new model can further distinguish the abilities of two examinees whose response patterns are the same.
In simulations used to validate the model, NSARM estimates ability more accurately than the IRT model for examinees with high ability, and its accuracy improves further as the number of examinees increases.
From the analysis of our real data, we summarize the following three results:
1. When an item is answered incorrectly, the score under our new scoring rule is positively correlated with the response time.
Among the randomly selected items, whether easy, medium, or difficult, the estimated abilities of the examinees who answered incorrectly are positively correlated with their response times. Therefore, an examinee with a shorter response time should be given a lower score (i.e., a larger deduction), and an examinee with a longer response time should be given a higher score (i.e., a smaller deduction).
2. The gamma distribution is more appropriate for modeling the response time.
Goodness-of-fit tests show that the exponential distribution, which is used by many authors, does not fit the response times of some items in our data set, whereas the gamma distribution, a generalization of the exponential distribution, does.
3. When response patterns are the same, our new model can further distinguish the abilities of examinees.
Neither classical test theory nor item response theory can distinguish the abilities of examinees whose response patterns on a set of items are the same. With the item response model that incorporates response time, we can not only further distinguish examinees' abilities from these few items, but the resulting ability estimates also show a high rank correlation with those estimated from all items.
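The limitation described in result 3 can be illustrated with a minimal sketch of maximum likelihood ability estimation under the standard one-parameter (Rasch) IRT model. This is not the NSARM estimator from the thesis: the function names, the Newton-Raphson solver, and the item difficulties below are assumptions chosen only for illustration, and item difficulties are treated as known.

```python
import math


def rasch_prob(theta: float, b: float) -> float:
    """Probability of a correct response under the one-parameter (Rasch) model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def estimate_ability(responses, difficulties, tol=1e-10, max_iter=100):
    """Newton-Raphson MLE of ability, with item difficulties treated as known.

    Returns None for all-correct or all-wrong patterns, where the MLE is not finite.
    """
    total = sum(responses)
    if total == 0 or total == len(responses):
        return None
    theta = 0.0
    for _ in range(max_iter):
        probs = [rasch_prob(theta, b) for b in difficulties]
        gradient = total - sum(probs)                    # first derivative of log-likelihood
        information = sum(p * (1.0 - p) for p in probs)  # negative second derivative
        step = gradient / information
        theta += step
        if abs(step) < tol:
            break
    return theta


# Hypothetical difficulties for three items (not taken from the thesis data).
difficulties = [-1.0, 0.0, 1.0]
theta_a = estimate_ability([1, 1, 0], difficulties)
theta_b = estimate_ability([0, 1, 1], difficulties)
```

In this sketch the likelihood equation depends on the responses only through the number of correct answers, so `theta_a` and `theta_b` coincide. Examinees with identical response patterns therefore necessarily receive identical estimates from responses alone, which is the gap a model that also uses response time is meant to close.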
Description Master's thesis
National Chengchi University
Department of Applied Mathematics
105751018
Source http://thesis.lib.nccu.edu.tw/record/#G0105751018
Type thesis
Advisors Jiang, Zhi-Ming (姜志銘); Song, Chwan-Chin (宋傳欽)
Identifier G0105751018
URI http://nccur.lib.nccu.edu.tw/handle/140.119/131627
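Table of contents Chapter 1 Introduction
Section 1 Research background and motivation
Section 2 Research purpose and research questions
Chapter 2 Literature review
Section 1 Classical test theory
Section 2 Item response theory
Section 3 Response time models
Chapter 3 Research methods
Section 1 Model construction
Section 2 Research procedure
Chapter 4 Model validation
Chapter 5 Empirical analysis
Section 1 Examination of the signed negative-exponential time scoring rule
Section 2 Independence of item responses and response times
Section 3 Fit of the gamma distribution to response times
Section 4 Correlation of item difficulty estimates across models
Section 5 Comparison of ability estimates between NSARM and the one-parameter IRT model
Section 6 Examinee abilities when response patterns are the same
Chapter 6 Conclusions and recommendations
References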
Format extent 4692851 bytes
Format MIME type application/pdf
DOI 10.6814/NCCU202001433