Title: Person-fit偵測作假之效用- 非參數試題反應理論的模擬與應用
Applying person-fit in faking detection- The simulation and practice of non-parametric item response theory
Authors: 許嘉家
Syu, Jia Jia
Contributors: 余民寧
Yu, Min Ning
Syu, Jia Jia
Keywords: 非參數試題反應理論
Nonparametric item response theory
sample size
Date: 2012
Issue Date: 2013-07-01 14:06:16 (UTC+8)
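Abstract: In psychological testing, detecting faking is an important issue, because faking affects the relations among variables, the accuracy of model testing, and the fairness of the test. Social desirability scales are now widely used to detect faking, but the added items also increase the burden on respondents. This study therefore explores whether person-fit statistics can serve as an alternative. Earlier studies have applied person-fit techniques from parametric item response theory to faking detection. However, the many assumptions of parametric item response theory, such as large samples, normal distributions, and many items, are not easily satisfied in real data analysis, which leads to incorrect results and applications. Accordingly, this study focuses on the usefulness of person-fit techniques from nonparametric item response theory, which apply under more flexible conditions and come closer to practical settings.
This study tests its hypotheses with both simulated and real data. In Study 1, simulated responses were generated in R under varying sample sizes, ability distributions, faking motivations, and item aberrance rates, and person-fit values were computed from them. The detection rates of the parametric and nonparametric person-fit indices were then compared as the basis for judging their usefulness. Study 2 applied the technique to real data, using a social desirability scale and an interest inventory to check the detection performance of the three statistics adopted in this study (lz, U3p, and Guttman errors) and so to assess their usefulness in practice.
The results indicate that the best person-fit statistic depends on the situation. Guttman errors are most suitable when the sample is smaller than 100, when respondents' ability follows a normal or platykurtic distribution, and when only part of the responses are aberrant. When the aberrance rate reaches 100%, the ability distribution is negatively skewed or platykurtic, and faking is severe, U3p detects faking best. lz is best suited to the various conditions of moderate faking. The analysis of real data shows that the assumption of normally distributed ability is not easily satisfied in either large or small samples, and that applying person-fit statistics to faking detection is feasible, particularly with the nonparametric U3p index.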
Faking detection is a crucial issue in psychological testing because faking affects the hypothesized relations among variables, the accuracy of model testing, and test fairness. Social desirability scales are often used to detect faking, but they lengthen the test, so we explored an alternative: the person-fit statistics of nonparametric item response theory (NIRT). Person-fit techniques from parametric item response theory (PIRT) have already been used in faking detection. However, PIRT assumptions such as a large sample, a normal ability distribution, and a large number of items are difficult to meet in practice, and applying PIRT methods regardless can lead to inaccurate results and implications. Because NIRT makes fewer assumptions, its person-fit statistics may be more flexible and closer to practical conditions; they are therefore the focus of this study.

We used both simulated and real data to test the hypotheses. In Study 1, data were simulated under varying sample sizes, ability distributions, faking motivations, and aberrance rates in order to compare the detection rates of PIRT and NIRT person-fit statistics. In Study 2, person-fit statistics were applied as a faking-detection tool to empirical data to evaluate their use in a practical context.
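
As an illustration of the nonparametric statistics named in this record, the following is a minimal sketch in Python (the thesis itself used R) of the Guttman error count and the U3 statistic for dichotomous items. It is not the author's code. The thesis works with U3p, the polytomous counterpart, so the dichotomous form shown here is a simplifying assumption, and the function names and the example proportions are hypothetical.

```python
import math

def guttman_errors(pattern, p):
    """Count Guttman errors in one 0/1 response pattern.

    p[i] is the sample proportion endorsing item i (assumed known here).
    An error is any item pair where the easier item is scored 0
    and the harder item is scored 1.
    """
    order = sorted(range(len(p)), key=lambda i: -p[i])  # easiest item first
    errors = 0
    zeros_seen = 0
    for i in order:
        if pattern[i] == 0:
            zeros_seen += 1
        else:
            errors += zeros_seen  # each earlier 0 forms one error pair
    return errors

def u3(pattern, p):
    """U3 for one 0/1 pattern: 0 for a perfect Guttman pattern,
    1 for a fully reversed one. Returns None when it is undefined."""
    order = sorted(range(len(p)), key=lambda i: -p[i])
    x = [pattern[i] for i in order]
    w = [math.log(p[i] / (1 - p[i])) for i in order]  # logit of each proportion
    k, r = len(x), sum(x)
    if r == 0 or r == k:
        return None  # all-0 or all-1 patterns carry no fit information
    best = sum(w[:r])        # the r easiest items endorsed
    worst = sum(w[k - r:])   # the r hardest items endorsed
    if best == worst:
        return None          # all items equally popular
    observed = sum(xi * wi for xi, wi in zip(x, w))
    return (best - observed) / (best - worst)

if __name__ == "__main__":
    proportions = [0.9, 0.7, 0.5, 0.3, 0.1]  # hypothetical item popularities
    for pattern in ([1, 1, 1, 0, 0], [1, 0, 1, 0, 0], [0, 0, 1, 1, 1]):
        print(pattern, guttman_errors(pattern, proportions), u3(pattern, proportions))
```

Larger values of either statistic indicate a less consistent response pattern. A detection rule would flag respondents above a cutoff, and the choice of that cutoff is outside this sketch.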

The results indicate that the best person-fit statistic depends on the condition. Guttman errors had the higher detection rate when the sample size was below 100, when only some items were faked, and when ability followed a normal or platykurtic distribution. When the aberrance rate was 100% and faking was severe, U3p outperformed the other indicators under negatively skewed and platykurtic distributions. lz, by comparison, was best suited to conditions of moderate faking. The empirical study found that the assumption of normally distributed ability is hard to satisfy in both small and large samples, and that person-fit statistics are feasible for faking detection, particularly U3p.
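
For comparison, the parametric lz statistic can be sketched as the standardized log-likelihood of a response pattern. The Python sketch below is illustrative only and is not taken from the thesis. It assumes dichotomous items under a two-parameter logistic model, and it treats the ability value `theta` and the item parameters `a` and `b` as known, whereas in practice they are estimated. All names and values are hypothetical.

```python
import math

def lz(pattern, theta, a, b):
    """Standardized log-likelihood person-fit statistic for a 0/1 pattern.

    Assumes a 2PL model with discriminations a, difficulties b, and a
    known ability theta. Large negative values suggest misfit.
    """
    loglik = expected = variance = 0.0
    for x, ai, bi in zip(pattern, a, b):
        p = 1.0 / (1.0 + math.exp(-ai * (theta - bi)))  # P(endorse item)
        loglik += x * math.log(p) + (1 - x) * math.log(1 - p)
        expected += p * math.log(p) + (1 - p) * math.log(1 - p)
        variance += p * (1 - p) * math.log(p / (1 - p)) ** 2
    # variance is zero only if every item has p == 0.5 exactly
    return (loglik - expected) / math.sqrt(variance)

if __name__ == "__main__":
    a = [1.0] * 5                       # hypothetical discriminations
    b = [-2.0, -1.0, 0.0, 1.0, 2.0]     # hypothetical difficulties
    print(lz([1, 1, 1, 0, 0], 0.0, a, b))  # consistent pattern
    print(lz([0, 0, 1, 1, 1], 0.0, a, b))  # reversed pattern
```

The reliance on estimated item and ability parameters is one reason the parametric approach is sensitive to sample size and to the shape of the ability distribution, which is the contrast the results above draw with the nonparametric statistics.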
Description: Doctoral
Source URI:
Data Type: thesis
Appears in Collections: [Department of Education] Theses

Files in This Item:

File: 251501.pdf
Size: 11011Kb
Format: Adobe PDF

All items in 學術集成 are protected by copyright, with all rights reserved.