Title 以最大測驗訊息量決定通過分數之研究
Study of the Standard Setting by the Maximum Test Information
Author 謝進昌 (Shieh, Jin-Chang)
Contributors 余民寧 (Yu, Min-Ning)
謝進昌 (Shieh, Jin-Chang)
Keywords maximum test information approach (最大測驗訊息量法)
transformed classical test scores approach (換算古典測驗分數法)
test characteristic curve mapping method (測驗特徵曲線構圖法)
anchor points (定錨點)
standard setting (精熟標準設定)
mastery test (精熟測驗)
Date 2004
Upload time 17-Sep-2009 15:10:37 (UTC+8)
Abstract The purpose of this study is to apply the concept of maximum test information from item response theory (IRT) to mastery standard setting. By tracing the historical development of standard setting, the study derives three facets for interpreting the maximum test information approach: the combination and adjustment of components, the generalized test construction process, and multiple validity. These facets are used to explain why applying maximum test information to mastery standard setting is reasonable and appropriate. The study also examines the reasonableness of the approach in terms of the meaning of its formula, item selection, and statistical power, in order to establish its theoretical basis for mastery standards, and then adds evidence on the consistency of master/non-master classification to provide multiple sources of validity evidence. Finally, it discusses methods of test score transformation and the description of ability differences, so that test results can be interpreted both quantitatively and qualitatively.

In sum, the following conclusions are drawn.
1. When the maximum test information approach is applied to mastery standard setting, the classification reliability indices show that the resulting mastery standard gives generally satisfactory results after cross-validation, with accurate classification rates above 90%. Viewed as an interval, the standard obtained from this approach can also serve as a starting reference point for experts when they judge and adjust the mastery standard. For score transformation, whether the approach is combined with the transformed classical test scores approach or with the test characteristic curve mapping method, the consistency of master/non-master classification is generally satisfactory, so the combination is a strategy worth considering in standard setting.

2. When anchor points are used to interpret the mastery standard obtained by the maximum test information approach on the Basic Competency Test for junior high school students, non-masters need only basic subject knowledge and the ability to understand simple graphs. Masters need, in addition, an understanding of broad subject knowledge; the ability to interpret complex problems, data, and graphs; and the ability to reason logically and analyze experimental results to reach related conclusions. At a higher level, they have advanced subject knowledge and the ability to synthesize and evaluate the information conveyed by data and situations.

3. With respect to test length, the maximum test information approach, the transformed classical test scores approach, and the test characteristic curve mapping method are all affected: the longer the test, the higher the consistency of master/non-master classification. This result is consistent with most previous studies. The data analyzed here suggest that 20 items is the basic minimum test length. Moreover, judging by the exact number of misclassified examinees, the factors that produce difference scores during score transformation are not easy for decision makers to control in practice, but increasing test length spreads examinees across more score points and thereby reduces the impact of misclassification.

4. With respect to test heterogeneity (diverse test difficulty), because the maximum test information approach adjusts examinee ability estimates according to the item parameters, classification consistency stays at an acceptable level even on heterogeneous tests. By contrast, the transformed classical test scores approach and the test characteristic curve mapping method show markedly higher misclassification rates under a fixed mastery standard. This also reflects the weakness of the current practice of using a fixed score of 60 as the passing (mastery) standard.

5. With respect to score transformation on easy, hard, and normal tests, when the mastery standard from the maximum test information approach is converted by either the transformed classical test scores approach or the test characteristic curve mapping method, neither the type of test difficulty nor the choice of transformation method seriously affects the consistency of classification between the transformed scores. Furthermore, judging by the exact number of misclassified examinees, because the maximum test information approach sets the passing level according to test difficulty (a lower mastery standard on easy tests and a relatively higher one on hard tests), even a large difference score in the transformation does not lead to a serious number of misclassifications.

6. With respect to the interaction among test length, test heterogeneity, and the selection of anchor items, the results show that test length and test heterogeneity are not the decisive factors affecting anchor item selection. What matters more is whether the optimal ability value corresponding to the maximum item information matches the anchor point.

In sum, the maximum test information approach performs well on classification consistency and is supported by relatively robust, rigorous theory and an appropriate method for interpreting test results, which the study concludes makes it the most suitable approach for large-scale examinations. The study therefore suggests that government agencies and practitioners consider adopting this strategy for large-scale licensure and qualification examinations.
參考文獻 行政院教育改革委員會(1996)。教育改革總諮議報告書(第三章綜合建議)。2004年12月5日,取自http://www.edu.tw/eduinf/change/5/CH-3.html#c3。
考選部(2004)。考選部全球資訊網。2004年5月31日,取自http://inter1.moex. gov.tw/statute/statute1.asp?kind=31。
余民寧(1992)。試題反應理論的介紹(7):訊息函數。研習資訊,9(6),5-9。
余民寧(2002)。教育測驗與評量:成就測驗與教學評量(第二版)。臺北市:心理出版社。
余民寧、汪慧瑜(2005)。量尺分數的另類表示方法:以國中基本學力測驗為例。教育與心理研究,28期,審稿中。
林惠芬(1993)。通過分數設定方法在護理人員檢覈筆試測驗之研究。測驗年刊,40,253-262。
吳裕益(1986)。標準參照測驗通過分數設定方法之研究。國立政治大學教育研究所博士論文(未出版)。
吳裕益(1988)。標準參照測驗通過分數設定方法之研究,測驗年刋,35,159-166。
涂柏原,陳柏熹,章舜雯,林世華(2000)。基本學力分數的建立。國中基本學力測驗推動工作委員會,2004年12月6日,取自http://www.bctest.ntnu.edu.tw/score1.htm。
莊淑如(1997)。證照制度的落實—以德國經驗為借鏡。技職雙月刋,37,54-56。
教育部(1998)。國民中學學生基本學力指標。台北﹕教育部。
教育部國教司(2004)。國民中小學九年一貫課程暫行綱要,2004年12月5日,取自http://140.122.120.230/9CC/temporary/temporary-all.htm。
國民中學學生基本學力測驗推動工作委員會(2002a)。九十、九十一年國中基本學力測驗試題取材範圍比較。飛揚第十三期,2004年12月5日,取自http://www.bctest.ntnu.edu.tw/。
國民中學學生基本學力測驗推動工作委員會(2002b)。國中基本學力測驗自然科試題之設計理念。飛揚第十三期,2005年4月6日,取自http://www.bctest.ntnu.edu.tw/。
國民中學學生基本學力測驗推動工作委員會(2002c)。九十一年第二次國民中學學生基本學力測驗試題特色。飛揚第十六期,2005年4月6日,取自http://www.bctest.ntnu.edu.tw/。
國民中學學生基本學力測驗推動工作委員會(2003)。測驗分數的解釋(下)。飛揚第二十三期,2005年4月6日,取自http://www.bctest.ntnu.edu.tw/。
鄭明長、余民寧(1994)。各種通過分數設定方法之比較。測驗年刊,41,19-40。
鄭清泉(2001)。人工化與電腦化適性精熟能力判定在國小學童數學精熟分類一致性之比較研究。國立嘉義大學國民教育研究所碩士論文。
謝進昌、余民寧(2005)。國中基本學力測驗之DIF的實徵分析:以91年度二次測驗為例,國立新竹師範學院學報,審稿中。
Andrew, B. J., & Hecht, J. T. (1976). A preliminary investigation of two procedures for setting examination standards. Educational and Psychological Measurement, 36, 35-50.
Angoff, W. H. (1971). Scales, norms, and equivalent scores. In R. L. Thorndike (Ed.), Educational Measurement (pp.508-600). Washington, D.C.: American Council on Education.
Angoff, W. H. (1988). Validity: An evolving concept. In H. Wainer & H. I. Braun (Eds.), Test validity (pp.19-32). Hillsdale, NJ: Lawrence Erlbaum.
Beaton, A. E., & Allen, N. L. (1992). Interpretation scales through scale anchoring , Journal of Educational Statistics, 17, 191-201.
Behuniak, P., Archambault, F. X., & Gable, R. K. (1982). Angoff and Nedelsky standard setting procedures: Implication for the validity of proficiency test score interpretation, Educational and Psychological Measurement, 42, 247-255.
Berk, R.A. (1976). Determination of optimal cutting scores in criterion-referenced measurement. Journal of Experimental Education, 45, 4-9.
Berk, R. A. (1980). A consumers` guide to criterion-referenced test reliability. reliability. Journal of Educational Measurement, 17(4), 323-349.
Berk, R. A. (1984). A guide to criterion-referenced test construction. Baltimore, MD: The Johns Hopkins University Press.
Berk, R. A. (1986). A consumer’s guide to setting performance standards on criterion- referenced tests. Review of Educational Measurement, 56(1), 137-172.
Berk, R. A. (1996). Standard setting: The next generation (where few psychometricians have gone before!). Applied Measurement in Education, 9(3), 215-235.
Bernknopf, S., Curry, A., & Bashaw, W. L.(1979). A defensible model for determining a minimal cutoff score for criterion referenced tests. Paper presented at the annual meeting of the National Council on Measurement in Education, San Francisco.
Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F. M. Lord, & M. R. Novick, Statistical theories of mental test scores (chapters 17-20). Reading, MA: Addison-Wesley.
Bontempo, B. D., Marks, C. M., & Karabatsos, G. (1998). A meta-analytic assessment of empirical differences in standard setting procedures. Paper presented at the Annual Meeting of the American Educational Research Association, San Diego, CA.
Brennan, R. L. & Lockwood, R. E. (1980). A comparison of the Nedelsky and Angoff cutting score procedures using generalizability theory, Applied Psychological Measurement, 4, 219-240.
Brandon, P. R. (2002). Two versions of the contrasting-groups standard-setting method: A review. Measurement and Evaluation in Counseling and Development, 35(3), 167-181.
Brandon, P. R. (2004). Conclusion about frequently studied modified Angoff standard setting topics. Applied Measurement in Education, 17(1), 59-88.
Busch, J. C., & Jaeger, R. M. (1990). Influence of type of judge, normative information, and discussion on standards recommended for the national teacher examinations. Journal of Educational Measurement, 27(2), 145-163.
Buckendahl, C. W., Smith, R. W., Impara, J. C., & Plake, B. S. (2002). A comparison of Angoff and Bookmark standard setting method. Journal of Educational Measurement, 39(3), 253-263.
Cascio, W. F., Alexander, R.A., & Barrett, G. V. (1988). Setting cutoff scores: Legal, psychometric, and professional issues and guidelines. Personnel Psychology, 41, 1-24.
Chang, L., Dziuban, C. D., & Hynes, M. C.(1996). Does a standard reflect minimal competency of examinees or judge competency? Applied Measurement in Education, 9(2), 161-173.
Chang, L. (1999). Judgmental item analysis of the Nedelsky and Angoff standard-setting methods. Applied Measurement in Education, 12(2), 151-165.
Chang, L., van der Linder, W. J., & Vos, H. J.(2004). Setting standards and detecting intrajudge inconsistency using interdependent evaluation of response alternatives. Educational and Psychological Measurement, 64(5), 781-801.
Chinn, R. N., & Hertz, N. R. (2002). Alternative approaches to standard setting for licensing and certification examinations. Applied Measurement in Education, 15(1), 1-14.
Cizek, G. J. (1993). Reconsidering standard and criteria. Journal of Educational Measurement, 30(2), 93-106.
Cizek, G. J. (1996). Standard-setting guidelines. Educational Measurement: Issues and Practice, 15(1), 13-21.
Clauser, B. E., Swanson, D. B., & Harik, P.(2002). Multivariate generalizability analysis of the impact of training and examinee performance information on judgments made in an Angoff-style standard-setting procedure. Journal of Educational Measurement, 39(4), 269-290.
Cohen, J. A. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37-46.
Conover, J. W., & Iman, R. L. (1978). The rank transformation as a method of discrimination with some examples, Albquerque, NM: Sandia Laboratories.
Crocker, L., & Algina, J. (1986). Introduction to classical and modern test theory. NY: CBS College Publishing.
Cross, L. H., Impara, J. C., Frary, R. B., & Jaeger, R. M. (1984). A comparison of three methods for establishing minimum standards on the National Teachers Examinations, Journal of Educational Measurement, 21, 113-129.
Cross, L. H., Frary, R. B., Kelly, P. P., Small, R. C., & Impara, J. C. (1985). Establishing minimum standards for essays: Blind versus informed review. Journal of Educational Measurement, 22, 137-146.
de Gruijter, D. N. M., & Hambleton, R. K. (1984). On problems encountered using decision theory to set cutoff scores. Applied Psychological Measurement, 8, 1-8.
Dillon, G. F. (1996). The expectations of standard setting judges. CLEAR Exam Review, 2, 22-26.
Ebel, R. L. (1972). Essentials of educational measurement (2rd ed.). Englewood Cliffs, NJ: Prentice-Hall.
Ebel, R. L. (1979). Essentials of educational measurement (3rd ed.). Englewood Cliffs, NJ: Prentice-Hall.
Educational Testing Service (1976). Report on a study of the use of the National Teachers’ Examination by the state of South Carolina. Princeton, NJ: Author.
Eignor, D. R., & Hambleton, R. K. (1979). Effects of test length and advancement score on several criterion referenced test reliability and validity indices. (Laboratory of Psychometric and Evaluative Research Report No. 86). Amherst, MA: University of Massachusetts, School of Education.
Emrick, J. A. (1971). An evaluation model for mastery testing. Journal of Educational Measurement, 8, 321-326.
Fehrman, M. L., Woehr, D. J., & Arthur, W. (1991). The Angoff cutoff score method: The impact of frame-of-reference rater training. Educational and Psychological Measurement, 51(4), 857-872.
Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems, Annals of Edgenics, 7, 179-188.
Fitzpatrick, A. R. (1989). Social influences in standard setting: The effects of social interaction on group judgments. Review of Educational Research, 59(3), 315-328.
Frick, T. W. (1992). Computerized adaptive mastery tests as expert systems. Journal of Educational Computing Research, 8(2), 187-213.
Garrett, H. E. (1937). Statistics in psychology and education. New York: Longmans, Green.
Gessaman, M. P., & Gessaman, P. H. (1972). A comparison of some multivariate discrimination procedures. Journal of the American Statistical Association, 67, 468-472.
Giraud, G., Impara, J. C., & Buckendahl, C. (2000). Making the cut in school districts: Alternative methods for setting cut-scores. Educational Assessment, 6, 291-304.
Glaser, R. (1963). Instructional technology and the measurement of learning outcomes. American Psychologist, 18, 519-521.
Glaser, R., & Nitko, A. J. (1971). Measurement in learning and instruction . In R. L. Thorndike (Eds.). Education Measurement (2nd ed., pp625-670). Washington, DC: American Council on Education.
Gray, W. M. (1978). A comparison of Piagetian theory and criterion-referenced measurement, Review of Educational Research, 48, 223-249.
Green, D. R., Trimble, C. S., & Lewis, D. M. (2003). Interpreting the results of three different standard setting procedures. Educational Measurement: Issues and Practice, 22(1), 22-32.
Guilford, J. P. (1942). Fundamental statistics in psychology and education. New York: McGraw-Hill.
Haertel, E. (1985). Construct validity and criterion-referenced testing, Review of Educational Research, 55(1), 23-46.
Haladyna, T. M., & Roid, G. H. (1983). A comparison of two approaches to criterion-referenced test construction. Journal of Educational Measurement, 20(3), 271-281.
Halpin, G., Sigmon, G., & Halpin, G.(1983). Minimum competency standards set by three divergent groups of raters using three judgemental procedures: Implication for validity, Educational and Psychological Measurement , 43, 185-196.
Hambleton, R. K. (1978). On the use of cut-off scores with criterion-referenced tests in instructional settings. Journal of Educational Measurement, 15(4), 277-290.
Hambleton, R. K. (1980). Test score validity and standard setting methods. In R. A. Berk(Ed.), Criterion-referenced measurement: The State of Art. Baltimore, Md.: John Hopkins University.
Hambleton, R. K. (1983). Application of item response models to criterion referenced assessment. Applied Psychological Measurement, 7(1), 33-44.
Hambleton, R. K. (1989). Principles and selected applications of item response theory. In R. L. Linn(Eds.), Educational measurement (3rd ed.)(pp.147-200). New York: Macmillan.
Hambleton, R. K. (1990). Criterion referenced testing methods and practices. In T. B. Gutkin, & C. R. Reynolds (2nd ed.), The Handbook of School Psychology (pp. 388-415). New York: John Wiley & Sons.
Hambleton, R. K. (1998). Enhancing the validity of NAEP achievement level score reporting. Proceedings of achievement levels workshop (pp. 77-98). Washington, DC: National Assessment Governing Board.
Hambleton, R. K. (2001). Setting performance standards on educational assessments and criteria for evaluating the process. In G. J. Cizek (Ed.). Standard setting: Concepts, methods, and perspectives (pp. 89-116). Mahwah, NJ: Erlbaum.
Hambleton, R. K., & de Gruijter, D. N. M. (1983). Application of item response models to criterion referenced test item selection. Journal of Educational Measurement, 20(4), 355-367.
Hambleton, R. N., & Eignor, D. R.(1980). Competency test development, validation, and standard setting. In R. M. Jaeger, & C. X. Tittle (Eds.), Minimum competency achievement testing: Motive, models, measures, and consequence (pp.367 -396). Berkeley, CA: McCutchan.
Hambleton, R. K., Jaeger, R. M., Plake, B. S., & Mills, C. N.(in press). Handbook for setting performance standards. Washington, DC: Council of Chief State School Officers.
Hambleton, R. K., Mills, C. N., & Simon, R. (1983). Determining the lengths for criterion referenced tests. Journal of Educational Measurement, 20(1), 27-38.
Hambleton, R. K., & Novick, M. R. (1973). Toward an integration of theory and method for criterion-referenced tests. Journal of Educational Measurement, 10, 159-170.
Hambleton, R. K., & Plake, B. S. (1995). Using an extended Angoff procedure to set standards on complex performance assessments. Applied Measurement in Education, 8(1), 41-55.
Hambleton, R. N., Swaminathan, H., Algina, J., & Coulson, D. B. (1978). Criterion- referenced testing and measurement: A review of technical issues and developments. Review of Educational Research, 48, 1-47.
Hambleton, R. K., & Swaminathan, H. (1985). Item response theory: Principles and application. Boston: Kluwer Nijhoff Publishing.
Hambleton, R. K., & Traub, R. E.(1973). Analysis of empirical data using two logistic latent trait models, British journal of mathematical and statistical psychology, 26, 195-211.
Hambleton, R. K., & Zaal, J. N. (1991). Advances in educational and psychological testing (Eds.). Boston, MA: Kluwer.
Harasym, P. H. (1981). A comparison of the Nedelsky and modified Angoff standard-setting procedure on evaluation outcome. Educational and Psychological Measurement, 41(3), 725-734.
Harwell, M. R. (1983). A comparison of two item selection procedures in criterion referenced measurement. Unpublished doctoral dissertation, University of Wisconsin-Madison.
Hattie, J. A. (1985). Methodological review: Assessing unidimensionality of tests and items. Applied Psychological Measurement, 9, 139-164.
Hoge, R. D., & Coladarci, T.(1989). Teacher based judgments of academic achievement : A review of the literature, Review of educational research, 59(3), 297-313.
Hudson, J. P. Jr., & Campion, J. E. (1994). Hindsight bias in an application of the Angoff method for setting cutoff scores. Journal of Applied Psychology, 79(6), 860-865.
Hurtz, G. M., & Hurtz, N. R.(1999). How many raters should be used for establishing cutoff scores with the Angoff method? A generalizability theory study. Educational and Psychological Measurement, 59(6), 885-897.
Hurtz, M. G., & Auerbach, M. A. (2003). A meta analysis of the effects of modifications to the Angoff method on cutoff scores and judgment consensus. Educational and Psychological Measurement, 63(4), 584-601.
Huynh, H. (1998). On score locations of binary and partial credit items and their applications to item mapping and criterion-referenced interpretation. Journal of Educational and Behavioral Statistics, 23(1), 35-56.
Huyhn, H. (2000). On item mappings and statistical rules for selecting binary items for criterion referenced interpretation and bookmark standard settings. Paper presented at the annual meeting of the National Council on Measurement in Education.(New Orleans, LA, April), 25-27.
Impara, J. C., & Plake, B. S. (1997). Standard setting: An alternative approach. Journal of Educational Measurement, 34(4), 353-366.
Ivens, S. H. (1970). An investigation of item analysis, reliability and validity in relation to criterion-referenced tests. Unpublished doctoral dissertation, Florida State University.
Jaeger, R. M. (1978). A proposal for setting a standard on the North Carolina High School Competency Test. Paper presented at the annual meeting of the North Carolina Association for Research in Education, Chapel Hill.
Jaeger, R. M. (1982). An iterative structured judgment process for establishing standards on competency tests: Theory and application. Educational Evaluation and Policy Analysis, 4, 461-476.
Jaeger, R. M. (1989). Certification of student competence. In R. L. Linn(Eds.), Educational Measurement (3rd ed., pp. 485-514). New York: Macmillan.
Jaeger, R. M. (1995). Setting performance standards through two-stage judgmental policy capturing. Applied Measurement in Education, 8(1), 15-40.
Jaeger, R. M., & Mills, C. N. (1997, April). A holistic procedure for setting performance standards on complex large-scale assessments. Paper presented at the annual meeting of the American Educational Research Association, San Diego, CA.
Jaeger, R. M., & Mills, C. N. (2001). An integrated judgment procedure for setting standards on complex, large-scale assessments. In G. J. Cizek (Ed.). Standard setting: Concepts, methods, and perspectives (pp. 313-338). Mahwah, NJ: Erlbaum.
Jaeger, R. M., & Tittle, C. X. (1980). Minimum competency achievement testing. Berkeley, CA: McCutchan.
Kahl, S. R., Crockett, T. J., DePascale, C. A., & Rindfleisch, S. L. (1994). Using actual student work to determine cut-scores for proficiency levels: New methods for new tests. Paper presented at the National Conference on Large-Scale Assessment, Albuquerque, NM.
Kahl, S. R., Crockett, T. J., DePascale, C. A., & Rindfleisch, S. L. (1995). Setting standards for performance levels using the student-based constructed response method. Paper presented at the annual meeting of the American Educational Research Association, San Francisco.
Kane, M. T. (1987). On the use of IRT models with judgmental standard setting procedures. Journal of Educational Measurement, 24(4), 333-345.
Kane, M. (1994). Validating the performance standards associated with passing scores. Review of Educational Research, 64(3), 425-461.
Kane, M. (1998). Choosing between examinee-centered and test-centered standard setting methods. Educational Assessment, 5(3), 129-145.
Kane, M. (2001). So much remains the same: Conception and status of validation in setting standards. In G. J. Cizek (Ed.). Standard setting: Concepts, methods, and perspectives (pp. 53-88). Mahwah, NJ: Erlbaum.
Kingsbury, G. G., & Weiss, D. J.(1983). A comparison of IRT-based adaptive mastery testing and a sequential mastery testing procedure. In D. J. Weiss(Ed.), New horizons in testing: Latent trait test theory and computerized adaptive testing (pp.257-283).New York: Academic Press.
Koffler , S. L. (1980). A comparison of approaches for setting proficiency standards. Journal of Educational Measurement, 17, 167-178.
Koretz D., & Deibert E. (1995). Setting standards and interpreting achievement: A cautionary tale from the National Assessment of Educational Progress. Educational Assessment, 3(1), 53-81.
Kriewall, T. E. (1972). Aspects and applications of criterion-referenced tests (IER Tech. Paper No. 103). Downers Grove, IL: Institute for Educational Research.
Lewis, D.M., Green, D. R., Mitzel, H. C., Baum. K., & Patz, R. J. (1998, April). The bookmark standard setting procedure: Methodology and recent implementations. Paper presented at the annual meeting of the National Council on Measurement in Education, San Diego, CA.
Lewis, D. M., Mitzel, H.C., & Green, D. R. (1996). Standard setting: A bookmark approach. Paper presented at the Council of Chief State School Officers National Conference on Large Scale Assessment, Boulder, CO.
Livingston, S. A., & Zieky, M. J.(1978). Basic skills assessment program : manual for setting standards on the basic skills assessment tests. Menlo Park, Calif. : Addison-Wesley Testing Service.
Livingston, S. A., & Zieky, M. J.(1989). A comparison study of standard-setting methods. Applied Measurement in Education, 2(2), 121-141.
Loomis, S. C., Bay, L., Yang, W., & Hanick, P. L. (1999). Field trials to determine which rating method(s) to use in the 1998 NAEP Achievement Levels-Setting Process for Civics and Writing. Paper presented at the annual meeting of the National Council on Measurement in Education, Montreal.
Loomis, S.C.,& Bourque, M. L. (2001). From tradition to innovation: Standard setting on the National Assessment of Educational Progress. In G. J. Cizek (Ed.). Standard setting: Concepts, methods, and perspectives (pp. 175-217). Mahwah, NJ: Erlbaum.
Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum Associates.
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
Maurer, T. J., Alexander, R. A., Callahan, C. M., Bailey, J J., & Dambrot, F. H. (1991). Methodological and psychometric issues in setting cutoff scores using the Angoff method, Personnel psychology, 44, 235-262.
McGinty, D., & Neel, J. H. (1996). Judgmental standard setting using a cognitive components model. Paper presented at the Annual Meeting of the National Council on Measurement in Education, New York.
Melican, G.. J., Mills, C. N., & Plake, B. S. (1989). Accuracy of item performance predictions based on the Nedelsky standard setting method. Educational and Psychological Measurement, 49, 467-478.
Meskauskas, J. A. (1976). Evaluation models for criterion-referenced testing: Views regarding mastery and standard setting. Review of Educational Research, 46, 133-158.
Messick, S. (1988). The once and future issues of validity: Assessing the meaning and consequences of measurement. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 33-45). Hillsdale, NJ: Lawrence Erlbaum.
Messick, S. (1989). Validity. In R. L. Linn(Ed.), Educational Measurement, (pp. 13-104). New York: Macmillan.
Millman, J. (1972). Tables for determining number of items needed on domain- referenced tests and number of students to be tested. ( Technical Paper No. 5). Los Angeles: Instructional Objective Exchange.
Millman, J. (1973). Passing scores and test lengths for domain-referenced measures. Review of Educational Research, 43, 205-216.
Mills, C. N. (1983). A comparison of three methods of establishing cut-off scores on criterion-referenced tests. Journal of Educational Measurement, 20(3), 283-292.
Mitzel, H. C., Lewis, D. M., Patz, R. J., & Green, D. R. (2001). The bookmark method: Psychological perspectives. In G. J. Cizek (Ed.), Setting performance standards: Concepts, methods, and perspectives (pp. 249-281). Mahwah, NJ: Erlbaum.
Nassif, P. M. (1978). Standard setting for criterion referenced teacher licensing tests. Paper presented at the annual meeting of the National Council on Measurement in Education, Toronto.
Nedelsky, L. (1954). Absolute grading standards for objective tests. Educational and Psychological Measurement, 14, 3-19.
Nitko, A. J. (1980). Distinguishing the many varieties of criterion-referenced tests, Review of Educational Measurement, 50, 461-485.
Nitko, A. (1983). Educational tests and measurement: An introduction. New
York: Harcourt Bruce Jovanovich.
Norcini, J., Lipner, R., Langdon, L., & Strecker, C. (1987). A comparison of three variations on a standard-setting method. Journal of Educational Measurement, 24, 56-64.
Norcini, J., Shea, J. A., & Grosso, L. (1991). The effect of numbers of experts and common items on cutting score equivalents based on expert judgment. Applied Psychological Measurement, 15(3), 241-246.
Norcini, J. J., Shea, J. A., & Kanya, D. T. (1988). The effect of various factors on standard setting. Journal of Educational Measurement, 25, 57-65.
Novick, M. R., & Lewis, C. (1974). Prescribing test length for criterion-referenced measurement. In C. W. Harris, M. C. Alkin, & W. J. Popham (Eds.), Problems in criterion-referenced measurement (CSE Monograph Series in Evaluation, No. 3, pp. 139-158). Los Angeles: Center for the Study of Evaluation, University of California.
Novick, M. R., Lewis, C., & Jackson, P. H. (1973). The estimation of proportions in a groups. Psychometrika, 38, 19-45.
Pitoniak, M. J. (2003). Standard setting methods for complex licensure examinations. Unpublished doctoral dissertation, University of Massachusetts, Amherst.
Plake, B. S., Hambleton, R. K., & Jaeger, R. M. (1997). A new standard-setting method for performance assessments: The dominant profile judgment method and some field-test results. Educational and Psychological Measurement, 57(3), 400-411.
Plake, B. S., & Hambleton, R. K. (2001). The analytic judgment method for setting standards on complex performance assessments. In G. J. Cizek (Ed.). Standard setting: Concepts, methods, and perspectives (pp. 283-312). Mahwah, NJ: Erlbaum.
Popham, W. J., & Husek, T. R. (1969). Implication of criterion-referenced measurement, Journal of Educational Measurement, 6, 1-9.
Plake, B. S., Impara, J. C., & Potenza, M. T.. (1994). Content specificity of expert judgments in a standard-setting study. Journal of Educational Measurement, 31(4), 339-347.
Plake, B. S., & Melican, G. J. (1989). Effects of item context on intrajudge consistency of expert judgments via the Nedelsky standard setting method. Educational and Psychological Measurement, 49(1), 45-51.
Popham, W. J. (1978). Criterion-referenced measurement. Englewood Cliffs, NJ: Prentice-Hall.
Putnam, S. E., Pence, P., & Jaeger, R. M. (1995). A multi-stage dominant profile method for setting standards on complex performance assessments. Applied Measurement in Education, 8(1), 57-83.
Reckase, M. D. (1983). A procedure for decision making using tailored testing. In D. J. Weiss(Ed.), New horizons in testing: Latent trait test theory and computerized adaptive testing (pp. 237-255). New York: Academic Press.
Reckase, M. D. (1998). Converting boundaries between National Assessment Governing Board performance categories to points on the National Assessment of Educational Progress score scale: The 1996 science NAEP process. Applied Measurement in Education, 11, 9-21.
Reilly, R. R., Zink, D. L., & Israelski, E. W. (1984). Comparison of direct and indirect methods for setting minimum passing scores. Applied Psychological Measurement, 8, 421-429.
Roudabush, G. E. (1974). Models for a beginning theory of criterion-referenced tests. Paper presented at the annual meeting of the National Council on Measurement in Education, Chicago.
Samejima, F. (1994). Estimation of reliability coefficients using the test information function and its modification. Applied Psychological Measurement, 18(3), 229-244.
Saunders, J. C, & Mappus, L. L. (1984). Accuracy and consistency of expert judges in setting passing scores on criterion-referenced tests: The South Carolina experience. Paper presented at the annual meeting of the American Educational Research Association, New Orleans.
Schoon, C. G., Gullion, C. M., & Ferrara, P. (1979). Bayesian statistics, credentialing examinations, and the determination of passing points. Evaluation and the Health Professions, 2, 181-201.
Shepard, L. A. (1983). Standards for placement and certification. In S. B. Anderson & J. S. Helmick (Eds.), On educational testing (pp. 61-90). San Francisco: Jossey-Bass.
Sireci, S. G., Robin, F., & Patelis, T. (1999). Using cluster analysis to facilitate standard setting. Applied Measurement in Education, 12(3), 301-325.
Skakun, E. N., & Kling, S. (1980). Comparability of methods for setting standards. Journal of Educational Measurement, 17, 229-235.
Smith, R. L., & Smith, J. K. (1988). Differential use of item information by judges using Angoff and Nedelsky procedures. Journal of Educational Measurement, 25(4), 259-274.
Spray, J. A., & Reckase, M. D. (1994). The selection of test items for decision making with a computer adaptive test. Paper presented at the Annual Meeting of the National Council on Measurement in Education, New Orleans, LA.
Spray, J. A., & Reckase, M. D. (1996). Comparison of SPRT and sequential Bayes procedures for classifying examinees into two categories using a computerized test. Journal of Educational and Behavioral Statistics, 21(4), 405-414.
Stephenson, A. S., Elmore, P. B., & Evans, J. A. (2000). Standard-setting techniques: An application for counseling programs. Measurement and Evaluation in Counseling and Development, 32(4), 229-244.
Subkoviak, M. J. (1988). A practitioner’s guide to computation and interpretation of reliability indices for mastery test. Journal of Educational Measurement, 25, 47-55.
Swaminathan, H., Hambleton, R. K., & Algina, J. (1974). Reliability of criterion-referenced tests: A decision theoretic formulation. Journal of Educational Measurement, 11, 262-267.
Thorndike, E. L. (1918). The nature, purposes, and general methods of measurements of educational products. The seventeen yearbook of the National Society for the study of Education, Part II. Bloomington, III.: Public School, Publishing Company.
van der Linden, W. J. (1981) A latent trait look at pretest-posttest validation of criterion referenced test items, Review of Educational Research, 51(3), 379-402.
van der Linden, W. J. (1982). A latent trait method for determining intra-judge inconsistency in the Angoff and Nedelsky techniques of standard setting , Journal of Educational Measurement, 19, 25-308.
van der Linder, W. J. (1984). Some thoughts on the use of decision theory to set cutoff scores: Comment on de Gruijter and Hambleton. Applied Psychological Measurement, 8, 9-17.
Wang, N. (2003). Use of the Rasch model in standard setting: An item mapping method. Journal of Educational Measurement, 40(3), 231-253.
Webb, M. W. I., & Miller, E. R. (1995). A comparison of the paper selection method and the contrasting groups method for setting standards on constructed- response items. U.S.; Pennsylvania: December 31, 2004, from ERIC database.
Weiss, D. J., & Kingsbury, G. G. (1984). Application of computerized adaptive testing to educational problem. Journal of Educational Measurement, 21(4), 361-375.
Wilcox, R. (1976). A note on the length and passing score of a mastery test. Journal of Educational Statistics, 1, 359-364.
Wilcox, R. (1979). Comparing examinees to a control. Psychometrika, 44, 55-68.
Wiberg, M.(2003). An optimal design approach to criterion-referenced computerized testing. Journal of Educational and Behavioral Statistics, 28(2), 97-110.
Woehr, D. J., Arthur, W., & Fehrman, M. L. (1991). An empirical comparison of cutoff score methods for content-related and criterion-related validity settings. Educational and Psychological Measurement, 51(4), 1029-1039.
Zieky, M. J., & Livingston, S. A. (1977). Manual for setting standards on the Basic Skills Assessment Tests. Princeton, NJ: Educational Testing Service.
Zimowski, M. F., Muraki, E., Mislevy, R. J., & Bock, R. D. (2003). BILOG-MG for Windows (version 3). Chicago, IL: Scientific Software International, Inc.
Description Master's
National Chengchi University
Graduate Institute of Education
92152003
93
Source http://thesis.lib.nccu.edu.tw/record/#G0921520031
Type thesis
dc.contributor.advisor 余民寧 zh_TW
dc.contributor.advisor Yu, Min-Ning en_US
dc.contributor.author (Authors) 謝進昌 zh_TW
dc.contributor.author (Authors) Shieh, Jin-Chang en_US
dc.creator (Author) 謝進昌 zh_TW
dc.creator (Author) Shieh, Jin-Chang en_US
dc.date (Date) 2004 en_US
dc.date.accessioned 17-Sep-2009 15:10:37 (UTC+8)
dc.date.available 17-Sep-2009 15:10:37 (UTC+8)
dc.date.issued (Upload time) 17-Sep-2009 15:10:37 (UTC+8)
dc.identifier (Other Identifiers) G0921520031 en_US
dc.identifier.uri (URI) https://nccur.lib.nccu.edu.tw/handle/140.119/33049
dc.description (Description) Master's zh_TW
dc.description (Description) National Chengchi University zh_TW
dc.description (Description) Graduate Institute of Education zh_TW
dc.description (Description) 92152003 zh_TW
dc.description (Description) 93 zh_TW
dc.description.abstract (Abstract) This study applies the IRT concept of maximum test information to standard setting. Tracing the historical development of standard setting, we first derive three facets for interpreting the maximum test information approach: the combination and adjustment of components, a generalized test construction process, and multiple sources of validity evidence. These three facets are used to explain why applying maximum test information to standard setting is reasonable and appropriate. We then examine the reasonableness of the approach in terms of the meaning of its formula, item selection, and statistical power, in order to establish a theoretical basis for its use in standard setting. In addition, we examine the consistency of master/non-master classification to provide multiple sources of validity evidence. Finally, methods for transforming test scores and descriptions of ability differences are discussed so that test results can be interpreted both quantitatively and qualitatively.

In sum, the following conclusions are drawn.
1. When the maximum test information approach is applied to standard setting, master/non-master classification is generally satisfactory: after cross-validation, at least 90% of examinees are classified accurately. Viewed in terms of an interval, the mastery standard derived from this approach also offers a useful reference point that experts can adjust when setting a standard. As for score transformation, whether the approach is paired with the transformed classical test scores approach or the test characteristic curve mapping method, the consistency of master/non-master classification is generally satisfactory, so the combined strategy is worth considering in standard setting.

2. When anchor points are used to interpret the mastery standard obtained with the maximum test information approach on the Basic Competency Test, non-masters need only basic subject knowledge and the ability to understand simple graphs. Masters additionally need an understanding of extensive subject knowledge; the ability to interpret complicated problems, data, and graphs; and the ability to reason logically and analyze experimental results to reach relevant conclusions. At a higher level, they possess advanced subject knowledge and the ability to synthesize and evaluate the information conveyed by data and situations.

3. Regarding test length, the results show that the maximum test information approach, the transformed classical test scores approach, and the test characteristic curve mapping method are all affected: the longer the test, the higher the consistency of master/non-master classification. This agrees with most previous studies. The analysis also suggests that 20 items is the minimum test length required. Moreover, looking closely at the exact number of misclassified examinees, the factors that produce difference scores during score transformation are hard for decision makers to control in practice, but increasing test length spreads examinees across more score points and thereby reduces the impact of misclassification.

4. Regarding test heterogeneity (tests of differing difficulty), the maximum test information approach adjusts ability estimates according to the item parameters, so its classification consistency remains at an acceptable level on heterogeneous tests. By contrast, under a fixed mastery standard the transformed classical test scores approach and the test characteristic curve mapping method show markedly high misclassification rates. This also exposes the weakness of the current practice of fixing the passing (mastery) score at 60 points.

5. Regarding the effect of score transformation on easy, hard, and normal tests, when the mastery standard from the maximum test information approach is converted with either the transformed classical test scores approach or the test characteristic curve mapping method, classification consistency between the transformed scores is not severely affected, whichever test difficulty type and transformation method are used. Furthermore, looking closely at the exact number of misclassified examinees, because the maximum test information approach sets the passing level according to test difficulty (a lower mastery standard on an easy test and a higher one on a hard test), even a large difference score in the transformation does not lead to severe misclassification.

6. Regarding the interaction among test length, test heterogeneity, and anchor item selection, the results show that test length and test heterogeneity are not the decisive factors in anchor item selection. What matters more is whether the ability level at which an item's information is maximal matches the anchor point.

In sum, after examination, the maximum test information approach not only yields satisfactory classification consistency but is also supported by robust, rigorous theory and accompanied by appropriate methods for interpreting test results, which makes it the most suitable method for standard setting on large-scale examinations. We therefore suggest that government agencies and practitioners consider adopting this strategy for large-scale licensure and qualification examinations.
en_US
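dc.description.tableofcontents Table of Contents
Chapter 1 Introduction
Section 1 Research Motivation
Section 2 Research Purposes and Research Questions
Section 3 Definition of Terms
Chapter 2 Literature Review
Section 1 Mastery Standard Setting
Section 2 The Maximum Test Information Approach
Section 3 Passing Scores and Descriptions of Ability Standards
Section 4 Issues Related to Mastery Standard Setting
Chapter 3 Research Method
Section 1 Research Framework
Section 2 Research Sample and Instruments
Section 3 Procedure and Research Design
Section 4 Data Analysis and Statistical Processing
Chapter 4 Results and Discussion
Section 1 Analysis of Mastery Standard Setting Results for the 2002 Basic Competency Test for Junior High School Students
Section 2 Analysis of the Test Length Factor across Mastery Standard Setting Methods
Section 3 Analysis of the Test Heterogeneity Factor across Mastery Standard Setting Methods
Chapter 5 Conclusions and Suggestions
Section 1 Conclusions
Section 2 Suggestions

References
Appendices
Appendix 1 Item parameter estimates for the two Natural Science administrations of the 2002 (ROC year 91) Basic Competency Test for Junior High School Students
Appendix 2 Summary of conversion results from the transformed classical test scores approach for the two Natural Science administrations of the 2002 (ROC year 91) Basic Competency Test for Junior High School Students
Appendix 3 Probabilities of a correct response on each item at the TEST1 or TEST2 anchor points, two Natural Science administrations
Appendix 4 Original item content corresponding to the anchor items of the two Natural Science administrations
Appendix 5 Description of the ability differences between masters and non-masters, second Natural Science administration
Appendix 6 Items selected under the different test length conditions, two Natural Science administrations
Appendix 7 Correspondence between IRT θ ability values and classical test scores under the different test length conditions, two Natural Science administrations
Appendix 8 Probabilities of a correct response on each item at the TEST1 or TEST2 anchor points, and the ability value at each item's maximum information, under the different test length conditions, two Natural Science administrations
Appendix 9 Item parameter values for the tests of differing heterogeneity, two Natural Science administrations
Appendix 10 Correspondence between IRT θ ability values and classical number-correct scores under the test heterogeneity conditions, two Natural Science administrations
Appendix 11 Probabilities of a correct response on each item at the TEST1 (Easy or Hard) or TEST2 (Easy or Hard) anchor points, and the ability value at each item's maximum information, under the test heterogeneity conditions, two Natural Science administrations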

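List of Tables
Table 2-1 Example of a judge's recording form for the Nedelsky method
Table 2-2 Example of a judge's recording form for the Angoff method
Table 2-3 Judge's recording form for the rating scale method of 吳裕益
Table 2-4 Item relevance, difficulty, and expected probability of success in the Ebel method
Table 2-5 Summary of comparative studies of mastery standard setting methods over the years
Table 2-6 Summary of issues in the mastery standard setting process over the years
Table 2-7 Total scores on five test items and their corresponding response patterns
Table 2-8 Summary of examinees' mastery classifications on two test administrations
Table 2-9 Item difficulty and response performance on two test administrations
Table 3-1 Distribution of male and female examinees in the first and second administrations of the Basic Competency Test for Junior High School Students
Table 3-2 Distribution of examinees by region in the first and second administrations of the Basic Competency Test for Junior High School Students
Table 3-3 Summary of indices for the two Natural Science administrations
Table 3-4 Summary of the item factor analysis for the first Natural Science administration
Table 3-5 Summary of the item factor analysis for the second Natural Science administration
Table 3-6 Example of response data for examinee 1 in the first Natural Science administration
Table 3-7 Summary of indices for the manipulated study samples of the two Natural Science administrations
Table 3-8 Illustrative example of evaluating the effectiveness of score transformation
Table 4-1 Key to the English codes and Chinese abbreviations used in this chapter
Table 4-2 Ability values at maximum test information, transformed classical test scores by method, and related results for the two Natural Science administrations
Table 4-3 Classification results of each mastery standard setting method for the two Natural Science administrations
Table 4-4 Method for evaluating classification consistency of transformed scores in the first Natural Science administration
Table 4-5 Summary of classification effects across transformation methods for the two Natural Science administrations
Table 4-6 Differences in classification consistency between score transformation methods in the first Natural Science administration
Table 4-7 Summary of classification differences between transformation methods for the two Natural Science administrations
Table 4-8 Summary of anchor points for the two Natural Science administrations
Table 4-9 Anchor items selected at the TEST1 anchor points in the first administration and their probabilities of a correct response
Table 4-10 Anchor item selection results for the two Natural Science administrations at the TEST1 anchor points
Table 4-11 Anchor item selection results for the two Natural Science administrations at the TEST2 anchor points
Table 4-12 Ability description of non-masters in the first Natural Science administration
Table 4-13 Ability description of masters in the first Natural Science administration
Table 4-14 Ability values at maximum test information, transformed classical number-correct scores by method, and related results under the different test length conditions, two Natural Science administrations
Table 4-15 Classification results of the maximum test information approach under the different test length conditions, two Natural Science administrations
Table 4-16 Classification results of the transformed classical test scores approach under the different test length conditions, two Natural Science administrations
Table 4-17 Classification results of the test characteristic curve mapping method under the different test length conditions, two Natural Science administrations
Table 4-18 Summary of conversion results between the maximum test information approach and the transformed classical test scores approach under the different test length conditions, two Natural Science administrations
Table 4-19 Summary of conversion results between the maximum test information approach and the test characteristic curve mapping method under the different test length conditions, two Natural Science administrations
Table 4-20 Difference scores between transformation methods and numbers of examinees at score points adjacent to the passing score under the different test length conditions
Table 4-21 Summary of conversion results between the transformed classical test scores approach and the test characteristic curve mapping method under the different test length conditions, two Natural Science administrations
Table 4-22 Summary of anchor point selection results under the different test length conditions
Table 4-23 Summary of anchor item selection results under the different test length conditions
Table 4-24 Anchor item selection results at the TEST1 anchor points for the 50-item first administration
Table 4-25 Ability values at maximum test information, transformed classical number-correct scores by method, and related results under test heterogeneity
Table 4-26 Summary of classification results of the maximum test information approach under test heterogeneity
Table 4-27 Summary of classification results of the transformed classical test scores approach under test heterogeneity
Table 4-28 Summary of classification results of the test characteristic curve mapping method under test heterogeneity
Table 4-29 Summary of conversion results between the maximum test information approach and the transformed classical test scores approach under test heterogeneity
Table 4-30 Summary of conversion results between the maximum test information approach and the test characteristic curve mapping method under test heterogeneity
Table 4-31 Difference scores between transformation methods and numbers of examinees at score points adjacent to the passing score under test heterogeneity
Table 4-32 Summary of conversion results between the transformed classical test scores approach and the test characteristic curve mapping method under test heterogeneity
Table 4-33 Summary of anchor point selection results under the test heterogeneity conditions
Table 4-34 Summary of anchor item selection results under the test heterogeneity conditions
Table 4-35 Anchor item selection results at the Easy(TEST1) anchor points for the first easy test
Table 4-36 Anchor item selection results at the Easy(TEST2) anchor points for the second easy test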

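List of Figures
Figure 2-1 Test construction flowchart centered on ability standard setting
Figure 2-2 Classification of mastery standard setting methods
Figure 2-3 Ordered item booklet in the bookmark technique
Figure 2-4 Example histogram for the item mapping method
Figure 2-5 Example graph for the contrasting groups method
Figure 2-6 Conceptual illustration of previous related research
Figure 2-7 Generalized process concept of the mastery standard method used in this study
Figure 2-8 Concept of stable estimation over an ability interval
Figure 2-9 Item characteristic curves
Figure 2-10 Test characteristic curve mapping method
Figure 2-11 Illustration of extended issues of the maximum test information approach
Figure 2-12 Illustration of resolving ambiguities in the maximum test information approach and an insufficient number of anchor items
Figure 2-13 Illustration of perfect percent-agreement consistency reliability
Figure 3-1 Research framework of this study
Figure 3-2 Research design for test heterogeneity
Figure 4-1 Maximum test information curves for the two Natural Science administrations
Figure 4-2 Changes in the number of examinees at each score point under the different test length conditions
Figure 4-3 Maximum test information curves for Hard(TEST1) and Easy(TEST1)
Figure 4-4 Illustration of the shortcomings of a fixed classical mastery standard on heterogeneous tests
Figure 4-5 Changes in the number of examinees at each score point under the test heterogeneity conditions
Figure 5-1 Mastery standard setting model built around the maximum test information approach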
zh_TW
dc.format.extent 14881 bytes
dc.format.extent 20808 bytes
dc.format.extent 40772 bytes
dc.format.extent 34785 bytes
dc.format.extent 45489 bytes
dc.format.extent 1314566 bytes
dc.format.extent 119798 bytes
dc.format.extent 862399 bytes
dc.format.extent 60763 bytes
dc.format.extent 87814 bytes
dc.format.extent 2295740 bytes
dc.format.mimetype application/pdf
dc.format.mimetype application/pdf
dc.format.mimetype application/pdf
dc.format.mimetype application/pdf
dc.format.mimetype application/pdf
dc.format.mimetype application/pdf
dc.format.mimetype application/pdf
dc.format.mimetype application/pdf
dc.format.mimetype application/pdf
dc.format.mimetype application/pdf
dc.format.mimetype application/pdf
dc.language.iso en_US
dc.source.uri (Source) http://thesis.lib.nccu.edu.tw/record/#G0921520031 en_US
dc.subject (Keywords) 最大測驗訊息量法 zh_TW
dc.subject (Keywords) 換算古典測驗分數法 zh_TW
dc.subject (Keywords) 測驗特徵曲線構圖法 zh_TW
dc.subject (Keywords) 定錨點 zh_TW
dc.subject (Keywords) 精熟標準設定 zh_TW
dc.subject (Keywords) 精熟測驗 zh_TW
dc.subject (Keywords) maximum test information approach en_US
dc.subject (Keywords) transformed classical test scores approach en_US
dc.subject (Keywords) test characteristics curve mapping method en_US
dc.subject (Keywords) anchor points en_US
dc.subject (Keywords) standard setting en_US
dc.subject (Keywords) mastery test en_US
dc.title (Title) 以最大測驗訊息量決定通過分數之研究 zh_TW
dc.title (Title) Study of the Standard Setting by the Maximum Test Information en_US
dc.type (Type) thesis en
dc.relation.reference (參考文獻) 行政院教育改革委員會(1996)。教育改革總諮議報告書(第三章綜合建議)。2004年12月5日,取自http://www.edu.tw/eduinf/change/5/CH-3.html#c3。zh_TW
dc.relation.reference (參考文獻) 考選部(2004)。考選部全球資訊網。2004年5月31日,取自http://inter1.moex. gov.tw/statute/statute1.asp?kind=31。zh_TW
dc.relation.reference (參考文獻) 余民寧(1992)。試題反應理論的介紹(7):訊息函數。研習資訊,9(6),5-9。zh_TW
dc.relation.reference (參考文獻) 余民寧(2002)。教育測驗與評量:成就測驗與教學評量(第二版)。臺北市:心理出版社。zh_TW
dc.relation.reference (參考文獻) 余民寧、汪慧瑜(2005)。量尺分數的另類表示方法:以國中基本學力測驗為例。教育與心理研究,28期,審稿中。zh_TW
dc.relation.reference (參考文獻) 林惠芬(1993)。通過分數設定方法在護理人員檢覈筆試測驗之研究。測驗年刊,40,253-262。zh_TW
dc.relation.reference (參考文獻) 吳裕益(1986)。標準參照測驗通過分數設定方法之研究。國立政治大學教育研究所博士論文(未出版)。zh_TW
dc.relation.reference (參考文獻) 吳裕益(1988)。標準參照測驗通過分數設定方法之研究,測驗年刋,35,159-166。zh_TW
dc.relation.reference (參考文獻) 涂柏原,陳柏熹,章舜雯,林世華(2000)。基本學力分數的建立。國中基本學力測驗推動工作委員會,2004年12月6日,取自http://www.bctest.ntnu.edu.tw/score1.htm。zh_TW
dc.relation.reference (參考文獻) 莊淑如(1997)。證照制度的落實—以德國經驗為借鏡。技職雙月刋,37,54-56。zh_TW
dc.relation.reference (參考文獻) 教育部(1998)。國民中學學生基本學力指標。台北﹕教育部。zh_TW
dc.relation.reference (參考文獻) 教育部國教司(2004)。國民中小學九年一貫課程暫行綱要,2004年12月5日,取自http://140.122.120.230/9CC/temporary/temporary-all.htm。zh_TW
dc.relation.reference (參考文獻) 國民中學學生基本學力測驗推動工作委員會(2002a)。九十、九十一年國中基本學力測驗試題取材範圍比較。飛揚第十三期,2004年12月5日,取自http://www.bctest.ntnu.edu.tw/。zh_TW
dc.relation.reference (參考文獻) 國民中學學生基本學力測驗推動工作委員會(2002b)。國中基本學力測驗自然科試題之設計理念。飛揚第十三期,2005年4月6日,取自http://www.bctest.ntnu.edu.tw/。zh_TW
dc.relation.reference (參考文獻) 國民中學學生基本學力測驗推動工作委員會(2002c)。九十一年第二次國民中學學生基本學力測驗試題特色。飛揚第十六期,2005年4月6日,取自http://www.bctest.ntnu.edu.tw/。zh_TW
dc.relation.reference (參考文獻) 國民中學學生基本學力測驗推動工作委員會(2003)。測驗分數的解釋(下)。飛揚第二十三期,2005年4月6日,取自http://www.bctest.ntnu.edu.tw/。zh_TW
dc.relation.reference (參考文獻) 鄭明長、余民寧(1994)。各種通過分數設定方法之比較。測驗年刊,41,19-40。zh_TW
dc.relation.reference (參考文獻) 鄭清泉(2001)。人工化與電腦化適性精熟能力判定在國小學童數學精熟分類一致性之比較研究。國立嘉義大學國民教育研究所碩士論文。zh_TW
dc.relation.reference (參考文獻) 謝進昌、余民寧(2005)。國中基本學力測驗之DIF的實徵分析:以91年度二次測驗為例,國立新竹師範學院學報,審稿中。zh_TW
dc.relation.reference (參考文獻) Andrew, B. J., & Hecht, J. T. (1976). A preliminary investigation of two procedures for setting examination standards. Educational and Psychological Measurement, 36, 35-50.zh_TW
dc.relation.reference (參考文獻) Angoff, W. H. (1971). Scales, norms, and equivalent scores. In R. L. Thorndike (Ed.), Educational Measurement (pp.508-600). Washington, D.C.: American Council on Education.zh_TW
dc.relation.reference (參考文獻) Angoff, W. H. (1988). Validity: An evolving concept. In H. Wainer & H. I. Braun (Eds.), Test validity (pp.19-32). Hillsdale, NJ: Lawrence Erlbaum.zh_TW
dc.relation.reference (參考文獻) Beaton, A. E., & Allen, N. L. (1992). Interpretation scales through scale anchoring , Journal of Educational Statistics, 17, 191-201.zh_TW
dc.relation.reference (參考文獻) Behuniak, P., Archambault, F. X., & Gable, R. K. (1982). Angoff and Nedelsky standard setting procedures: Implication for the validity of proficiency test score interpretation, Educational and Psychological Measurement, 42, 247-255.zh_TW
dc.relation.reference (參考文獻) Berk, R.A. (1976). Determination of optimal cutting scores in criterion-referenced measurement. Journal of Experimental Education, 45, 4-9.zh_TW
dc.relation.reference (參考文獻) Berk, R. A. (1980). A consumers` guide to criterion-referenced test reliability. reliability. Journal of Educational Measurement, 17(4), 323-349.zh_TW
dc.relation.reference (參考文獻) Berk, R. A. (1984). A guide to criterion-referenced test construction. Baltimore, MD: The Johns Hopkins University Press.zh_TW
dc.relation.reference (參考文獻) Berk, R. A. (1986). A consumer’s guide to setting performance standards on criterion- referenced tests. Review of Educational Measurement, 56(1), 137-172.zh_TW
dc.relation.reference (參考文獻) Berk, R. A. (1996). Standard setting: The next generation (where few psychometricians have gone before!). Applied Measurement in Education, 9(3), 215-235.zh_TW
dc.relation.reference (參考文獻) Bernknopf, S., Curry, A., & Bashaw, W. L.(1979). A defensible model for determining a minimal cutoff score for criterion referenced tests. Paper presented at the annual meeting of the National Council on Measurement in Education, San Francisco.zh_TW
dc.relation.reference (參考文獻) Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F. M. Lord, & M. R. Novick, Statistical theories of mental test scores (chapters 17-20). Reading, MA: Addison-Wesley.zh_TW
dc.relation.reference (參考文獻) Bontempo, B. D., Marks, C. M., & Karabatsos, G. (1998). A meta-analytic assessment of empirical differences in standard setting procedures. Paper presented at the Annual Meeting of the American Educational Research Association, San Diego, CA.zh_TW
dc.relation.reference (參考文獻) Brennan, R. L. & Lockwood, R. E. (1980). A comparison of the Nedelsky and Angoff cutting score procedures using generalizability theory, Applied Psychological Measurement, 4, 219-240.zh_TW
dc.relation.reference (參考文獻) Brandon, P. R. (2002). Two versions of the contrasting-groups standard-setting method: A review. Measurement and Evaluation in Counseling and Development, 35(3), 167-181.zh_TW
dc.relation.reference (參考文獻) Brandon, P. R. (2004). Conclusion about frequently studied modified Angoff standard setting topics. Applied Measurement in Education, 17(1), 59-88.zh_TW
dc.relation.reference (參考文獻) Busch, J. C., & Jaeger, R. M. (1990). Influence of type of judge, normative information, and discussion on standards recommended for the national teacher examinations. Journal of Educational Measurement, 27(2), 145-163.zh_TW
dc.relation.reference (參考文獻) Buckendahl, C. W., Smith, R. W., Impara, J. C., & Plake, B. S. (2002). A comparison of Angoff and Bookmark standard setting method. Journal of Educational Measurement, 39(3), 253-263.zh_TW
dc.relation.reference (參考文獻) Cascio, W. F., Alexander, R.A., & Barrett, G. V. (1988). Setting cutoff scores: Legal, psychometric, and professional issues and guidelines. Personnel Psychology, 41, 1-24.zh_TW
dc.relation.reference (參考文獻) Chang, L., Dziuban, C. D., & Hynes, M. C.(1996). Does a standard reflect minimal competency of examinees or judge competency? Applied Measurement in Education, 9(2), 161-173.zh_TW
dc.relation.reference (參考文獻) Chang, L. (1999). Judgmental item analysis of the Nedelsky and Angoff standard-setting methods. Applied Measurement in Education, 12(2), 151-165.zh_TW
dc.relation.reference (參考文獻) Chang, L., van der Linder, W. J., & Vos, H. J.(2004). Setting standards and detecting intrajudge inconsistency using interdependent evaluation of response alternatives. Educational and Psychological Measurement, 64(5), 781-801.zh_TW
dc.relation.reference (參考文獻) Chinn, R. N., & Hertz, N. R. (2002). Alternative approaches to standard setting for licensing and certification examinations. Applied Measurement in Education, 15(1), 1-14.zh_TW
dc.relation.reference (參考文獻) Cizek, G. J. (1993). Reconsidering standard and criteria. Journal of Educational Measurement, 30(2), 93-106.zh_TW
dc.relation.reference (參考文獻) Cizek, G. J. (1996). Standard-setting guidelines. Educational Measurement: Issues and Practice, 15(1), 13-21.zh_TW
dc.relation.reference (參考文獻) Clauser, B. E., Swanson, D. B., & Harik, P.(2002). Multivariate generalizability analysis of the impact of training and examinee performance information on judgments made in an Angoff-style standard-setting procedure. Journal of Educational Measurement, 39(4), 269-290.zh_TW
dc.relation.reference (參考文獻) Cohen, J. A. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37-46.zh_TW
dc.relation.reference (參考文獻) Conover, J. W., & Iman, R. L. (1978). The rank transformation as a method of discrimination with some examples, Albquerque, NM: Sandia Laboratories.zh_TW
dc.relation.reference (參考文獻) Crocker, L., & Algina, J. (1986). Introduction to classical and modern test theory. NY: CBS College Publishing.zh_TW
dc.relation.reference (參考文獻) Cross, L. H., Impara, J. C., Frary, R. B., & Jaeger, R. M. (1984). A comparison of three methods for establishing minimum standards on the National Teachers Examinations, Journal of Educational Measurement, 21, 113-129.zh_TW
dc.relation.reference (參考文獻) Cross, L. H., Frary, R. B., Kelly, P. P., Small, R. C., & Impara, J. C. (1985). Establishing minimum standards for essays: Blind versus informed review. Journal of Educational Measurement, 22, 137-146.zh_TW
dc.relation.reference (參考文獻) de Gruijter, D. N. M., & Hambleton, R. K. (1984). On problems encountered using decision theory to set cutoff scores. Applied Psychological Measurement, 8, 1-8.zh_TW
dc.relation.reference (參考文獻) Dillon, G. F. (1996). The expectations of standard setting judges. CLEAR Exam Review, 2, 22-26.zh_TW
dc.relation.reference (參考文獻) Ebel, R. L. (1972). Essentials of educational measurement (2rd ed.). Englewood Cliffs, NJ: Prentice-Hall.zh_TW
dc.relation.reference (參考文獻) Ebel, R. L. (1979). Essentials of educational measurement (3rd ed.). Englewood Cliffs, NJ: Prentice-Hall.zh_TW
dc.relation.reference (參考文獻) Educational Testing Service (1976). Report on a study of the use of the National Teachers’ Examination by the state of South Carolina. Princeton, NJ: Author.zh_TW
dc.relation.reference (參考文獻) Eignor, D. R., & Hambleton, R. K. (1979). Effects of test length and advancement score on several criterion referenced test reliability and validity indices. (Laboratory of Psychometric and Evaluative Research Report No. 86). Amherst, MA: University of Massachusetts, School of Education.zh_TW
dc.relation.reference (參考文獻) Emrick, J. A. (1971). An evaluation model for mastery testing. Journal of Educational Measurement, 8, 321-326.zh_TW
dc.relation.reference (參考文獻) Fehrman, M. L., Woehr, D. J., & Arthur, W. (1991). The Angoff cutoff score method: The impact of frame-of-reference rater training. Educational and Psychological Measurement, 51(4), 857-872.zh_TW
dc.relation.reference (參考文獻) Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems, Annals of Edgenics, 7, 179-188.zh_TW
dc.relation.reference (參考文獻) Fitzpatrick, A. R. (1989). Social influences in standard setting: The effects of social interaction on group judgments. Review of Educational Research, 59(3), 315-328.zh_TW
dc.relation.reference (參考文獻) Frick, T. W. (1992). Computerized adaptive mastery tests as expert systems. Journal of Educational Computing Research, 8(2), 187-213.zh_TW
dc.relation.reference (參考文獻) Garrett, H. E. (1937). Statistics in psychology and education. New York: Longmans, Green.zh_TW
dc.relation.reference (參考文獻) Gessaman, M. P., & Gessaman, P. H. (1972). A comparison of some multivariate discrimination procedures. Journal of the American Statistical Association, 67, 468-472.zh_TW
dc.relation.reference (參考文獻) Giraud, G., Impara, J. C., & Buckendahl, C. (2000). Making the cut in school districts: Alternative methods for setting cut-scores. Educational Assessment, 6, 291-304.zh_TW
dc.relation.reference (參考文獻) Glaser, R. (1963). Instructional technology and the measurement of learning outcomes. American Psychologist, 18, 519-521.zh_TW
dc.relation.reference (參考文獻) Glaser, R., & Nitko, A. J. (1971). Measurement in learning and instruction . In R. L. Thorndike (Eds.). Education Measurement (2nd ed., pp625-670). Washington, DC: American Council on Education.zh_TW
dc.relation.reference (參考文獻) Gray, W. M. (1978). A comparison of Piagetian theory and criterion-referenced measurement, Review of Educational Research, 48, 223-249.zh_TW
dc.relation.reference (參考文獻) Green, D. R., Trimble, C. S., & Lewis, D. M. (2003). Interpreting the results of three different standard setting procedures. Educational Measurement: Issues and Practice, 22(1), 22-32.zh_TW
dc.relation.reference (參考文獻) Guilford, J. P. (1942). Fundamental statistics in psychology and education. New York: McGraw-Hill.zh_TW
dc.relation.reference (參考文獻) Haertel, E. (1985). Construct validity and criterion-referenced testing, Review of Educational Research, 55(1), 23-46.zh_TW
dc.relation.reference (參考文獻) Haladyna, T. M., & Roid, G. H. (1983). A comparison of two approaches to criterion-referenced test construction. Journal of Educational Measurement, 20(3), 271-281.zh_TW
dc.relation.reference (參考文獻) Halpin, G., Sigmon, G., & Halpin, G.(1983). Minimum competency standards set by three divergent groups of raters using three judgemental procedures: Implication for validity, Educational and Psychological Measurement , 43, 185-196.zh_TW
dc.relation.reference (參考文獻) Hambleton, R. K. (1978). On the use of cut-off scores with criterion-referenced tests in instructional settings. Journal of Educational Measurement, 15(4), 277-290.zh_TW
dc.relation.reference (參考文獻) Hambleton, R. K. (1980). Test score validity and standard setting methods. In R. A. Berk(Ed.), Criterion-referenced measurement: The State of Art. Baltimore, Md.: John Hopkins University.zh_TW
dc.relation.reference (參考文獻) Hambleton, R. K. (1983). Application of item response models to criterion referenced assessment. Applied Psychological Measurement, 7(1), 33-44.zh_TW
dc.relation.reference (參考文獻) Hambleton, R. K. (1989). Principles and selected applications of item response theory. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 147-200). New York: Macmillan.zh_TW
dc.relation.reference (參考文獻) Hambleton, R. K. (1990). Criterion referenced testing methods and practices. In T. B. Gutkin & C. R. Reynolds (Eds.), The handbook of school psychology (2nd ed., pp. 388-415). New York: John Wiley & Sons.zh_TW
dc.relation.reference (參考文獻) Hambleton, R. K. (1998). Enhancing the validity of NAEP achievement level score reporting. Proceedings of achievement levels workshop (pp. 77-98). Washington, DC: National Assessment Governing Board.zh_TW
dc.relation.reference (參考文獻) Hambleton, R. K. (2001). Setting performance standards on educational assessments and criteria for evaluating the process. In G. J. Cizek (Ed.), Setting performance standards: Concepts, methods, and perspectives (pp. 89-116). Mahwah, NJ: Erlbaum.zh_TW
dc.relation.reference (參考文獻) Hambleton, R. K., & de Gruijter, D. N. M. (1983). Application of item response models to criterion referenced test item selection. Journal of Educational Measurement, 20(4), 355-367.zh_TW
dc.relation.reference (參考文獻) Hambleton, R. K., & Eignor, D. R. (1980). Competency test development, validation, and standard setting. In R. M. Jaeger & C. K. Tittle (Eds.), Minimum competency achievement testing: Motives, models, measures, and consequences (pp. 367-396). Berkeley, CA: McCutchan.zh_TW
dc.relation.reference (參考文獻) Hambleton, R. K., Jaeger, R. M., Plake, B. S., & Mills, C. N. (in press). Handbook for setting performance standards. Washington, DC: Council of Chief State School Officers.zh_TW
dc.relation.reference (參考文獻) Hambleton, R. K., Mills, C. N., & Simon, R. (1983). Determining the lengths for criterion referenced tests. Journal of Educational Measurement, 20(1), 27-38.zh_TW
dc.relation.reference (參考文獻) Hambleton, R. K., & Novick, M. R. (1973). Toward an integration of theory and method for criterion-referenced tests. Journal of Educational Measurement, 10, 159-170.zh_TW
dc.relation.reference (參考文獻) Hambleton, R. K., & Plake, B. S. (1995). Using an extended Angoff procedure to set standards on complex performance assessments. Applied Measurement in Education, 8(1), 41-55.zh_TW
dc.relation.reference (參考文獻) Hambleton, R. K., Swaminathan, H., Algina, J., & Coulson, D. B. (1978). Criterion-referenced testing and measurement: A review of technical issues and developments. Review of Educational Research, 48, 1-47.zh_TW
dc.relation.reference (參考文獻) Hambleton, R. K., & Swaminathan, H. (1985). Item response theory: Principles and applications. Boston: Kluwer-Nijhoff Publishing.zh_TW
dc.relation.reference (參考文獻) Hambleton, R. K., & Traub, R. E. (1973). Analysis of empirical data using two logistic latent trait models. British Journal of Mathematical and Statistical Psychology, 26, 195-211.zh_TW
dc.relation.reference (參考文獻) Hambleton, R. K., & Zaal, J. N. (Eds.). (1991). Advances in educational and psychological testing. Boston, MA: Kluwer.zh_TW
dc.relation.reference (參考文獻) Harasym, P. H. (1981). A comparison of the Nedelsky and modified Angoff standard-setting procedure on evaluation outcome. Educational and Psychological Measurement, 41(3), 725-734.zh_TW
dc.relation.reference (參考文獻) Harwell, M. R. (1983). A comparison of two item selection procedures in criterion referenced measurement. Unpublished doctoral dissertation, University of Wisconsin-Madison.zh_TW
dc.relation.reference (參考文獻) Hattie, J. A. (1985). Methodological review: Assessing unidimensionality of tests and items. Applied Psychological Measurement, 9, 139-164.zh_TW
dc.relation.reference (參考文獻) Hoge, R. D., & Coladarci, T. (1989). Teacher based judgments of academic achievement: A review of the literature. Review of Educational Research, 59(3), 297-313.zh_TW
dc.relation.reference (參考文獻) Hudson, J. P. Jr., & Campion, J. E. (1994). Hindsight bias in an application of the Angoff method for setting cutoff scores. Journal of Applied Psychology, 79(6), 860-865.zh_TW
dc.relation.reference (參考文獻) Hurtz, G. M., & Hurtz, N. R. (1999). How many raters should be used for establishing cutoff scores with the Angoff method? A generalizability theory study. Educational and Psychological Measurement, 59(6), 885-897.zh_TW
dc.relation.reference (參考文獻) Hurtz, G. M., & Auerbach, M. A. (2003). A meta-analysis of the effects of modifications to the Angoff method on cutoff scores and judgment consensus. Educational and Psychological Measurement, 63(4), 584-601.zh_TW
dc.relation.reference (參考文獻) Huynh, H. (1998). On score locations of binary and partial credit items and their applications to item mapping and criterion-referenced interpretation. Journal of Educational and Behavioral Statistics, 23(1), 35-56.zh_TW
dc.relation.reference (參考文獻) Huynh, H. (2000). On item mappings and statistical rules for selecting binary items for criterion referenced interpretation and bookmark standard settings. Paper presented at the annual meeting of the National Council on Measurement in Education, New Orleans, LA, April 25-27.zh_TW
dc.relation.reference (參考文獻) Impara, J. C., & Plake, B. S. (1997). Standard setting: An alternative approach. Journal of Educational Measurement, 34(4), 353-366.zh_TW
dc.relation.reference (參考文獻) Ivens, S. H. (1970). An investigation of item analysis, reliability and validity in relation to criterion-referenced tests. Unpublished doctoral dissertation, Florida State University.zh_TW
dc.relation.reference (參考文獻) Jaeger, R. M. (1978). A proposal for setting a standard on the North Carolina High School Competency Test. Paper presented at the annual meeting of the North Carolina Association for Research in Education, Chapel Hill.zh_TW
dc.relation.reference (參考文獻) Jaeger, R. M. (1982). An iterative structured judgment process for establishing standards on competency tests: Theory and application. Educational Evaluation and Policy Analysis, 4, 461-476.zh_TW
dc.relation.reference (參考文獻) Jaeger, R. M. (1989). Certification of student competence. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 485-514). New York: Macmillan.zh_TW
dc.relation.reference (參考文獻) Jaeger, R. M. (1995). Setting performance standards through two-stage judgmental policy capturing. Applied Measurement in Education, 8(1), 15-40.zh_TW
dc.relation.reference (參考文獻) Jaeger, R. M., & Mills, C. N. (1997, April). A holistic procedure for setting performance standards on complex large-scale assessments. Paper presented at the annual meeting of the American Educational Research Association, San Diego, CA.zh_TW
dc.relation.reference (參考文獻) Jaeger, R. M., & Mills, C. N. (2001). An integrated judgment procedure for setting standards on complex, large-scale assessments. In G. J. Cizek (Ed.), Setting performance standards: Concepts, methods, and perspectives (pp. 313-338). Mahwah, NJ: Erlbaum.zh_TW
dc.relation.reference (參考文獻) Jaeger, R. M., & Tittle, C. K. (1980). Minimum competency achievement testing. Berkeley, CA: McCutchan.zh_TW
dc.relation.reference (參考文獻) Kahl, S. R., Crockett, T. J., DePascale, C. A., & Rindfleisch, S. L. (1994). Using actual student work to determine cut-scores for proficiency levels: New methods for new tests. Paper presented at the National Conference on Large-Scale Assessment, Albuquerque, NM.zh_TW
dc.relation.reference (參考文獻) Kahl, S. R., Crockett, T. J., DePascale, C. A., & Rindfleisch, S. L. (1995). Setting standards for performance levels using the student-based constructed response method. Paper presented at the annual meeting of the American Educational Research Association, San Francisco.zh_TW
dc.relation.reference (參考文獻) Kane, M. T. (1987). On the use of IRT models with judgmental standard setting procedures. Journal of Educational Measurement, 24(4), 333-345.zh_TW
dc.relation.reference (參考文獻) Kane, M. (1994). Validating the performance standards associated with passing scores. Review of Educational Research, 64(3), 425-461.zh_TW
dc.relation.reference (參考文獻) Kane, M. (1998). Choosing between examinee-centered and test-centered standard setting methods. Educational Assessment, 5(3), 129-145.zh_TW
dc.relation.reference (參考文獻) Kane, M. (2001). So much remains the same: Conception and status of validation in setting standards. In G. J. Cizek (Ed.), Setting performance standards: Concepts, methods, and perspectives (pp. 53-88). Mahwah, NJ: Erlbaum.zh_TW
dc.relation.reference (參考文獻) Kingsbury, G. G., & Weiss, D. J. (1983). A comparison of IRT-based adaptive mastery testing and a sequential mastery testing procedure. In D. J. Weiss (Ed.), New horizons in testing: Latent trait test theory and computerized adaptive testing (pp. 257-283). New York: Academic Press.zh_TW
dc.relation.reference (參考文獻) Koffler, S. L. (1980). A comparison of approaches for setting proficiency standards. Journal of Educational Measurement, 17, 167-178.zh_TW
dc.relation.reference (參考文獻) Koretz, D., & Deibert, E. (1995). Setting standards and interpreting achievement: A cautionary tale from the National Assessment of Educational Progress. Educational Assessment, 3(1), 53-81.zh_TW
dc.relation.reference (參考文獻) Kriewall, T. E. (1972). Aspects and applications of criterion-referenced tests (IER Tech. Paper No. 103). Downers Grove, IL: Institute for Educational Research.zh_TW
dc.relation.reference (參考文獻) Lewis, D. M., Green, D. R., Mitzel, H. C., Baum, K., & Patz, R. J. (1998, April). The bookmark standard setting procedure: Methodology and recent implementations. Paper presented at the annual meeting of the National Council on Measurement in Education, San Diego, CA.zh_TW
dc.relation.reference (參考文獻) Lewis, D. M., Mitzel, H. C., & Green, D. R. (1996). Standard setting: A bookmark approach. Paper presented at the Council of Chief State School Officers National Conference on Large Scale Assessment, Boulder, CO.zh_TW
dc.relation.reference (參考文獻) Livingston, S. A., & Zieky, M. J. (1978). Basic skills assessment program: Manual for setting standards on the basic skills assessment tests. Menlo Park, CA: Addison-Wesley Testing Service.zh_TW
dc.relation.reference (參考文獻) Livingston, S. A., & Zieky, M. J. (1989). A comparison study of standard-setting methods. Applied Measurement in Education, 2(2), 121-141.zh_TW
dc.relation.reference (參考文獻) Loomis, S. C., Bay, L., Yang, W., & Hanick, P. L. (1999). Field trials to determine which rating method(s) to use in the 1998 NAEP Achievement Levels-Setting Process for Civics and Writing. Paper presented at the annual meeting of the National Council on Measurement in Education, Montreal.zh_TW
dc.relation.reference (參考文獻) Loomis, S. C., & Bourque, M. L. (2001). From tradition to innovation: Standard setting on the National Assessment of Educational Progress. In G. J. Cizek (Ed.), Setting performance standards: Concepts, methods, and perspectives (pp. 175-217). Mahwah, NJ: Erlbaum.zh_TW
dc.relation.reference (參考文獻) Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum Associates.zh_TW
dc.relation.reference (參考文獻) Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.zh_TW
dc.relation.reference (參考文獻) Maurer, T. J., Alexander, R. A., Callahan, C. M., Bailey, J. J., & Dambrot, F. H. (1991). Methodological and psychometric issues in setting cutoff scores using the Angoff method. Personnel Psychology, 44, 235-262.zh_TW
dc.relation.reference (參考文獻) McGinty, D., & Neel, J. H. (1996). Judgmental standard setting using a cognitive components model. Paper presented at the Annual Meeting of the National Council on Measurement in Education, New York.zh_TW
dc.relation.reference (參考文獻) Melican, G. J., Mills, C. N., & Plake, B. S. (1989). Accuracy of item performance predictions based on the Nedelsky standard setting method. Educational and Psychological Measurement, 49, 467-478.zh_TW
dc.relation.reference (參考文獻) Meskauskas, J. A. (1976). Evaluation models for criterion-referenced testing: Views regarding mastery and standard setting. Review of Educational Research, 46, 133-158.zh_TW
dc.relation.reference (參考文獻) Messick, S. (1988). The once and future issues of validity: Assessing the meaning and consequences of measurement. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 33-45). Hillsdale, NJ: Lawrence Erlbaum.zh_TW
dc.relation.reference (參考文獻) Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13-104). New York: Macmillan.zh_TW
dc.relation.reference (參考文獻) Millman, J. (1972). Tables for determining number of items needed on domain-referenced tests and number of students to be tested (Technical Paper No. 5). Los Angeles: Instructional Objectives Exchange.zh_TW
dc.relation.reference (參考文獻) Millman, J. (1973). Passing scores and test lengths for domain-referenced measures. Review of Educational Research, 43, 205-216.zh_TW
dc.relation.reference (參考文獻) Mills, C. N. (1983). A comparison of three methods of establishing cut-off scores on criterion-referenced tests. Journal of Educational Measurement, 20(3), 283-292.zh_TW
dc.relation.reference (參考文獻) Mitzel, H. C., Lewis, D. M., Patz, R. J., & Green, D. R. (2001). The bookmark method: Psychological perspectives. In G. J. Cizek (Ed.), Setting performance standards: Concepts, methods, and perspectives (pp. 249-281). Mahwah, NJ: Erlbaum.zh_TW
dc.relation.reference (參考文獻) Nassif, P. M. (1978). Standard setting for criterion referenced teacher licensing tests. Paper presented at the annual meeting of the National Council on Measurement in Education, Toronto.zh_TW
dc.relation.reference (參考文獻) Nedelsky, L. (1954). Absolute grading standards for objective tests. Educational and Psychological Measurement, 14, 3-19.zh_TW
dc.relation.reference (參考文獻) Nitko, A. J. (1980). Distinguishing the many varieties of criterion-referenced tests. Review of Educational Research, 50, 461-485.zh_TW
dc.relation.reference (參考文獻) Nitko, A. (1983). Educational tests and measurement: An introduction. New York: Harcourt Brace Jovanovich.zh_TW
dc.relation.reference (參考文獻) Norcini, J., Lipner, R., Langdon, L., & Strecker, C. (1987). A comparison of three variations on a standard-setting method. Journal of Educational Measurement, 24, 56-64.zh_TW
dc.relation.reference (參考文獻) Norcini, J., Shea, J. A., & Grosso, L. (1991). The effect of numbers of experts and common items on cutting score equivalents based on expert judgment. Applied Psychological Measurement, 15(3), 241-246.zh_TW
dc.relation.reference (參考文獻) Norcini, J. J., Shea, J. A., & Kanya, D. T. (1988). The effect of various factors on standard setting. Journal of Educational Measurement, 25, 57-65.zh_TW
dc.relation.reference (參考文獻) Novick, M. R., & Lewis, C. (1974). Prescribing test length for criterion-referenced measurement. In C. W. Harris, M. C. Alkin, & W. J. Popham (Eds.), Problems in criterion-referenced measurement (CSE Monograph Series in Evaluation, No. 3, pp. 139-158). Los Angeles: Center for the Study of Evaluation, University of California.zh_TW
dc.relation.reference (參考文獻) Novick, M. R., Lewis, C., & Jackson, P. H. (1973). The estimation of proportions in m groups. Psychometrika, 38, 19-45.zh_TW
dc.relation.reference (參考文獻) Pitoniak, M. J. (2003). Standard setting methods for complex licensure examinations. Unpublished doctoral dissertation, University of Massachusetts, Amherst.zh_TW
dc.relation.reference (參考文獻) Plake, B. S., Hambleton, R. K., & Jaeger, R. M. (1997). A new standard-setting method for performance assessments: The dominant profile judgment method and some field-test results. Educational and Psychological Measurement, 57(3), 400-411.zh_TW
dc.relation.reference (參考文獻) Plake, B. S., & Hambleton, R. K. (2001). The analytic judgment method for setting standards on complex performance assessments. In G. J. Cizek (Ed.), Setting performance standards: Concepts, methods, and perspectives (pp. 283-312). Mahwah, NJ: Erlbaum.zh_TW
dc.relation.reference (參考文獻) Popham, W. J., & Husek, T. R. (1969). Implications of criterion-referenced measurement. Journal of Educational Measurement, 6, 1-9.zh_TW
dc.relation.reference (參考文獻) Plake, B. S., Impara, J. C., & Potenza, M. T. (1994). Content specificity of expert judgments in a standard-setting study. Journal of Educational Measurement, 31(4), 339-347.zh_TW
dc.relation.reference (參考文獻) Plake, B. S., & Melican, G. J. (1989). Effects of item context on intrajudge consistency of expert judgments via the Nedelsky standard setting method. Educational and Psychological Measurement, 49(1), 45-51.zh_TW
dc.relation.reference (參考文獻) Popham, W. J. (1978). Criterion-referenced measurement. Englewood Cliffs, NJ: Prentice-Hall.zh_TW
dc.relation.reference (參考文獻) Putnam, S. E., Pence, P., & Jaeger, R. M. (1995). A multi-stage dominant profile method for setting standards on complex performance assessments. Applied Measurement in Education, 8(1), 57-83.zh_TW
dc.relation.reference (參考文獻) Reckase, M. D. (1983). A procedure for decision making using tailored testing. In D. J. Weiss (Ed.), New horizons in testing: Latent trait test theory and computerized adaptive testing (pp. 237-255). New York: Academic Press.zh_TW
dc.relation.reference (參考文獻) Reckase, M. D. (1998). Converting boundaries between National Assessment Governing Board performance categories to points on the National Assessment of Educational Progress score scale: The 1996 science NAEP process. Applied Measurement in Education, 11, 9-21.zh_TW
dc.relation.reference (參考文獻) Reilly, R. R., Zink, D. L., & Israelski, E. W. (1984). Comparison of direct and indirect methods for setting minimum passing scores. Applied Psychological Measurement, 8, 421-429.zh_TW
dc.relation.reference (參考文獻) Roudabush, G. E. (1974). Models for a beginning theory of criterion-referenced tests. Paper presented at the annual meeting of the National Council on Measurement in Education, Chicago.zh_TW
dc.relation.reference (參考文獻) Samejima, F. (1994). Estimation of reliability coefficients using the test information function and its modification. Applied Psychological Measurement, 18(3), 229-244.zh_TW
dc.relation.reference (參考文獻) Saunders, J. C., & Mappus, L. L. (1984). Accuracy and consistency of expert judges in setting passing scores on criterion-referenced tests: The South Carolina experience. Paper presented at the annual meeting of the American Educational Research Association, New Orleans.zh_TW
dc.relation.reference (參考文獻) Schoon, C. G., Gullion, C. M., & Ferrara, P. (1979). Bayesian statistics, credentialing examinations, and the determination of passing points. Evaluation and the Health Professions, 2, 181-201.zh_TW
dc.relation.reference (參考文獻) Shepard, L. A. (1983). Standards for placement and certification. In S. B. Anderson & J. S. Helmick (Eds.), On educational testing (pp. 61-90). San Francisco: Jossey-Bass.zh_TW
dc.relation.reference (參考文獻) Sireci, S. G., Robin, F., & Patelis, T. (1999). Using cluster analysis to facilitate standard setting. Applied Measurement in Education, 12(3), 301-325.zh_TW
dc.relation.reference (參考文獻) Skakun, E. N., & Kling, S. (1980). Comparability of methods for setting standards. Journal of Educational Measurement, 17, 229-235.zh_TW
dc.relation.reference (參考文獻) Smith, R. L., & Smith, J. K. (1988). Differential use of item information by judges using Angoff and Nedelsky procedures. Journal of Educational Measurement, 25(4), 259-274.zh_TW
dc.relation.reference (參考文獻) Spray, J. A., & Reckase, M. D. (1994). The selection of test items for decision making with a computer adaptive test. Paper presented at the Annual Meeting of the National Council on Measurement in Education, New Orleans, LA.zh_TW
dc.relation.reference (參考文獻) Spray, J. A., & Reckase, M. D. (1996). Comparison of SPRT and sequential Bayes procedures for classifying examinees into two categories using a computerized test. Journal of Educational and Behavioral Statistics, 21(4), 405-414.zh_TW
dc.relation.reference (參考文獻) Stephenson, A. S., Elmore, P. B., & Evans, J. A. (2000). Standard-setting techniques: An application for counseling programs. Measurement and Evaluation in Counseling and Development, 32(4), 229-244.zh_TW
dc.relation.reference (參考文獻) Subkoviak, M. J. (1988). A practitioner's guide to computation and interpretation of reliability indices for mastery tests. Journal of Educational Measurement, 25, 47-55.zh_TW
dc.relation.reference (參考文獻) Swaminathan, H., Hambleton, R. K., & Algina, J. (1974). Reliability of criterion-referenced tests: A decision theoretic formulation. Journal of Educational Measurement, 11, 262-267.zh_TW
dc.relation.reference (參考文獻) Thorndike, E. L. (1918). The nature, purposes, and general methods of measurements of educational products. The seventeenth yearbook of the National Society for the Study of Education, Part II. Bloomington, Ill.: Public School Publishing Company.zh_TW
dc.relation.reference (參考文獻) van der Linden, W. J. (1981). A latent trait look at pretest-posttest validation of criterion referenced test items. Review of Educational Research, 51(3), 379-402.zh_TW
dc.relation.reference (參考文獻) van der Linden, W. J. (1982). A latent trait method for determining intra-judge inconsistency in the Angoff and Nedelsky techniques of standard setting. Journal of Educational Measurement, 19, 295-308.zh_TW
dc.relation.reference (參考文獻) van der Linden, W. J. (1984). Some thoughts on the use of decision theory to set cutoff scores: Comment on de Gruijter and Hambleton. Applied Psychological Measurement, 8, 9-17.zh_TW
dc.relation.reference (參考文獻) Wang, N. (2003). Use of the Rasch model in standard setting: An item mapping method. Journal of Educational Measurement, 40(3), 231-253.zh_TW
dc.relation.reference (參考文獻) Webb, M. W. I., & Miller, E. R. (1995). A comparison of the paper selection method and the contrasting groups method for setting standards on constructed-response items. U.S.; Pennsylvania. Retrieved December 31, 2004, from ERIC database.zh_TW
dc.relation.reference (參考文獻) Weiss, D. J., & Kingsbury, G. G. (1984). Application of computerized adaptive testing to educational problems. Journal of Educational Measurement, 21(4), 361-375.zh_TW
dc.relation.reference (參考文獻) Wilcox, R. (1976). A note on the length and passing score of a mastery test. Journal of Educational Statistics, 1, 359-364.zh_TW
dc.relation.reference (參考文獻) Wilcox, R. (1979). Comparing examinees to a control. Psychometrika, 44, 55-68.zh_TW
dc.relation.reference (參考文獻) Wiberg, M. (2003). An optimal design approach to criterion-referenced computerized testing. Journal of Educational and Behavioral Statistics, 28(2), 97-110.zh_TW
dc.relation.reference (參考文獻) Woehr, D. J., Arthur, W., & Fehrman, M. L. (1991). An empirical comparison of cutoff score methods for content-related and criterion-related validity settings. Educational and Psychological Measurement, 51(4), 1029-1039.zh_TW
dc.relation.reference (參考文獻) Zieky, M. J., & Livingston, S. A. (1977). Manual for setting standards on the Basic Skills Assessment Tests. Princeton, NJ: Educational Testing Service.zh_TW
dc.relation.reference (參考文獻) Zimowski, M. F., Muraki, E., Mislevy, R. J., & Bock, R. D. (2003). BILOG-MG for Windows (version 3). Chicago, IL: Scientific Software International, Inc.zh_TW