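Academic Output - Theses and Dissertations

Title A Study of Refusal Imputation for Guttman Scales (葛特曼量表之拒答插補研究)
Author 左宗光
Contributors 江振東
左宗光
Keywords Refusal
Guttman scale
Simple imputation
Multiple imputation
Nearest-neighbor imputation
Date 2008
Upload time 8-Dec-2010 14:52:55 (UTC+8)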
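Abstract In sample survey data, respondents may refuse to answer because a question is unclear, touches on personal privacy, or concerns an overly sensitive issue. Analyses based on samples that contain refusals are liable to produce biased findings, so how nonresponse is handled is often one of the key factors in whether a study's results can be trusted. The usual remedy is to impute the refused items. However, there has never been an agreed criterion for judging imputation quality, and analysis results are often questioned for that reason.
This study focuses on data in the form of a Guttman scale and uses the notion of "accuracy" to compare different imputation approaches: the simple imputation methods commonly used in social science research, multiple imputation, and nearest-neighbor imputation. Accuracy is computed to compare how well each method performs and to infer when each is appropriate. As an example, we use the items on sexual attitudes from the third survey of the fourth phase of the Taiwan Social Change Survey. Records that conform to the Guttman scale are treated as the "gold standard," and refusal data are generated from the gold standard according to the refusal patterns observed in the original data. As the refusal rate rises, the number of cases for each refusal pattern is scaled up proportionally.
The results show that the accuracy of simple imputation can be derived by formula. For this data set, no simple imputation method achieves an accuracy above 32%, although the refusal rate varies considerably with the refusal pattern and the degree of social openness. Multiple imputation does slightly better than simple imputation, with an accuracy close to 33%, but simple imputation is the more convenient of the two. Nearest-neighbor imputation has comparatively high accuracy, reaching about 47% at best; its potential drawbacks are that it takes longer to run and that its accuracy tends to fall as the refusal rate rises.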
In a questionnaire survey, respondents may refuse to answer certain items because the questions are unclear, sensitive, or related to personal privacy. Analyses of a data set that contains refusals may be biased, so how to deal with survey refusals has drawn much attention of late. One popular approach is imputation. However, because there is no criterion for evaluating its performance, the usefulness of this approach remains under debate.
In this study, we compare the Simple Imputation Method, the Multiple Imputation Method, and the Nearest Neighbor Method in terms of imputation accuracy when dealing with refusals in a set of survey items that form a Guttman scale. Data are taken from the 2002 Taiwan Social Change Survey (TSCS), and the items of interest concern sexual attitudes. The parts of the data that satisfy a perfect Guttman scale are treated as the "Gold Standard," and refusals are generated according to the refusal patterns that appear in the original data.
The results show that the accuracy of Simple Imputation can be derived theoretically. No matter which version of Simple Imputation is applied, the accuracy is no more than 32%. Multiple Imputation performs slightly better than Simple Imputation, with an accuracy of about 33%. However, it is less efficient in terms of computer time. Although the Nearest Neighbor Method performs best of the three, with an accuracy reaching about 47%, it requires much more computer time than the other two methods, and its accuracy decreases as the refusal rate goes up.
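The evaluation idea described in the abstract (treat perfect Guttman patterns as a gold standard, blank out items to mimic refusals, impute, then score) can be illustrated with a minimal Python sketch. This is not the thesis's code, which according to its appendices is written in R. The function names, the toy four-item data, the Hamming distance on answered items, and the per-record definition of accuracy are all assumptions made for illustration only.

```python
def is_guttman(pattern):
    """True if a 0/1 response vector, with items ordered from easiest to
    hardest to endorse, is a perfect Guttman pattern (no 1 after a 0)."""
    return all(a >= b for a, b in zip(pattern, pattern[1:]))


def mask(pattern, refused):
    """Replace the answers at the refused item indices with None."""
    return tuple(None if i in refused else v for i, v in enumerate(pattern))


def distance_on_observed(masked, donor):
    """Count disagreements on the items the masked respondent answered
    (an assumed distance; the thesis may use a different one)."""
    return sum(1 for m, d in zip(masked, donor) if m is not None and m != d)


def nearest_neighbor_impute(masked, donors):
    """Fill refused items from the closest complete donor record.
    Ties go to the first donor in the list."""
    best = min(donors, key=lambda d: distance_on_observed(masked, d))
    return tuple(b if m is None else m for m, b in zip(masked, best))


def accuracy(truth, imputed):
    """Share of records whose imputed pattern equals the true pattern
    (a per-record definition, assumed here)."""
    hits = sum(1 for t, i in zip(truth, imputed) if t == i)
    return hits / len(truth)


# Hypothetical gold standard: the five perfect Guttman patterns on four items.
gold = [(1, 1, 1, 1), (1, 1, 1, 0), (1, 1, 0, 0), (1, 0, 0, 0), (0, 0, 0, 0)]
assert all(is_guttman(g) for g in gold)

# Hypothetical respondents and the item each one "refuses".
truth = [(1, 1, 1, 0), (1, 1, 1, 1), (1, 0, 0, 0)]
refused = [{2}, {3}, {1}]

masked = [mask(t, r) for t, r in zip(truth, refused)]
imputed = [nearest_neighbor_impute(m, gold) for m in masked]

print(f"accuracy = {accuracy(truth, imputed):.3f}")  # prints accuracy = 0.667
```

The third toy respondent is imputed incorrectly because two donors match the answered items equally well, which hints at why accuracy can stay well below 100% even when every record follows a perfect Guttman pattern.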
Description Master's
National Chengchi University
Graduate Institute of Statistics
96354013
97
Source http://thesis.lib.nccu.edu.tw/record/#G0096354013
Type thesis
dc.contributor.advisor 江振東
dc.contributor.author (Author) 左宗光
dc.creator (Author) 左宗光
dc.date (Date) 2008
dc.date.accessioned 8-Dec-2010 14:52:55 (UTC+8)
dc.date.available 8-Dec-2010 14:52:55 (UTC+8)
dc.date.issued (Upload time) 8-Dec-2010 14:52:55 (UTC+8)
dc.identifier (Other identifier) G0096354013
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/49599
dc.description (Description) Master's
dc.description (Description) National Chengchi University
dc.description (Description) Graduate Institute of Statistics
dc.description (Description) 96354013
dc.description (Description) 97
dc.description.tableofcontents Chapter 1 Research Background and Motivation
Chapter 2 Literature Review
Section 1 The Guttman Scale Model
Section 2 Related Previous Research
Section 3 Definition and Mechanisms of Missing Values
Section 4 Multiple Imputation
Section 5 Nearest-Neighbor Imputation
Chapter 3 Data Analysis
Section 1 Introduction to the Variables
Section 2 Description of the Response Variables
Section 3 Analysis of Demographic Variables
Chapter 4 Empirical Analysis
Section 1 Simple Imputation
Section 2 Multiple Imputation
Section 3 Nearest-Neighbor Imputation
Chapter 5 Conclusions and Areas for Improvement
Section 1 Conclusions
Section 2 Areas for Improvement
References
Appendices
Appendix 1. Simple Imputation (Ignoring the Guttman Scale Property)
Appendix 2. R Code for Generating Refusals
Appendix 3. R Code for Multiple Imputation
Appendix 4. R Code for Nearest-Neighbor Imputation
Appendix 5. Multiple Imputation under Other Models
List of Tables
Table 2.1 Recommended Distributions for Multiple Imputation
Table 2.2 Distance Calculation Results for Five Records
Table 3.1 Proportions of Response Types Without Refusals
Table 3.2 Error Count Calculation
Table 3.3 Counts of Agreement and Disagreement for Each Question
Table 3.4 Guttman Scale Index Results
Table 3.5 Frequency Distribution of Gender
Table 3.6 Frequency Distribution of Marital Status
Table 3.7 Frequency Distribution of Education Level
Table 3.8 Frequency Distribution of Years of Education
Table 3.9 Frequency Distribution of Average Monthly Income
Table 4.1 Possible Response Forms for Each Refusal Pattern (Original Data)
Table 4.2 Number of Each Response to Be Generated at a 10% Refusal Rate
Table 4.3 Accuracy Calculation (Original Data)
Table 4.4 Possible Response Forms for Each Refusal Pattern (Two Questions)
Table 4.5 Recommended Simple Imputation Methods When Refusals Occur (Two Questions)
Table 4.6 Possible Response Forms for Each Refusal Pattern (Three Questions)
Table 4.7 Recommended Simple Imputation Methods When Refusals Occur (Three Questions)
Table 4.8 Recommended Simple Imputation Methods When Refusals Occur (Four Questions)
Table 4.9 Comparison of Common Statistics under Multiple Imputation, Model 1
Table 4.10 Accuracy Comparison of the Four Multiple Imputation Methods
Table 4.11 Comparison of Common Statistics under Nearest-Neighbor Imputation
Appendix Table 1 Accuracy Calculation (Ignoring the Guttman Scale Property)
Appendix Table 2 Comparison of Common Statistics under Multiple Imputation, Model 2
Appendix Table 3 Comparison of Common Statistics under Multiple Imputation, Model 3
Appendix Table 4 Comparison of Common Statistics under Multiple Imputation, Model 4

List of Figures
Figure 2.1 Diagram of Multiple Imputation
Figure 3.1 Frequency Distribution of Gender
Figure 3.2 Frequency Distribution of Age
Figure 3.3 Frequency Distribution of Marital Status
Figure 3.4 Frequency Distribution of Education Level
Figure 3.5 Frequency Distribution of Years of Education
Figure 3.6 Frequency Distribution of Income
dc.format.extent 90831 bytes
dc.format.extent 168321 bytes
dc.format.extent 132865 bytes
dc.format.extent 120907 bytes
dc.format.extent 428661 bytes
dc.format.extent 124630 bytes
dc.format.extent 173039 bytes
dc.format.mimetype application/pdf
dc.format.mimetype application/pdf
dc.format.mimetype application/pdf
dc.format.mimetype application/pdf
dc.format.mimetype application/pdf
dc.format.mimetype application/pdf
dc.format.mimetype application/pdf
dc.language.iso en_US
dc.source.uri (Source) http://thesis.lib.nccu.edu.tw/record/#G0096354013
dc.subject (Keyword) Refusal
dc.subject (Keyword) Guttman scale
dc.subject (Keyword) Simple imputation
dc.subject (Keyword) Multiple imputation
dc.subject (Keyword) Nearest-neighbor imputation
dc.title (Title) A Study of Refusal Imputation for Guttman Scales (葛特曼量表之拒答插補研究)
dc.type (Type) thesis