Title Detection of Outliers with Data Transformation
變數轉換之離群值偵測
Author 吳秉勳
Wu, David
Contributors 鄭宗記 (advisor)
吳秉勳
David Wu
Keywords Breakdown Point
Least Median of Squares (LMS) Estimator
The Masking Effect
Minimum Volume Ellipsoid (MVE) Estimator
Mahalanobis Distance
Score Statistic
Stalactite Plot
The Forward Search Algorithm
Date 2001
Uploaded 15-Apr-2016 16:10:25 (UTC+8)
Abstract Detecting outliers in regression is not trivial when the data contain many of them. In that case, classical residual-based diagnostic plots can fail to reveal them, a phenomenon known as the masking effect. To avoid masking, we identify multiple outliers with a highly robust regression estimator, the least median of squares (LMS) estimator, which attains the maximal breakdown point of 50%. The LMS estimator is computed with the forward search algorithm. The resulting estimator detects clustered outliers rapidly and efficiently. Furthermore, 100 repeats of a simple forward search, each from a randomly chosen starting subset, give a sufficiently robust approximation to the LMS estimator to reveal the multiple outliers. Finally, all detected outliers are displayed in a stalactite plot, which shows a highly stable pattern.
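The abstract describes the forward search only in outline, so the following Python sketch illustrates one plausible reading of it rather than the thesis's own code. To stay short it assumes a single regressor with an intercept (p = 2). Each search starts from a random elemental subset of p units, fits ordinary least squares, and grows the subset one unit per step by keeping the units with the smallest squared residuals. At every step it records the h-th smallest squared residual as the LMS criterion, and the fit with the smallest criterion over 100 random starts approximates the LMS estimator. The choice h = [n/2] + [(p+1)/2], the preliminary scale s0 = 1.4826(1 + 5/(n - p))·sqrt(criterion), and the 2.5 cutoff are conventions commonly used with LMS, assumed here and not taken from the thesis. All function names are hypothetical, and the text-mode stalactite plot is a simplification: one row per step of the best search, with `*` marking units flagged by that step's fit.

```python
import math
import random
import statistics


def ls_fit(xs, ys):
    """Ordinary least squares for y = a + b*x; None if all xs are equal."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return None
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def squared_residuals(x, y, fit):
    a, b = fit
    return [(yi - a - b * xi) ** 2 for xi, yi in zip(x, y)]


def forward_search(x, y, start, h):
    """One forward search from `start`; returns [(fit, criterion), ...] per step."""
    n = len(x)
    subset = list(start)
    steps = []
    for m in range(len(start), n + 1):
        fit = ls_fit([x[i] for i in subset], [y[i] for i in subset])
        if fit is None:  # degenerate subset: abandon this start
            break
        r2 = squared_residuals(x, y, fit)
        steps.append((fit, sorted(r2)[h - 1]))  # LMS criterion: h-th smallest r^2
        # Next subset: the m + 1 units closest to the current fit.
        subset = sorted(range(n), key=r2.__getitem__)[: m + 1]
    return steps


def lms_forward(x, y, repeats=100, seed=0):
    """Approximate LMS fit: best step over `repeats` random-start forward searches."""
    rng = random.Random(seed)
    n, p = len(x), 2  # assumption: one regressor plus intercept
    h = n // 2 + (p + 1) // 2  # assumption: maximal-breakdown choice of h
    best_fit, best_crit, best_steps = None, math.inf, []
    for _ in range(repeats):
        steps = forward_search(x, y, rng.sample(range(n), p), h)
        for fit, crit in steps:
            if crit < best_crit:
                best_fit, best_crit, best_steps = fit, crit, steps
    return best_fit, best_crit, best_steps


def flag_outliers(x, y, fit, crit, p=2, cutoff=2.5):
    """Indices whose |residual| / s0 exceeds `cutoff` (assumed LMS scale s0)."""
    n = len(x)
    s0 = 1.4826 * (1 + 5 / (n - p)) * math.sqrt(crit)
    r2 = squared_residuals(x, y, fit)
    if s0 == 0:  # exact fit of h units: flag every unit off the line
        return [i for i, r in enumerate(r2) if r > 1e-12]
    return [i for i, r in enumerate(r2) if math.sqrt(r) / s0 > cutoff]


def stalactite(x, y, steps, p=2, cutoff=2.5):
    """Text stalactite plot: one row per step, '*' marks units flagged at that step."""
    rows = []
    for fit, crit in steps:
        flagged = set(flag_outliers(x, y, fit, crit, p, cutoff))
        rows.append("".join("*" if i in flagged else "." for i in range(len(x))))
    return rows


if __name__ == "__main__":
    # Toy data: 14 points near y = 2 + 3x, plus a cluster of 6 points shifted down by 30.
    x = list(range(20))
    y = [2 + 3 * xi + ((xi * 7) % 5 - 2) * 0.2 - (30 if xi >= 14 else 0) for xi in x]
    fit, crit, steps = lms_forward(x, y)
    print("LMS fit (a, b):", fit)
    print("flagged:", flag_outliers(x, y, fit, crit))
    print("\n".join(stalactite(x, y, steps)))
```

Recording the criterion at every step of every search, instead of only at the final step, is what lets a search that starts among clean units produce a fit uninfluenced by the outlier cluster, which is the behaviour the abstract attributes to the 100 random restarts.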
Description Master's thesis
National Chengchi University
Department of Statistics
87354011
Source http://thesis.lib.nccu.edu.tw/record/#A2002001359
Type thesis
dc.contributor.advisor 鄭宗記
dc.date.accessioned 15-Apr-2016 16:10:25 (UTC+8)
dc.date.available 15-Apr-2016 16:10:25 (UTC+8)
dc.date.issued (upload time) 15-Apr-2016 16:10:25 (UTC+8)
dc.identifier (Other Identifiers) A2002001359
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/85146
dc.description.tableofcontents Cover Page
Certificate
Acknowledgments
Abstract
Table of Contents
List of Figures
List of Tables
Chapter One Introduction
1.1 Research Motivation
1.2 Research Purposes
1.3 Dissertation Structures
1.4 Literature Review
Chapter Two Forward Search Theory
2.1 Outliers, LMS and MVE Estimators
2.1.1 Leverage Points and Outliers
2.1.2 Least Median Squares (LMS) Estimator
2.1.3 Minimum Volume Ellipsoid (MVE) Estimator
2.2 The Motivation of the Forward Search
2.3 Introduction to the Forward Search Algorithm
2.3.1 General Principles
2.3.2 The Forward Search in Search of LMS Estimator
2.3.3 The Forward Search in Search of MVE Estimator
2.4 Stalactite Plots
2.5 Examples
2.5.1 Rousseeuw Data
2.5.2 Hawkins-Bradu-Kass Data
Chapter Three Data Transformations
3.1 Importance of Normality
3.2 Transformations in Regression
3.3 Score Statistic for Transformation
3.3.1 Added Variable Plot
3.3.2 The Derivation of Score Statistic by Added Variable
3.4 Examples
3.4.1 Stack Loss Data
Chapter Four Empirical Data Analysis
4.1 Data Illustration and Outlier Detection
4.2 Data Transformation to Improve the Model
Chapter Five Conclusions and Suggestions
5.1 Research Discoveries
5.2 Significance of the Forward Search Algorithm
5.3 Future Study
Appendix
Appendix A Datasets
Appendix B Terminologies
Appendix C Future Study
References