
Title: Estimation of Ultrahigh-Dimensional Graphical Models and Its Application to Discriminant Analysis (original title: 超高維度圖模型估計以及其對判別分析的應用)
Author: Tsao, Hui-Shan (曹卉姍)
Advisor: Chen, Li-Pang (陳立榜)
Keywords: Boosting; Feature screening; Measurement error; Network structure; Precision matrix; Ultrahigh-dimensional data
Date: 2024
Uploaded: 5-Aug-2024 14:00:27 (UTC+8)
Abstract: Graphical models have long been a popular topic in statistical learning and are useful for analyzing the network structure of high-dimensional data. Although many estimation methods have been developed to handle various complex structures, most of them are limited in handling ultrahigh-dimensional and error-prone data: the former means that the number of variables exceeds the sample size, and the latter is the well-known measurement error problem. To tackle these challenges and obtain reliable estimates of the graphical structure, we develop a valid method that eliminates measurement error effects and applies a boosting procedure to estimate the precision matrix simultaneously. The proposed method can handle various distributions and possibly nonlinear relationships among variables. Moreover, it avoids non-differentiable penalty functions and is easy to implement. In numerical studies, including simulations and real data analysis, we find that the proposed method detects the network structure accurately and outperforms other existing methods.
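To make the abstract's terms concrete, here is a minimal sketch of two textbook facts the problem rests on; it is not the thesis's boosting procedure. First, under a classical additive error model W = X + U, with mean-zero error U independent of X and an error covariance Sigma_u assumed known, cov(W) = Sigma_x + Sigma_u, so a simple moment correction estimates Sigma_x by S_W - Sigma_u. Second, in a Gaussian graphical model, variables j and k share an edge exactly when entry (j, k) of the precision matrix Theta = Sigma_x^{-1} is nonzero. The sketch assumes p < n so the corrected matrix can be inverted directly, and it reads off edges by thresholding partial correlations at an arbitrary illustrative tolerance; all function names are hypothetical.

```python
def covariance(rows):
    """Sample covariance (divisor n - 1) of observations given as equal-length rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[j] - means[j]) * (r[k] - means[k]) for r in rows) / (n - 1)
             for k in range(p)] for j in range(p)]


def corrected_covariance(s_w, sigma_u):
    """Moment correction under W = X + U: estimate Sigma_x as S_W - Sigma_u."""
    return [[w - u for w, u in zip(row_w, row_u)] for row_w, row_u in zip(s_w, sigma_u)]


def invert(a):
    """Gauss-Jordan inversion with partial pivoting (small dense matrices only)."""
    p = len(a)
    # Augment [A | I] and reduce the left block to the identity.
    m = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(p)]
         for i, row in enumerate(a)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]
        for r in range(p):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[p:] for row in m]


def edges(theta, tol):
    """Pairs (j, k), j < k, whose partial correlation
    -theta_jk / sqrt(theta_jj * theta_kk) exceeds tol in absolute value."""
    p = len(theta)
    found = []
    for j in range(p):
        for k in range(j + 1, p):
            pcor = -theta[j][k] / (theta[j][j] * theta[k][k]) ** 0.5
            if abs(pcor) > tol:
                found.append((j, k))
    return found


if __name__ == "__main__":
    # Population-level example: a Gaussian chain X0 - X1 - X2, whose precision
    # matrix is tridiagonal (no direct X0 - X2 edge).
    sigma_x = [[1.0, 1.0, 1.0], [1.0, 2.0, 2.0], [1.0, 2.0, 3.0]]
    sigma_u = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # assumed known
    sigma_w = [[x + u for x, u in zip(rx, ru)] for rx, ru in zip(sigma_x, sigma_u)]
    print(edges(invert(sigma_w), tol=0.1))  # naive: [(0, 1), (0, 2), (1, 2)]
    print(edges(invert(corrected_covariance(sigma_w, sigma_u)), tol=0.1))  # [(0, 1), (1, 2)]
```

Inverting cov(W) directly reports a spurious X0-X2 edge, while the corrected matrix recovers the chain. With real data, S_W would come from covariance() applied to the observed rows; when p > n that matrix is singular and invert() raises, which is the ultrahigh-dimensional regime the abstract targets.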
Description: Master's thesis, Department of Statistics, National Chengchi University (111354028)
Source: http://thesis.lib.nccu.edu.tw/record/#G0111354028
Type: thesis
Identifier: G0111354028
URI: https://nccur.lib.nccu.edu.tw/handle/140.119/152781
Format: application/pdf, 1200555 bytes
Table of contents:
Chapter 1 Introduction
Chapter 2 Notation and Models
  2.1 Graphical Models
  2.2 Measurement Error Models
Chapter 3 Methodology
  3.1 Correction of Measurement Error Effects
  3.2 Feature Screening
  3.3 Boosting Estimation for Θ
  3.4 Practical Applications: Discriminant Analysis
Chapter 4 Computational Implementation: R Package GUES
  4.1 boost.graph
  4.2 LDA.boost
Chapter 5 Numerical Studies
  5.1 Simulation Setup
  5.2 Simulation Results
Chapter 6 Real Data Analysis
  6.1 Gene Expression Omnibus Data
  6.2 Small Round Blue Cell Tumors Gene Expression Data
Chapter 7 Summary
Reference
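Section 3.4 of the thesis applies the estimated precision matrix to discriminant analysis. As background only, and not as a description of the thesis's LDA.boost function (whose interface this record does not give), the textbook linear discriminant rule with a common covariance assigns x to the class k maximizing delta_k(x) = x' Theta mu_k - 0.5 mu_k' Theta mu_k + log pi_k, so the covariance enters only through the precision matrix Theta. A minimal sketch, assuming Theta, the class means mu_k and the priors pi_k have already been estimated; all function names are hypothetical.

```python
import math


def mat_vec(theta, v):
    """Matrix-vector product theta @ v for a list-of-rows matrix (theta assumed symmetric)."""
    return [sum(t * x for t, x in zip(row, v)) for row in theta]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def lda_scores(x, theta, means, priors):
    """Linear discriminant scores x' Theta mu_k - 0.5 mu_k' Theta mu_k + log pi_k."""
    scores = []
    for mu, prior in zip(means, priors):
        theta_mu = mat_vec(theta, mu)
        scores.append(dot(x, theta_mu) - 0.5 * dot(mu, theta_mu) + math.log(prior))
    return scores


def lda_predict(x, theta, means, priors):
    """Index of the class with the largest discriminant score."""
    scores = lda_scores(x, theta, means, priors)
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    theta = [[2.0, -1.0], [-1.0, 2.0]]  # illustrative precision matrix
    means = [[0.0, 0.0], [2.0, 2.0]]
    priors = [0.5, 0.5]
    print(lda_predict([0.2, 0.1], theta, means, priors))  # 0
```

Because the rule depends on the covariance only through Theta, any precision-matrix estimate, including a sparse one obtained from a graphical model, can be plugged in directly.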