Title: Estimation of Ultrahigh-Dimensional Graphical Models and Its Application to Discriminant Analysis (超高維度圖模型估計以及其對判別分析的應用)
Author: 曹卉姍 (Tsao, Hui-Shan)
Advisor: 陳立榜 (Chen, Li-Pang)
Keywords: Boosting; Feature screening; Variable selection; Measurement error; Network structure; Precision matrix; Ultrahigh-dimensional data
Date: 2024
Uploaded: 5-Aug-2024 14:00:27 (UTC+8)
Abstract:
Graphical models have been a popular topic in statistical learning and are useful for analyzing the network structure of high-dimensional data. Although a large body of estimation methods has been developed to address various complex structures, most of them are limited in handling ultrahigh-dimensional and error-prone data: the former means that the number of variables exceeds the sample size, and the latter is the well-known measurement error problem. To tackle these challenges and obtain reliable estimates of the graphical structure, we develop a valid method to eliminate measurement error effects and apply a boosting procedure to estimate the precision matrix simultaneously. The proposed method can handle various distributions and possibly nonlinear relationships among variables. Moreover, it avoids non-differentiable penalty functions and is easy to implement. In numerical studies, including simulations and real data analysis, we find that the proposed method detects network structure accurately and outperforms other existing methods.

Description: Master's thesis
National Chengchi University
Department of Statistics
111354028
Source: http://thesis.lib.nccu.edu.tw/record/#G0111354028
Type: thesis
Identifier: G0111354028
URI: https://nccur.lib.nccu.edu.tw/handle/140.119/152781
Format: application/pdf, 1200555 bytes
Table of contents:
Chapter 1 Introduction
Chapter 2 Notation and Models
  2.1 Graphical Models
  2.2 Measurement Error Models
Chapter 3 Methodology
  3.1 Correction of Measurement Error Effects
  3.2 Feature Screening
  3.3 Boosting Estimation for Θ
  3.4 Practical Applications: Discriminant Analysis
Chapter 4 Computational Implementation: R Package GUES
  4.1 boost.graph
  4.2 LDA.boost
Chapter 5 Numerical Studies
  5.1 Simulation Setup
  5.2 Simulation Results
Chapter 6 Real Data Analysis
  6.1 Gene Expression Omnibus Data
  6.2 Small Round Blue Cell Tumors Gene Expression Data
Chapter 7 Summary
References
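Two ideas behind the abstract, reading network structure off a precision matrix (Section 2.1 Graphical Models) and removing measurement error effects (Section 3.1 Correction of Measurement Error Effects), can be illustrated with a toy sketch. It assumes the classical additive model W = X + U, with U independent of X and a known error covariance Σ_U, so that Cov(W) = Σ_X + Σ_U; it also assumes X is Gaussian, so a zero off-diagonal entry of Θ = Σ_X^{-1} means the pair is conditionally independent. These are illustrative assumptions, not necessarily the thesis's exact model. The sketch uses population matrices rather than samples: when the number of variables exceeds the sample size the sample covariance is singular and cannot simply be inverted, which is the setting the thesis's screening and boosting steps address. All function names are hypothetical and unrelated to the GUES package.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def invert(a: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting; fine for tiny dense toys only."""
    n = len(a)
    # Augment [A | I] and reduce the left block to the identity.
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    # The right block now holds A^{-1}.
    return [row[n:] for row in aug]


def corrected_covariance(sigma_w: Matrix, sigma_u: Matrix) -> Matrix:
    """Moment correction under W = X + U (U independent of X): Sigma_X = Sigma_W - Sigma_U."""
    return [[w - u for w, u in zip(rw, ru)] for rw, ru in zip(sigma_w, sigma_u)]


def edges(theta: Matrix, tol: float = 1e-8) -> List[Tuple[int, int]]:
    """Edge (j, k) is present when the off-diagonal precision entry is nonzero."""
    p = len(theta)
    return [(j, k) for j in range(p) for k in range(j + 1, p) if abs(theta[j][k]) > tol]


# Toy example: a chain graph 0 - 1 - 2 encoded by a tridiagonal precision matrix.
theta_true = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
sigma_x = invert(theta_true)
sigma_u = [[0.5 if i == j else 0.0 for j in range(3)] for i in range(3)]  # assumed known error covariance
sigma_w = [[x + u for x, u in zip(rx, ru)] for rx, ru in zip(sigma_x, sigma_u)]  # Cov(W) = Sigma_X + Sigma_U

naive = edges(invert(sigma_w))  # ignores the error and picks up a spurious 0 - 2 edge
corrected = edges(invert(corrected_covariance(sigma_w, sigma_u)))  # recovers the chain
print(naive, corrected)
```

Running it prints `[(0, 1), (0, 2), (1, 2)] [(0, 1), (1, 2)]`: inverting the error-contaminated covariance directly adds a false edge, while subtracting Σ_U first restores the true chain structure.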
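For the discriminant analysis application (Section 3.4 Practical Applications: Discriminant Analysis), a standard way to use an estimated precision matrix is the plug-in linear discriminant rule δ_k(x) = xᵀΘμ_k − ½ μ_kᵀΘμ_k + log π_k, which assigns x to the class with the largest score. The sketch below assumes a precision matrix Θ shared by all classes and known class means μ_k and priors π_k; whether the thesis's LDA.boost function follows exactly this rule is an assumption, and the function names here are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def matvec(m: Matrix, v: Vector) -> List[float]:
    """Matrix-vector product m @ v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def lda_scores(x: Vector, means: Sequence[Vector], theta: Matrix, priors: Vector) -> List[float]:
    """Linear discriminant scores delta_k(x) = x' Theta mu_k - 0.5 mu_k' Theta mu_k + log pi_k."""
    scores = []
    for mu, pi in zip(means, priors):
        theta_mu = matvec(theta, mu)  # Theta mu_k, reused in both terms
        scores.append(dot(x, theta_mu) - 0.5 * dot(mu, theta_mu) + math.log(pi))
    return scores


def lda_predict(x: Vector, means: Sequence[Vector], theta: Matrix, priors: Vector) -> int:
    """Assign x to the class with the largest discriminant score."""
    scores = lda_scores(x, means, theta, priors)
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical two-class example sharing a chain-graph precision matrix.
theta = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
means = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
priors = [0.5, 0.5]
print(lda_predict([0.1, 0.0, -0.1], means, theta, priors))  # class 0
print(lda_predict([0.9, 1.0, 1.1], means, theta, priors))   # class 1
```

Because Θ enters the rule only through the products Θμ_k, a sparse estimate of Θ keeps the classifier cheap to evaluate, and the zero pattern of Θ determines which variables interact in each score.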