Title 運用雲端運算於智慧型健保費用異常偵測之研究
A Research into Intelligent Cloud Computing Techniques for Detecting Anomalous Health-insurance Expenses
Author 黃聖尹
Huang, Sheng Yin
Contributor 姜國輝
Chiang, Johannes K.
黃聖尹
Huang, Sheng Yin
Keywords 健保資料庫
申報異常
支持向量機器
雲端運算
National health insurance database
anomaly claim
Support Vector Machines
Cloud Computing
Date 2013
Uploaded 3 November 2014 10:09:08 (UTC+8)
Abstract
Taiwan's National Health Insurance (NHI) expenditures have grown steadily, giving rise to a number of problems. Among them, three kinds of improper claims (overstated claims, false claims, and fraud) waste substantial medical resources. Current computerized file analysis can detect only overstated and false claims; it cannot detect fraud. Fraud detection still relies on traditional random-sample audits and manual analysis, yet about 350 million outpatient claims are submitted for review in an average year, which places a very heavy burden on reviewers. This study therefore explores how computer tools can be used for a preliminary assessment of medical institutions' expense claims.
Through an extensive literature review, this study found U.S. research indicating that combining Benford's law with intelligent methods for fraud detection yields good results (Busta & Weinberg 1998). Benford's law states that data from many sources follow a specific digit-frequency distribution. In recent years it has also been applied in review procedures for malpractice or fraud in many different fields.
This study uses Apache Hadoop and its related projects to build an environment for storing and analyzing large volumes of data, and applies it to a large body of NHI expense-claim data. The system combines Benford's law digit analysis with a Support Vector Machine (SVM) to perform a large-scale preliminary computer review of NHI expense claims, determines whether a medical institution shows anomalous claims, and passes the preliminary results to the relevant NHI Bureau auditors for in-depth review.
The intelligent anomaly detection model built in this study combines indicator variables derived from Benford's law with practical indicator variables, and uses SVM to analyze historical claim data and produce a predictive model. The model can then be used to judge whether future claim data are anomalous. In identifying anomalous data, the model achieved an overall accuracy of 97.7995%, and all anomalous claims were correctly predicted.
This study therefore hopes to combine Benford's law with intelligent computing methods for NHI claim anomaly detection, so that computers can carry out the preliminary review. This would reduce both the uncertainty caused by traditional random sampling and the excessive manpower spent reviewing large volumes of NHI data.
References
[1] Benford, F. "The law of anomalous numbers," Proceedings of the American Philosophical Society (78:4) 1938, pp 551-572.
[2] Bolton, R. J., & Hand, D. J. "Statistical fraud detection: A review," Statistical Science, 2002, pp 235-249.
[3] Busta, B., & Weinberg, R. "Using Benford's law and neural networks as a review procedure," Managerial Auditing Journal (13:6) 1998, pp 356-366.
[4] Busta, B., & Weinberg, R. "Using Benford's law and neural networks as a review procedure," Managerial Auditing Journal (13:6) 1998, pp 356-366.
[5] Carslaw, C. "Anomalies in Income Numbers: Evidence of Goal Oriented Behavior," The Accounting Review (63:2) 1988, pp 321-327.
[6] Chan, P., & Stolfo, S. "Toward scalable learning with non-uniform class and cost distributions: a case study in credit card fraud detection," in: KDD-98, Agrawal, Stolorz and Piatetsky-Shapiro (eds.), AAAI Press, 1998, pp 164-168.
[7] Skousen, C. J., Guan, L., & Wetzel, T. S. "Anomalies and Unusual Patterns in Reported Earnings: Japanese Managers Round Earnings," Journal of International Financial Management and Accounting (15:3) 2004.
[8] Cortes, C., & Vapnik, V. N. "Support-Vector Networks," Machine Learning (20) 1995.
[9] Dean, J., & Ghemawat, S. "MapReduce: Simplified Data Processing on Large Clusters," Communications of the ACM (51:1) 2008, pp 107-113.
[10] Drake, P. D., & Nigrini, M. J. "Computer assisted analytical procedures using Benford's Law," Journal of Accounting Education (18:2) 2000, pp 127-146.
[11] Efron, B. "Bootstrap Methods: Another Look at the Jackknife," Annals of Statistics (7) 1979, pp 1-26.
[12] Fawcett, T., & Provost, F. "Activity monitoring: noticing interesting changes in behavior," in: Proc. of KDD-99, 1999, pp 53-62.
[13] Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. "From Data Mining to Knowledge Discovery: An Overview," Advances in Knowledge Discovery and Data Mining, 1996, pp 1-36.
[14] Formann, A. K. "The Newcomb-Benford Law in its relation to some common distributions," PLoS One (5:5) 2010, p e10541.
[15] Glaser, W. A. Paying the doctor: systems of remuneration and their effects, Johns Hopkins Press, Baltimore, 1970.
[16] Varian, H. "Benford's law," American Statistician (26), p 65.
[17] HBase official website (2014). Retrieved from http://hadoop.apache.org/hbase/
[18] Hill, T. P. "A statistical derivation of the significant-digit law," Statistical Science, 1995, pp 354-363.
[19] Hill, T. P. "A statistical derivation of the significant-digit law," Statistical Science, 1995, pp 354-363.
[20] Dean, J., & Ghemawat, S. "Mapreduce: Simplified data processing on large clusters," Commun. ACM (51:1) 2008, pp 137–150.
[21] Laine, S., & Simila, T. "Using SOM-based data binning to support supervised variable selection," in: Neural Information Processing, N.R. Pal, N. Kasabov, R.K. Mudi, S. Pal and S.K. Parui (eds.), Springer-Verlag Berlin, Berlin, 2004, pp 172-180.
[22] Lu, F., & Boritz, J. E. "Detecting fraud in health insurance data: Learning to model incomplete Benford's law distributions," in: Machine Learning: ECML 2005, Proceedings, J. Gama, R. Camacho, P. Brazdil, A. Jorge and L. Torgo (eds.), Springer-Verlag Berlin, Berlin, 2005, pp 633-640.
[23] Lu, F., Boritz, J. E., & Covvey, D. "Adaptive Fraud Detection using Benford's Law," in: Advances in Artificial Intelligence, Proceedings, L. Lamontagne and M. Marchand (eds.), Springer-Verlag Berlin, Berlin, 2006, pp 347-358.
[24] Nigrini, M. J. "A taxpayer compliance application of Benford's law," The Journal of the American Taxation Association (18:1) 1996, pp 72-91.
[25] Nigrini, M. J., & Mittermaier, L. J. "The use of Benford's law as an aid in analytical procedures," Auditing (16) 1997, pp 52-67.
[26] Zikopoulos, P. C. Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data, The McGraw-Hill Companies, 2012.
[27] psvm (2014). Retrieved from http://code.google.com/p/psvm/
[28] Alag, S. Collective Intelligence In Action, Manning Pubns Co, 2008, pp 298-299.
[29] Newcomb, S. "Note on the frequency of use of the different digits in natural numbers," American Journal of Mathematics (4:1) 1881, pp 39–40.
[30] Sparrow, M. K. "Fraud Control in the Health Care Industry: Assessing the State of the Art," National Institute of Justice, 1998, pp 1-12.
[31] Smith, S. W. "Explaining Benford's Law," chapter 34 in: The Scientist and Engineer's Guide to Digital Signal Processing, 2012.
[32] Introduction to Support Vector Machines [Support Vector Machines簡介] (2014). Retrieved from http://www.cmlab.csie.ntu.edu.tw/~cyy/learning/tutorials/SVM2.pdf
[33] Thomas, J. K. "Unusual patterns in reported earnings," The Accounting Review (64:4) 1989, pp 773-787.
[34] White, T., & Cutting, D. Hadoop: The Definitive Guide, O'Reilly & Associates Inc., 2011.
[35] Wikipedia, "Benford's law" (2013). Retrieved from http://en.wikipedia.org/wiki/Benford`s_law
[36] 中央健康保險局 (2012). 2012-2013 Introduction to National Health Insurance [2012-2013全民健康保險簡介], Bureau of National Health Insurance, Department of Health, Executive Yuan.
[37] National Health Insurance Research Database [全民健康保險研究資料庫] (2014). Retrieved from http://nhird.nhri.org.tw/index.php#
[38] 林弘德 (2007). piaip's simple introduction to (lib)SVM [piaip 的 (lib)SVM 簡易入門]. Retrieved from http://ntu.csie.org/~piaip//docs/svm/#
[39] 國家衛生研究院 (2014). Wikipedia-MapReduce. Retrieved from http://en.wikipedia.org/wiki/MapReduce
[40] 連賢明 "How to use NHI data for economic research [如何使用健保資料進行經濟研究]," 經濟論文叢刊 (36:1) 2008, pp 115-143.
[41] 陳均輔 (2013). "Institute for Information Industry FIND website [資策會Find網站]." Retrieved from http://www.find.org.tw/find/home.aspx?page=many&id=359
[42] 湯玲郎, & 林信忠 "A study of data extraction methods in NHI expense auditing [資料萃取法在健保費用稽核之研究]," 醫療資訊雜誌 (11) 2000, pp 85-104.
[43] 雲端運算使用案例討論小組 (2010). Cloud Computing Use Cases White Paper [雲端運算使用案例白皮書], Cloud Computing Use Cases group.
[44] 楊喻翔 (2012). "A study of an intelligent NHI expense anomaly detection model using Benford's law [運用Benford定律的智慧型健保費用異常偵測模型之研究]," 台灣碩博士論文網.
[45] 蔡碧展 (2010). "A cloud genomics architecture based on the Hadoop platform [基於Hadoop平台的雲端基因架構]," 台灣碩博士論文網.
[46] 鄭守夏 (2011). Contents and applications of the NHI database [健保資料庫內容與應用].
[47] 駱至中, 王鄭慈, 林錦昌, & 戴丁榮 "Applying a genetic fuzzy expert classification system to automated detection of anomalous NHI medical expense claims [應用遺傳模糊專家分類系統於健保醫療費用申報異常行為之自動化檢測]," 計量管理期刊 (2:1) 2005, pp 15-26.
[48] 謝明瑞 (2002). "Reflections on National Health Insurance [全民健保的省思]," 國政分析, 財經(析)091-014 號.
[49] 趨勢科技研發實驗室 (2009). Introduction to HBase: data model and system architecture [Hbase介紹-資料模型與系統架構].
[50] 藍中賢, & 詹前隆 "Data mining techniques combining fuzzy set theory and Bayesian classification [結合模糊及合理論與貝氏分類法之資料探勘技術]," in: 11th National Conference on Information Management [第十一屆全國資訊管理學術研討], National Sun Yat-sen University, Kaohsiung, 2000.
Description Master's thesis
National Chengchi University (國立政治大學)
Graduate Institute of Management Information Systems (資訊管理研究所)
100356002
102
Source http://thesis.lib.nccu.edu.tw/record/#G0100356002
Type thesis
dc.contributor.advisor 姜國輝 zh_TW
dc.contributor.advisor Chiang, Johannes K. en_US
dc.contributor.author (Author) 黃聖尹 zh_TW
dc.contributor.author (Author) Huang, Sheng Yin en_US
dc.creator (Author) 黃聖尹 zh_TW
dc.creator (Author) Huang, Sheng Yin en_US
dc.date (Date) 2013 en_US
dc.date.accessioned 3 November 2014 10:09:08 (UTC+8)
dc.date.available 3 November 2014 10:09:08 (UTC+8)
dc.date.issued (Uploaded) 3 November 2014 10:09:08 (UTC+8)
dc.identifier (Other identifier) G0100356002 en_US
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/70979
dc.description (Description) Master's thesis zh_TW
dc.description (Description) National Chengchi University (國立政治大學) zh_TW
dc.description (Description) Graduate Institute of Management Information Systems (資訊管理研究所) zh_TW
dc.description (Description) 100356002 zh_TW
dc.description (Description) 102 zh_TW
dc.description.tableofcontents zh_TW
Chapter 1 Introduction
  Section 1 Research Background and Motivation
  Section 2 Research Objectives
Chapter 2 Literature Review
  Section 1 Medical Claim Violations
    1. History of National Health Insurance
    2. Current State of NHI Violations in Taiwan
    3. Issues in Fraud Detection
  Section 2 Research on Benford's Law
    1. Benford's law
    2. Applying Benford's law to Digit Analysis
    3. Benford's law Indicator Variables and Practical Indicators
  Section 3 Cloud Computing
    1. Introduction to Cloud Computing
    2. Apache Hadoop
    3. Apache HBase
    4. Apache Hive
  Section 4 Cloud Computing and Intelligent Algorithms
    1. Growing Hierarchical Self-Organizing Map (GHSOM)
    2. Radial Basis Function Neural Network (RBFNN)
    3. Support Vector Machine (SVM)
  Section 5 Sampling Methods and Bootstrapping
    1. Sampling Methods
    2. Bootstrapping
Chapter 3 Research Design
  Section 1 System Overview
  Section 2 System Architecture
  Section 3 System Workflow
  Section 4 Data Sources
  Section 5 Research Scope and Limitations
Chapter 4 System Development and Implementation
  Section 1 Data Storage Environment
    1. Description of the Raw NHI Data
    2. HBase Table Design
    3. Hive Setup
  Section 2 Implementing MapReduce Data Processing
    1. Loading Raw Data into HBase
    2. Computing Benford's law Indicator Variables
  Section 3 Parallel SVM Implementation
  Section 4 Anomaly Detection Results
    1. Descriptive Statistics
    2. Anomaly Detection Results
Chapter 5 Conclusions and Recommendations
  Section 1 Research Conclusions
  Section 2 Future Research Directions and Recommendations
References
dc.format.extent 1760199 bytes
dc.format.mimetype application/pdf
dc.language.iso en_US
dc.source.uri (Source) http://thesis.lib.nccu.edu.tw/record/#G0100356002 en_US
dc.subject (Keywords) 健保資料庫 zh_TW
dc.subject (Keywords) 申報異常 zh_TW
dc.subject (Keywords) 支持向量機器 zh_TW
dc.subject (Keywords) 雲端運算 zh_TW
dc.subject (Keywords) National health insurance database en_US
dc.subject (Keywords) anomaly claim en_US
dc.subject (Keywords) Support Vector Machines en_US
dc.subject (Keywords) Cloud Computing en_US
dc.title (Title) 運用雲端運算於智慧型健保費用異常偵測之研究 zh_TW
dc.title (Title) A Research into Intelligent Cloud Computing Techniques for Detecting Anomalous Health-insurance Expenses en_US
dc.type (Type) thesis en
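Illustrative note on the abstract: the abstract states that Benford's law predicts a specific digit-frequency distribution and that the thesis derives indicator variables from it. The Python sketch below shows one way such an indicator could be computed. It assumes positive integer claim amounts and uses the mean absolute deviation between observed and expected first-digit frequencies. The function names and the choice of indicator are hypothetical, since this record does not say which Benford-derived indicators the thesis actually uses.

```python
import math
from collections import Counter


def benford_expected():
    """Expected first-digit probabilities under Benford's law: P(d) = log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(amount):
    """Leading digit of an integer amount (assumption: amounts are integers)."""
    return int(str(abs(amount))[0])


def benford_mad(amounts):
    """Mean absolute deviation between observed and expected first-digit frequencies.

    Hypothetical indicator: larger values mean the amounts deviate more from
    Benford's law. Zero amounts are skipped because they have no leading digit.
    """
    digits = [first_digit(a) for a in amounts if a != 0]
    if not digits:
        raise ValueError("need at least one non-zero amount")
    counts = Counter(digits)
    n = len(digits)
    expected = benford_expected()
    return sum(abs(counts.get(d, 0) / n - expected[d]) for d in range(1, 10)) / 9
```

In a pipeline like the one the abstract describes, a value such as `benford_mad(claims_for_one_institution)` would be one feature among several passed to a classifier; the threshold or model that turns it into an "anomalous" label is outside this sketch.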