Title 大英線上圖書館與倫敦大學體系線上圖書館上架編碼的數位考古
The digit archeology about Listing code of online British Library and University of London
Author 陳以洵
Chen, Yi-Hsun
Contributors 曾正男
Tzeng, Jeng-Nan
陳以洵
Chen, Yi-Hsun
Keywords Thesis Comparison
Web Scraping
Derangements
Random Sampling
Sequential Consistency
Z-Score
Box Plot Method
Date 2023
Upload time 2-Aug-2023 13:01:59 (UTC+8)
Abstract Verifying a foreign degree thesis through formal channels is time-consuming. This thesis therefore uses publicly available online information to build a preliminary screen for how far a thesis deviates from the norm. Using Python with Selenium and BeautifulSoup, we collect thesis records from the British Library's EThOS service and from the online library of the London School of Economics (LSE), a member of the University of London system. We then examine whether the listing codes in the two libraries are ordered with a reasonable degree of consistency, so that the two orderings can jointly serve as a reference when checking how much of an outlier a thesis is.

Archaeology aims to reconstruct the historical truth of the past; we call the process of reconstructing the truth from publicly available online information digital archaeology. This thesis defines a permutation matrix (同序矩陣, literally a "same-order matrix") and an evaluation function, and uses the difference between orderings to judge how far a thesis's listing time deviates from the norm. With this index, only a thesis that deviates severely from the index mean during degree verification needs to be verified through formal channels.
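The screening idea described in the abstract can be illustrated with a small Python sketch. This record does not give the thesis's actual evaluation function, so everything here is an assumption made for illustration: `displacement_score` is a hypothetical stand-in that uses a normalized Spearman footrule distance between two orderings, and `zscore_outliers` and `boxplot_outliers` are generic versions of the Z-score and box plot rules named in the keywords. The thresholds (3.0 and 1.5) are conventional defaults, not values taken from the thesis.

```python
from statistics import mean, pstdev, quantiles


def displacement_score(order_a, order_b):
    """Normalized displacement between two orderings of the same unique items.

    Hypothetical stand-in for the thesis's evaluation function:
    0.0 means identical order, 1.0 means the largest possible displacement.
    """
    if len(order_a) != len(order_b) or set(order_a) != set(order_b):
        raise ValueError("both orderings must contain the same items")
    n = len(order_a)
    if n < 2:
        return 0.0
    position_in_b = {item: index for index, item in enumerate(order_b)}
    total = sum(abs(index - position_in_b[item])
                for index, item in enumerate(order_a))
    # The footrule distance is largest for a full reversal: floor(n^2 / 2).
    return total / ((n * n) // 2)


def zscore_outliers(values, threshold=3.0):
    """Indices whose absolute Z-score exceeds the threshold (assumed default)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


def boxplot_outliers(values, k=1.5):
    """Indices outside [Q1 - k*IQR, Q3 + k*IQR], the usual box plot fences."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


if __name__ == "__main__":
    # Made-up identifiers: the same four theses listed in two catalogues.
    catalogue_a = ["t1", "t2", "t3", "t4"]
    catalogue_b = ["t1", "t3", "t2", "t4"]
    print(displacement_score(catalogue_a, catalogue_b))

    # Made-up scores for seven theses from one year; the last one stands out.
    scores = [0.10, 0.12, 0.11, 0.09, 0.10, 0.13, 0.90]
    print(boxplot_outliers(scores))
```

A Z-score rule suits years whose scores look normally distributed, while the box plot rule makes no such assumption, which matches the split between the two methods in the thesis's table of contents. Note that with very few scores a Z-score can never exceed a large threshold, so small samples may need a lower cutoff or the box plot rule.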
Description Master's
National Chengchi University
Department of Mathematical Sciences (應用數學系)
106751016
Source http://thesis.lib.nccu.edu.tw/record/#G0106751016
Type thesis
dc.contributor.advisor 曾正男 zh_TW
dc.contributor.advisor Tzeng, Jeng-Nan en_US
dc.contributor.author (Authors) 陳以洵 zh_TW
dc.contributor.author (Authors) Chen, Yi-Hsun en_US
dc.creator (Author) 陳以洵 zh_TW
dc.creator (Author) Chen, Yi-Hsun en_US
dc.date (Date) 2023 en_US
dc.date.accessioned 2-Aug-2023 13:01:59 (UTC+8) -
dc.date.available 2-Aug-2023 13:01:59 (UTC+8) -
dc.date.issued (Upload time) 2-Aug-2023 13:01:59 (UTC+8) -
dc.identifier (Other Identifiers) G0106751016 en_US
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/146297 -
dc.description (Description) 碩士 (Master's) zh_TW
dc.description (Description) 國立政治大學 (National Chengchi University) zh_TW
dc.description (Description) 應用數學系 (Department of Mathematical Sciences) zh_TW
dc.description (Description) 106751016 zh_TW
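dc.description.tableofcontents Chinese Abstract
Abstract
Table of Contents
Chapter 1 Introduction
Section 1 Research Background
Section 2 Research Motivation and Objectives
Chapter 2 Literature Review
Section 1 Web Mining
Section 2 The Derangement Problem
Chapter 3 Methodology
Section 1 Data Sources
1. Compilation of UK Online Library Data
2. Web Mining Techniques in Python
Section 2 Metrics for Misordered Positions
1. Hypotheses
2. Defining the Displacement Function and Displacement Score
3. Normalizing the Displacement Score
Chapter 4 Results
Section 1 Mean, Standard Deviation, and Normality Test of the Same-Position Index (同位指標) for Theses, 1973-1989
Section 2 Mean, Standard Deviation, and Normality Test of the Same-Position Index by Year, 1990-2018
Section 3 Same-Position Index Outliers for Years with a Normal Distribution (Z-Score Method)
Section 4 Same-Position Index Outliers for Years with a Non-Normal Distribution (Box Plot Method)
Chapter 5 Conclusion
References
Appendix A Python Code
Appendix B Table of Displacement Scores for All Possible Corresponding Orderings
zh_TW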
dc.format.extent 1503739 bytes -
dc.format.mimetype application/pdf -
dc.source.uri (Source) http://thesis.lib.nccu.edu.tw/record/#G0106751016 en_US
dc.subject (Keywords) Thesis Comparison en_US
dc.subject (Keywords) Web Scraping en_US
dc.subject (Keywords) Derangements en_US
dc.subject (Keywords) Random Sampling en_US
dc.subject (Keywords) Sequential Consistency en_US
dc.subject (Keywords) Z-Score en_US
dc.subject (Keywords) Box Plot Method en_US
dc.title (Title) 大英線上圖書館與倫敦大學體系線上圖書館上架編碼的數位考古 zh_TW
dc.title (Title) The digit archeology about Listing code of online British Library and University of London en_US
dc.type (Type) thesis en_US