
Title 網頁地理關聯性之分析與研究
The Analysis of Geographic Relations of Internet Information
Author 黃建達
Huang, Jian Da
Contributors 何瑁鎧
Hor, Maw Kae
Huang, Jian Da
Keywords geographic information system
information retrieval
web search engine
Date 2008
Upload time 17-Sep-2009 14:03:35 (UTC+8)
Abstract Geographic web search has become increasingly popular in recent years. Traditional web search engines, such as Google and Yahoo, cannot account for the geographic relevance between user queries and web documents, so they cannot retrieve geographically related information for a query. In many cases, however, taking this geographic relevance into account could improve search accuracy.

In this thesis, we propose a mechanism that uses the Bounding Rectangle Model (BR Model) to retrieve web documents that are geographically relevant to user queries. Users enter only conventional text queries (keywords), and our search engine returns geographically relevant results. The method has three steps. First, we build a gazetteer and use it to identify the place names, and their spatial data, that appear in user queries and web documents. Next, we use the spatial data to build sets of spatial index terms that represent the geographic scope of a user query and of each web document. Finally, we use these sets of spatial index terms to compute the degree of geographic similarity between the user query and each web document, identifying the documents with high geographic relevance to the query.

We implemented a prototype search engine based on this approach. The experimental results show that the mechanism successfully retrieves geographically relevant documents and provides more accurate search results.
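
The three steps in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the gazetteer entries, the rough coordinates, the substring matching, and the overlap-ratio score (intersection area divided by union area) are all assumptions chosen for illustration, since the record does not state how the BR Model computes its geographic score.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Rect:
    """Axis-aligned bounding rectangle in (longitude, latitude) degrees."""
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def area(self) -> float:
        return (self.max_lon - self.min_lon) * (self.max_lat - self.min_lat)

    def intersection(self, other: "Rect") -> Optional["Rect"]:
        """Return the overlapping rectangle, or None if the two do not overlap."""
        lo_lon = max(self.min_lon, other.min_lon)
        lo_lat = max(self.min_lat, other.min_lat)
        hi_lon = min(self.max_lon, other.max_lon)
        hi_lat = min(self.max_lat, other.max_lat)
        if lo_lon >= hi_lon or lo_lat >= hi_lat:
            return None
        return Rect(lo_lon, lo_lat, hi_lon, hi_lat)


# Step 1: a hypothetical gazetteer mapping place names to bounding rectangles.
# The coordinates are rough, illustrative values, not survey data.
GAZETTEER: Dict[str, Rect] = {
    "Taiwan": Rect(120.0, 21.9, 122.0, 25.3),
    "Taipei": Rect(121.45, 24.95, 121.67, 25.21),
    "Kaohsiung": Rect(120.2, 22.5, 120.5, 22.8),
}


def spatial_index_terms(text: str, gazetteer: Dict[str, Rect] = GAZETTEER) -> List[Rect]:
    """Step 2: collect the rectangles of every gazetteer name found in the text.

    Plain case-insensitive substring matching is an assumption of this sketch.
    """
    lowered = text.lower()
    return [rect for name, rect in gazetteer.items() if name.lower() in lowered]


def overlap_ratio(a: Rect, b: Rect) -> float:
    """Intersection area divided by union area (an assumed similarity measure)."""
    inter = a.intersection(b)
    if inter is None:
        return 0.0
    union = a.area() + b.area() - inter.area()
    return inter.area() / union


def geo_similarity(query_terms: List[Rect], doc_terms: List[Rect]) -> float:
    """Step 3: average, over query rectangles, of the best overlap with the document."""
    if not query_terms or not doc_terms:
        return 0.0
    best = [max(overlap_ratio(q, d) for d in doc_terms) for q in query_terms]
    return sum(best) / len(best)


if __name__ == "__main__":
    query = spatial_index_terms("hotels in Taipei")
    doc = spatial_index_terms("A travel guide to Taipei and Kaohsiung")
    print(geo_similarity(query, doc))
```

In this sketch a document that mentions the same place as the query scores highest, a document about a disjoint region scores zero, and a document about an enclosing region (for example the whole island) receives a small positive score proportional to the shared area.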
References [1] Amitay, Einat, Nadav Har'El, Ron Sivan, and Aya Soffer. Web-a-Where: Geotagging Web Content. Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'04), UK, 2004.
[2] Baeza-Yates, Ricardo and Berthier Ribeiro-Neto. Modern Information Retrieval. New York: ACM Press, 1999.
[3] Brin, Sergey and Lawrence Page. The Anatomy of a Large-Scale Hypertextual Web Search Engine. Proceedings of the 7th World Wide Web Conference (WWW'98), Australia, 1998.
[4] Buyukkokten, Orkut, Junghoo Cho, Hector Garcia-Molina, Luis Gravano, and Narayanan Shivakumar. Exploiting Geographical Location Information of Web Pages. Proceedings of the International Conference on Management of Data (SIGMOD'99), USA, 1999.
[5] Clementini, Eliseo, Paolino Di Felice, and Peter van Oosterom. A Small Set of Formal Topological Relationships Suitable for End-User Interaction. 3rd International Symposium on Advances in Spatial Databases (SSD'93), 1993.
[6] Ding, Junyan, Luis Gravano, and Narayanan Shivakumar. Computing Geographical Scopes of Web Resources. Proceedings of the 26th VLDB Conference, Egypt, 2000.
[7] Hiramoto, Ryoko and Kazutoshi Sumiya. Web Information Retrieval Based on User Operation on Digital Maps. Proceedings of the 14th Annual ACM International Symposium on Advances in Geographic Information Systems (ACM-GIS'06), USA, 2006.
[8] Lee, R., H. Shiina, H. Takakura, Y.J. Kwon, and Y. Kambayashi. Optimization of Geographic Area to a Web Page for Two-Dimensional Range Query Processing. Proceedings of the Fourth International Conference on Information Systems Engineering Workshop (WISEW'03), Rome, 2004.
[9] Martins, Bruno, Mario J. Silva, and Leonardo Andrade. Indexing and Ranking in Geo-IR Systems. Proceedings of the Workshop on Geographic Information Retrieval (GIR'05), Germany, 2005.
[10] Wang, Chuang, Xing Xie, Lee Wang, Yansheng Lu, and Wei-Ying Ma. Detecting Geographic Locations from Web Resources. Proceedings of the Workshop on Geographic Information Retrieval (GIR'05), Germany, 2005.
[11] Zhou, Yinghua, Xing Xie, Chuang Wang, Yuchang Gong, and Wei-Ying Ma. Hybrid Index Structures for Location-based Web Search. Proceedings of the Conference on Information and Knowledge Management (CIKM'05), Germany, 2005.
Description Master's degree
Type thesis
dc.contributor.advisor 何瑁鎧 zh_TW
dc.contributor.advisor Hor, Maw Kae en_US
(Authors) 黃建達 zh_TW
(Authors) Huang, Jian Da en_US
dc.creator (Author) 黃建達 zh_TW
dc.creator (Author) Huang, Jian Da en_US
(Date) 2008 en_US
(Upload time) 17-Sep-2009 14:03:35 (UTC+8)
dc.identifier (Other Identifiers) G0094753033 en_US
dc.identifier.uri (URI)
dc.description (Description) Master's degree
dc.description (Description) National Chengchi University (國立政治大學)
dc.description (Description) Department of Computer Science (資訊科學學系)
dc.description (Description) 94753033 zh_TW
dc.description (Description) 97 zh_TW
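dc.description.tableofcontents Chapter 1 Introduction
1.1 Overview
1.2 Problem Statement
1.3 Thesis Organization
Chapter 2 Related Work
Chapter 3 Building the Bounding Rectangle Model
3.1 Overview of Information Retrieval Models
3.2 Building the Spatial Data and the Gazetteer
3.3 Types of Spatial Data and the Spatial Relations Between Them
3.4 Detecting Spatial Relations Between Spatial Data
3.4.1 Definitions
3.4.2 Spatial Relation Detection
3.5 The Bounding Rectangle Model
3.5.1 Building the Set of Spatial Index Terms
3.5.2 Resolving Place-Name Ambiguity
3.5.3 Removing Redundant Place Names
3.5.4 Computing the Geographic Score
3.5.5 Evaluating Geographic Similarity
Chapter 4 System Architecture
Chapter 5 Experiments
5.1 Evaluation Method
5.2 Statistics
5.3 Factors That May Affect the Results
Chapter 6 Conclusion
6.1 Conclusions
6.2 Future Work
References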
dc.format.extent 50284 bytes
dc.format.extent 94110 bytes
dc.format.extent 112285 bytes
dc.format.extent 130265 bytes
dc.format.extent 208493 bytes
dc.format.extent 186771 bytes
dc.format.extent 520293 bytes
dc.format.extent 155465 bytes
dc.format.extent 407961 bytes
dc.format.extent 218554 bytes
dc.format.extent 41975 bytes
dc.format.mimetype application/pdf
dc.language.iso en_US
dc.source.uri (Source)
dc.subject (Keywords) geographic information system en_US
dc.subject (Keywords) information retrieval en_US
dc.subject (Keywords) web search engine en_US
dc.subject (Keywords) BR-Model en_US
dc.title (Title) 網頁地理關聯性之分析與研究 zh_TW
dc.title (Title) The Analysis of Geographic Relations of Internet Information en_US
dc.type (Type) thesis en