Publications-Theses

Article View/Open

Publication Export

Google ScholarTM

NCCU Library

Citation Infomation

Related Publications in TAIR

題名 針對複合式競賽挑選最佳球員組合的方法
Selecting the best group of players for a composite competition
作者 鄧雅文
Teng, Ya Wen
貢獻者 陳良弼
Chen, Arbee L. P.
鄧雅文
Teng, Ya Wen
關鍵詞 Top-k查詢
不確定資料
Best-kGROUP查詢
Top-k query
uncertain data
Best-kGROUP query
skyline kGROUP
日期 2009
上傳時間 11-Oct-2011 16:57:34 (UTC+8)
摘要 在資料庫的處理中,top-k查詢幫助使用者從龐大的資料中萃取出具有價值的物件,它將資料庫中的物件依照給分公式給分後,選擇出分數最高的前k個回傳給使用者。然而在多數的情況下,一個物件也許不只有一個分數,要如何在多個分數中仍然選擇出整體最高分的前k個物件,便成為一個新的問題。在本研究中,我們將這樣的物件用不確定資料來表示,而每個物件的不確定性則是其帶有機率的分數以表示此分數出現的可能性,並提出一個新的問題:Best-kGROUP查詢。在此我們將情況模擬為一個複合式競賽,其中有多個子項目,每個項目的參賽人數各異,且最多需要k個人參賽;我們希望能針對此複合式競賽挑選出最佳的k個球員組合。當我們定義一個較佳的組合為其在較多項目居首位的機率比另一組合高,而最佳的組合則是沒有比它更佳的組合。為了加快挑選的速度,我們利用動態規劃的方式與篩選的演算法,將不可能的組合先剔除;所剩的組合則是具有天際線特質的組合,在這些天際線組合中,我們可以輕易的找出最佳的組合。此外,在實驗中,對於在所有球員中挑選最佳的組合,Best-kGROUP查詢也有非常優異的表現。
In a large database, top-k query is an important mechanism to retrieve the most valuable information for the users. It ranks data objects with a ranking function and reports the k objects with the highest scores. However, when an object has multiple scores, how to rank objects without information loss becomes challenging. In this paper, we model the object with multiple scores as an uncertain data object and the uncertainty of the object as a distribution of the scores, and consider a novel problem named Best-kGROUP query. Imagine the following scenario. Assume there is a composite competition consisting of several games each of which requires a distinct number of players. Suppose the largest number is k, and we want to select the best group of k players from all the players for the competition. A group x is considered better than another group y if x has higher aggregated probability to be the top ones in more games than y. In order to speed up the selection process, the groups worse than another group definitely should first be discarded. We identify these groups using a dynamic programming based approach and a filtering algorithm. The remaining groups with the property that none of them have higher aggregated probability to be the top ones for all games against the other groups are called skyline groups. From these skyline groups, we can easily compare them to select the best group for the composite competition. The experiments show that our approach outperforms the other approaches in selecting the best group to defeat the other groups in the composite competitions.
參考文獻 [1] I. F. Ilyas, G. Beskales, and M. A. Soliman. 2008. A Survey of Top-k Query Processing Techniques in Relational Database System. ACM Computing Surveys, Vol. 40, No. 4, pp. 11:1-11:58.
[2] C. C. Aggarwal and P. S. Yu. 2009. A Survey of Uncertain Data Algorithms and Applications. IEEE Transactions on Knowledge and Data Engineering, Vol. 21, No. 5, pp. 609-623.
[3] M. Soliman, I. Ilyas, and K. Chang. 2007. Top-k Query Processing in Uncertain Databases. In Proceedings of the 23rd International Conference on Data Engineering (ICDE), pp. 896-905.
[4] M. Hua, J. Pei, W. Zhang, and X. Lin. 2008. Efficiently Answering Probabilistic Threshold Top-k Queries on Uncertain Data. In Proceedings of the 24th International Conference on Data Engineering (ICDE), pp. 1403-1405.
[5] T. Ge, S. Zdonik, and S. Madden. 2009. Top-k Queries on Uncertain Data: On Score Distribution and Typical Answers. In Proceedings of the 35th International Conference on Management of Data (SIGMOD), pp. 375-388.
[6] C. Jin, K. Yi, L. Chen, J. X. Yu, and X. Lin. 2008. Sliding-Window Top-k Queries on Uncertain Streams. In Proceedings of the VLDB Endowment, Vol. 1, No. 1, pp. 301-312.
[7] G. Beskales, M. A. Soliman, and I. F. Ilyas. 2008. Efficient Search for the Top-k Probable Nearest Neighbors in Uncertain Databases. In Proceedings of the VLDB Endowment, Vol. 1, No. 1, pp. 326-339.
[8] M. L. Yiu and N. Mamoulis. 2007. Efficient Processing of Top-k Dominating Queries on Multi-Dimensional Data. In Proceedings of the 33rd International Conference on Very Large Data Bases, pp. 483-494.
[9] C.-Y. Chan, H. V. Jagadish, K.-L. Tan, A. K. H. Tung, and Z. Zhang. 2006. Finding k-Dominant Skylines in High Dimensional Space. In Proceedings of the 32nd International Conference on Management of Data (SIGMOD), pp. 503-514.
描述 碩士
國立政治大學
資訊科學學系
97753034
98
資料來源 http://thesis.lib.nccu.edu.tw/record/#G0097753034
資料類型 thesis
dc.contributor.advisor 陳良弼zh_TW
dc.contributor.advisor Chen, Arbee L. P.en_US
dc.contributor.author (Authors) 鄧雅文zh_TW
dc.contributor.author (Authors) Teng, Ya Wenen_US
dc.creator (作者) 鄧雅文zh_TW
dc.creator (作者) Teng, Ya Wenen_US
dc.date (日期) 2009en_US
dc.date.accessioned 11-Oct-2011 16:57:34 (UTC+8)-
dc.date.available 11-Oct-2011 16:57:34 (UTC+8)-
dc.date.issued (上傳時間) 11-Oct-2011 16:57:34 (UTC+8)-
dc.identifier (Other Identifiers) G0097753034en_US
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/51590-
dc.description (描述) 碩士zh_TW
dc.description (描述) 國立政治大學zh_TW
dc.description (描述) 資訊科學學系zh_TW
dc.description (描述) 97753034zh_TW
dc.description (描述) 98zh_TW
dc.description.abstract (摘要) 在資料庫的處理中,top-k查詢幫助使用者從龐大的資料中萃取出具有價值的物件,它將資料庫中的物件依照給分公式給分後,選擇出分數最高的前k個回傳給使用者。然而在多數的情況下,一個物件也許不只有一個分數,要如何在多個分數中仍然選擇出整體最高分的前k個物件,便成為一個新的問題。在本研究中,我們將這樣的物件用不確定資料來表示,而每個物件的不確定性則是其帶有機率的分數以表示此分數出現的可能性,並提出一個新的問題:Best-kGROUP查詢。在此我們將情況模擬為一個複合式競賽,其中有多個子項目,每個項目的參賽人數各異,且最多需要k個人參賽;我們希望能針對此複合式競賽挑選出最佳的k個球員組合。當我們定義一個較佳的組合為其在較多項目居首位的機率比另一組合高,而最佳的組合則是沒有比它更佳的組合。為了加快挑選的速度,我們利用動態規劃的方式與篩選的演算法,將不可能的組合先剔除;所剩的組合則是具有天際線特質的組合,在這些天際線組合中,我們可以輕易的找出最佳的組合。此外,在實驗中,對於在所有球員中挑選最佳的組合,Best-kGROUP查詢也有非常優異的表現。zh_TW
dc.description.abstract (摘要) In a large database, top-k query is an important mechanism to retrieve the most valuable information for the users. It ranks data objects with a ranking function and reports the k objects with the highest scores. However, when an object has multiple scores, how to rank objects without information loss becomes challenging. In this paper, we model the object with multiple scores as an uncertain data object and the uncertainty of the object as a distribution of the scores, and consider a novel problem named Best-kGROUP query. Imagine the following scenario. Assume there is a composite competition consisting of several games each of which requires a distinct number of players. Suppose the largest number is k, and we want to select the best group of k players from all the players for the competition. A group x is considered better than another group y if x has higher aggregated probability to be the top ones in more games than y. In order to speed up the selection process, the groups worse than another group definitely should first be discarded. We identify these groups using a dynamic programming based approach and a filtering algorithm. The remaining groups with the property that none of them have higher aggregated probability to be the top ones for all games against the other groups are called skyline groups. From these skyline groups, we can easily compare them to select the best group for the composite competition. The experiments show that our approach outperforms the other approaches in selecting the best group to defeat the other groups in the composite competitions.en_US
dc.description.tableofcontents 1 INTRODUCTION 1
2 RELATED WORK 7
2.1 Top-k Queries 7
2.2 Uncertain Data 8
2.2.1 Uncertain Top-k Queries 8
2.2.2 Uncertain Top-k Queries on Data Streams 14
2.2.3 Uncertain Nearest Neighbor Queries 14
2.3 Skyline 15
3 METHODOLOGY 17
3.1 Problem Definition 17
3.2 The Basic Algorithms 20
3.3 The Heuristic Approaches 28
4 EXPERIMENTS 29
4.1 Experiment Setup 29
4.2 On Execution Time 32
4.3 On Accuracy 34
4.4 On Performance of the Composite Competition 35
5 CONCLUSIONS AND FUTURE WORK 37
6 REFERENCES 40
zh_TW
dc.language.iso en_US-
dc.source.uri (資料來源) http://thesis.lib.nccu.edu.tw/record/#G0097753034en_US
dc.subject (關鍵詞) Top-k查詢zh_TW
dc.subject (關鍵詞) 不確定資料zh_TW
dc.subject (關鍵詞) Best-kGROUP查詢zh_TW
dc.subject (關鍵詞) Top-k queryen_US
dc.subject (關鍵詞) uncertain dataen_US
dc.subject (關鍵詞) Best-kGROUP queryen_US
dc.subject (關鍵詞) skyline kGROUPen_US
dc.title (題名) 針對複合式競賽挑選最佳球員組合的方法zh_TW
dc.title (題名) Selecting the best group of players for a composite competitionen_US
dc.type (資料類型) thesisen
dc.relation.reference (參考文獻) [1] I. F. Ilyas, G. Beskales, and M. A. Soliman. 2008. A Survey of Top-k Query Processing Techniques in Relational Database System. ACM Computing Surveys, Vol. 40, No. 4, pp. 11:1-11:58.zh_TW
dc.relation.reference (參考文獻) [2] C. C. Aggarwal and P. S. Yu. 2009. A Survey of Uncertain Data Algorithms and Applications. IEEE Transactions on Knowledge and Data Engineering, Vol. 21, No. 5, pp. 609-623.zh_TW
dc.relation.reference (參考文獻) [3] M. Soliman, I. Ilyas, and K. Chang. 2007. Top-k Query Processing in Uncertain Databases. In Proceedings of the 23rd International Conference on Data Engineering (ICDE), pp. 896-905.zh_TW
dc.relation.reference (參考文獻) [4] M. Hua, J. Pei, W. Zhang, and X. Lin. 2008. Efficiently Answering Probabilistic Threshold Top-k Queries on Uncertain Data. In Proceedings of the 24th International Conference on Data Engineering (ICDE), pp. 1403-1405.zh_TW
dc.relation.reference (參考文獻) [5] T. Ge, S. Zdonik, and S. Madden. 2009. Top-k Queries on Uncertain Data: On Score Distribution and Typical Answers. In Proceedings of the 35th International Conference on Management of Data (SIGMOD), pp. 375-388.zh_TW
dc.relation.reference (參考文獻) [6] C. Jin, K. Yi, L. Chen, J. X. Yu, and X. Lin. 2008. Sliding-Window Top-k Queries on Uncertain Streams. In Proceedings of the VLDB Endowment, Vol. 1, No. 1, pp. 301-312.zh_TW
dc.relation.reference (參考文獻) [7] G. Beskales, M. A. Soliman, and I. F. Ilyas. 2008. Efficient Search for the Top-k Probable Nearest Neighbors in Uncertain Databases. In Proceedings of the VLDB Endowment, Vol. 1, No. 1, pp. 326-339.zh_TW
dc.relation.reference (參考文獻) [8] M. L. Yiu and N. Mamoulis. 2007. Efficient Processing of Top-k Dominating Queries on Multi-Dimensional Data. In Proceedings of the 33rd International Conference on Very Large Data Bases, pp. 483-494.zh_TW
dc.relation.reference (參考文獻) [9] C.-Y. Chan, H. V. Jagadish, K.-L. Tan, A. K. H. Tung, and Z. Zhang. 2006. Finding k-Dominant Skylines in High Dimensional Space. In Proceedings of the 32nd International Conference on Management of Data (SIGMOD), pp. 503-514.zh_TW