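Intelligent Information Integration in Heterogeneous Data Warehouse and Data Mining Model, Framework, and Benchmark---An Application of Ontology, Metadata, and Generic Constructs | Academic Output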


Title	智慧型資訊整合於異質資料倉儲和資料探勘之模型、架構、與績效評估-應用本體論、母型綱要、和學名結構
Alternative title	Intelligent Information Integration in Heterogeneous Data Warehouse and Data Mining Model, Framework, and Benchmark---An Application of Ontology, Metadata, and Generic Constructs
Author	諶家蘭
Contributors	Graduate Institute of Accounting, National Chengchi University; National Science Council, Executive Yuan
Keywords	Information Integration;Metadata;Generic Constructs;Ontology;XML;Benchmark;Workload;Data Warehouse;Data Mining
Date	2009
Uploaded	26-June-2012 14:59:00 (UTC+8)
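Abstract	With the rapid development of information technology and the Internet, heterogeneous information integration has become a pervasive and important issue in e-business and e-commerce environments. Without integration, accessing heterogeneous information sources individually leads to confused, erroneous, and wasted information, and in particular cannot give executives real-time analysis for management decisions. Traditional research on heterogeneous information integration usually creates a common data model to handle heterogeneity. The eXtensible Markup Language (XML) has become the standard document format for exchanging information on the Web, which makes it a good candidate for that common data model. However, XML handles only structural heterogeneity, not semantic heterogeneity, whereas ontologies are regarded as an important and natural tool for representing the ambiguous semantics and relationships of the real world. In the first-year project we therefore adopt ontologies and XML to achieve semantic interoperability in intelligent information integration. We propose a method for generating the global schema that is driven by generic constructs rather than by ad hoc mappings, to enable heterogeneous information integration that is Web-based rather than traditional. We also propose a more intelligent query method over heterogeneous information sources; it applies the global-as-view approach together with ontology concepts and improves both structural and semantic interoperability with the underlying sources. A prototype system will be implemented to verify the feasibility of the proposed integration method. Driven by globalization and the ubiquity of the Internet, enterprises often distribute their business units around the world, or merge and ally with companies in other locations, to improve their competitiveness and market responsiveness. As a result of this geographic dispersion, a corporate group usually contains many different data warehouse systems, and to support global management decisions the data in those warehouses must be exchangeable and integrable. An open and independent standard for data exchange and integration is therefore urgently needed, so that multidimensional data can be exchanged among data warehouses over the Internet. However, the known cross-warehouse exchange solutions are mostly limited to row-by-row conversion or to transfer in plain-text file formats; these are inefficient and unsystematic, often produce errors, and cannot deliver timely information. In the second-year project, building on the first year's results, we study multidimensional data exchange and develop a multidimensional information integration and exchange model based on XML and metadata. We propose a generic-constructs-based method for developing a single standard transformation and exchange format that enables systematic many-to-many mapping among distributed data warehouses. Supported by the multidimensional metadata management functions proposed in this research, it yields a Web-wide, XML- and metadata-based process for multidimensional information integration and exchange that balances efficiency and quality. A prototype system will implement multidimensional information integration and data exchange to demonstrate the feasibility of the model. In e-commerce, enterprises also collect and store large volumes of data and rely on effective system tools for data processing, information retrieval, and decision analysis; data mining combined with data warehousing will be the enterprise's most important business intelligence tool. The second-year project therefore also examines how to make data mining tools effective for extraction, analysis, and prediction over massive heterogeneous information sources, which is an ultimate goal of information integration. In the third-year project, building on the first two years, we study a benchmark (performance evaluation) method for XML, metadata, and ontology in intelligent information integration, namely the construction of a workload model, in order to establish a more generalized benchmark method for information integration. We aim to develop a workload model that integrates different data models and the semantics derived from them, is modeled with XML, ontology, and generic data structures, and is portable and scalable. We will take a generic-constructs-based, user-requirements-defined, domain-independent approach to develop this more generalized benchmark method, and will build a prototype benchmark generator.
Information integration has become a ubiquitous and critically important research issue in e-business and e-commerce. Accessing heterogeneous data sources separately, without integration, may lead to chaos in the information and decisions requested. A common way to deal with heterogeneity in traditional data integration is to create a common data model and work with a mapping table. The eXtensible Markup Language (XML) has become the standard data format for exchanging information on the Web. 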
It has been introduced to deal with the heterogeneity issue. However, XML handles only structural heterogeneity; it can barely deal with semantic heterogeneity. Ontologies and metadata, on the other hand, are regarded as an important and natural alternative for representing the implicit semantics and relationships among real, complicated entities. In the first-year research, we propose to develop an intelligence-oriented information integration method for generating the global schema and the integrated ontology. We aim to provide an intelligent query model over multiple heterogeneous information sources and to build a global-as-view approach with ontology that facilitates structural and semantic interoperability between data sources. A prototype system will be created to implement the method and to serve as a proof of its validity and feasibility. Globalization and the Internet have shaped enterprises into business units spread across the globe in remotely distributed regions. As a result, there are a number of data warehouse systems in the geographically distributed business environment. To meet the challenge of globalization and distributed decision-making on demand, heterogeneous data warehouses must handle constant data exchange, data migration, and data integration. An open, scalable, and robust transformation and exchange method for transmission over the Internet must be created. In the second-year research, the issue of multidimensional data cube exchange and integration will be addressed. An XML metadata-based multidimensional data exchange model will be developed, together with a generic-constructs-based approach that enables systematic many-to-many mapping between the distributed data warehouses. We need to introduce a cohesive, consistent, and unified exchange format for migration and transformation. 
We will develop an XML metadata-based prototype system to illustrate the exchange and demonstrate its feasibility and validity. Data mining combined with data warehousing is emerging as the crucial decision support system and executive information system in enterprises. Mission-critical business intelligence becomes the vital tool for dealing with the volume and speed of the data and information that companies have collected and stored from the Internet and intranets over the years. How to facilitate data mining in the search and preparation of heterogeneous data sources becomes a viable and valuable problem. In the second-year research, we will also tackle the information integration issue in data mining in order to facilitate the decision-making process. Benchmarks are vital tools for the performance measurement, evaluation, and comparison of enterprise information systems. Standard benchmarks are synthetic and domain-specific. Test results from these benchmarks are estimates of possible system performance for certain predetermined application types. The performance of information integration in data warehouse and data mining systems in an actual domain may vary significantly from the results of standard benchmarks. In the third-year research, we propose to develop a new benchmark method that is more generalized, computer-assisted, and driven by the user's requirements. We describe an application-driven formulation process that models the workloads from the requirements analysis, data schema, and operation viewpoints. Test suites can be generated from the heterogeneous data sources and transaction specifications in an automated manner, which achieves more generality and cost effectiveness. In this research, we propose to build a more generalized benchmark method based on generic constructs. It will facilitate testing of information integration for business intelligence. A computer-assisted prototype of this method will be constructed and used in the test experiments.
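Relation	Applied research; academic grant; research period: 9808~9907 (ROC calendar, i.e. August 2009 to July 2010); research funding: NT$778 thousand
Type	report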
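
The abstract's first-year plan pairs a global-as-view approach with an ontology so that a query can reach sources whose field names differ. The following is a minimal sketch of that idea, not the project's method: the ontology entries, the two sources, and the function names are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterator, List

# Hypothetical ontology fragment: each source-specific term maps to one shared concept.
ONTOLOGY: Dict[str, str] = {
    "client": "customer",
    "cust": "customer",
    "customer": "customer",
    "revenue": "sales",
    "turnover": "sales",
    "sales": "sales",
}

# Two hypothetical sources that name the same concepts differently.
SOURCE_A: List[dict] = [{"client": "Acme", "revenue": 120}, {"client": "Birch", "revenue": 80}]
SOURCE_B: List[dict] = [{"cust": "Acme", "turnover": 45}]


def to_global(row: dict) -> dict:
    """Rename every source field to its ontology concept."""
    return {ONTOLOGY[field.lower()]: value for field, value in row.items()}


def global_view() -> Iterator[dict]:
    """Global-as-view: the global relation is defined as a query over the sources."""
    for source in (SOURCE_A, SOURCE_B):
        for row in source:
            yield to_global(row)


def total_sales(entity_term: str, name: str) -> int:
    """Sum sales for one entity; entity_term may be any synonym known to the ontology."""
    concept = ONTOLOGY[entity_term.lower()]
    return sum(row["sales"] for row in global_view() if row[concept] == name)
```

Here the ontology resolves the naming differences, while the view definition hides which source each row came from; a caller can ask for `total_sales("Client", "Acme")` without knowing that one source says `client` and the other `cust`.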

dc.contributor	Graduate Institute of Accounting, National Chengchi University	en_US
dc.contributor	National Science Council, Executive Yuan	en_US
dc.creator (Author)	諶家蘭	zh_TW
dc.date (Date)	2009	en_US
dc.date.accessioned	26-June-2012 14:59:00 (UTC+8)	-
dc.date.available	26-June-2012 14:59:00 (UTC+8)	-
dc.date.issued (Uploaded)	26-June-2012 14:59:00 (UTC+8)	-
dc.identifier.uri (URI)	http://nccur.lib.nccu.edu.tw/handle/140.119/53290	-
dc.language.iso	en_US	-
dc.relation (Relation)	Applied research	en_US
dc.relation (Relation)	Academic grant	en_US
dc.relation (Relation)	Research period: 9808~9907	en_US
dc.relation (Relation)	Research funding: NT$778 thousand	en_US
dc.subject (Keywords)	Information Integration;Metadata;Generic Constructs;Ontology;XML;Benchmark;Workload;Data Warehouse;Data Mining	en_US
dc.title (Title)	智慧型資訊整合於異質資料倉儲和資料探勘之模型、架構、與績效評估-應用本體論、母型綱要、和學名結構	zh_TW
dc.title.alternative (Alternative title)	Intelligent Information Integration in Heterogeneous Data Warehouse and Data Mining Model, Framework, and Benchmark---An Application of Ontology, Metadata, and Generic Constructs	en_US
dc.type (Type)	report	en
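
The second-year plan in the abstract calls for a single XML-based exchange format so that distributed data warehouses can map to one another many-to-many. The sketch below illustrates one way such a format could carry a small data cube; the element names (`cube`, `dimension`, `cell`, `member`) and all function names are assumptions made for this example and are not taken from the project.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

Cells = Dict[Tuple[str, ...], float]


def cube_to_xml(name: str, dimensions: List[str], cells: Cells) -> str:
    """Serialize a cube (coordinate tuple -> measure) into the hypothetical common format."""
    root = ET.Element("cube", {"name": name})
    dims = ET.SubElement(root, "dimensions")
    for dim in dimensions:
        ET.SubElement(dims, "dimension", {"name": dim})
    for coords, value in cells.items():
        cell = ET.SubElement(root, "cell", {"value": str(value)})
        for dim, member in zip(dimensions, coords):
            ET.SubElement(cell, "member", {"dimension": dim}).text = member
    return ET.tostring(root, encoding="unicode")


def xml_to_cube(document: str) -> Tuple[str, List[str], Cells]:
    """Parse the common format back into a cube, ordering coordinates by the declared dimensions."""
    root = ET.fromstring(document)
    dimensions = [dim.get("name") for dim in root.find("dimensions")]
    cells: Cells = {}
    for cell in root.findall("cell"):
        by_dim = {m.get("dimension"): m.text for m in cell.findall("member")}
        cells[tuple(by_dim[dim] for dim in dimensions)] = float(cell.get("value"))
    return root.get("name"), dimensions, cells


def converters_needed(n: int) -> Dict[str, int]:
    """Directed converters for n warehouses: pairwise versus through one common format."""
    return {"pairwise": n * (n - 1), "via_common_format": 2 * n}
```

With a common format, each warehouse needs only one exporter and one importer, so the number of converters grows linearly with the number of warehouses instead of quadratically; for five warehouses that is 10 converters rather than 20.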
