Citation

Conference on Information and Knowledge Management
Proceedings of the eleventh international conference on Information and knowledge management

2002, McLean, Virginia, USA

SESSION: XML constraints and the semantic web

Discovering approximate keys in XML data

Authors
Gösta Grahne Concordia University
Jianfei Zhu Concordia University

Sponsors
  SIGMIS : ACM Special Interest Group on Management Information Systems
  ACM : Association for Computing Machinery
  SIGIR : ACM Special Interest Group on Information Retrieval

Publisher
ACM Press New York, NY, USA

Pages: 453 - 460

Year of Publication: 2002

ISBN: 1-58113-492-4

http://doi.acm.org/10.1145/584792.584867

ABSTRACT

Keys are important in many aspects of data management, such as guiding query formulation, query optimization, and indexing. We consider the situation where an XML document does not come with key definitions, and we are interested in using data mining techniques to obtain a representation of the keys holding in the document. To obtain a compact representation of this set of keys, we define a partial order on the set of all key expressions. This order is based on an analysis of the properties of absolute and relative keys for XML. Given the partial order, only a reduced set of key expressions needs to be discovered.

Due to the semistructured nature of XML documents, it is useful to consider keys that hold in "almost" the whole document, that is, keys that are violated only in a small part of the document. To this end, the support and confidence of a key expression are defined, and the concept of an approximate key expression is introduced. We give an efficient algorithm to mine a reduced set of approximate keys from an XML document.
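The abstract does not reproduce the paper's definitions of support and confidence, so the following Python sketch only illustrates the general idea of an approximate key under stated assumptions. It assumes that support is the number of nodes selected by a target path, and that confidence is the largest fraction of those nodes that could be kept so that the key values become unique. The function name key_stats, the sample document, and both measures are hypothetical and may differ from the authors' formulation.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def key_stats(xml_text, target_path, key_paths):
    """Return (support, confidence) for a candidate key.

    Assumed measures (not taken from the paper):
      support    = number of nodes selected by target_path
      confidence = share of those nodes that can be kept so that
                   every remaining node has a distinct key value
    """
    root = ET.fromstring(xml_text)
    targets = root.findall(target_path)
    # Key value of a target node: the text found under each key path.
    values = [tuple(t.findtext(p) for p in key_paths) for t in targets]
    counts = Counter(values)
    support = len(targets)
    # Keep one node per distinct value; a node with a missing key
    # field (None) is treated as a violation and is never kept.
    kept = sum(1 for value in counts if None not in value)
    confidence = kept / support if support else 0.0
    return support, confidence


# Hypothetical sample document: the isbn value 2 occurs twice.
doc = """<db>
  <book><isbn>1</isbn><title>A</title></book>
  <book><isbn>2</isbn><title>B</title></book>
  <book><isbn>2</isbn><title>C</title></book>
  <book><isbn>3</isbn><title>D</title></book>
</db>"""

isbn_support, isbn_confidence = key_stats(doc, "book", ["isbn"])
title_support, title_confidence = key_stats(doc, "book", ["title"])
```

In this sample, isbn is violated by one of the four book nodes, so under these assumptions it would count as a key only if a confidence threshold of 0.75 or lower were accepted, while title holds exactly.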

INDEX TERMS

Primary Classification:
H. Information Systems
H.2 DATABASE MANAGEMENT
H.2.8 Database applications
Subjects: Data mining

General Terms:
Algorithms, Design, Theory

The ACM Portal is published by the Association for Computing Machinery. Copyright © 2003 ACM, Inc.