SlideShare uma empresa Scribd logo
1 de 5
Baixar para ler offline
IOSR Journal of Computer Engineering (IOSRJCE)
ISSN: 2278-0661, ISBN: 2278-8727 Volume 5, Issue 6 (Sep-Oct. 2012), PP 25-29
www.iosrjournals.org
www.iosrjournals.org 25 | Page
Data Mining for XML Query-Answering Support
KC. Ravi Kumar1
, E. Krishnaveni Reddy2
, Ramadevi.G3
1, 2, 3
(CSE, Sridevi women’s Engineering College, Hyderabad, Andhra Pradesh)
Abstract: XML has become a defacto standard for storing, sharing and exchanging information across
heterogeneous platforms. The XML content is growing day by day in rapid pace. Enterprises need to make
queries on XML databases frequently. As huge XML data is available, it is challenging task to extract required
data from XML database. It is computationally expensive to answer queries without any support. Towards this,
in this paper we present a technique known as Tree-based Association Rules (TARs) mined rules that provide
required information on structure and content of XML file and the TARs are also stored in XML format. The
mined knowledge (TARs) used later for XML query answering support. This enables quick and accurate
answering. We also developed a prototype application to demonstrate the efficiency of the proposed system. The
empirical results are very positive and query answering is expected to be useful in real time applications.
Index Terms: XML, query answering support, data mining, tree-based association rules
I. Introduction
XML has become a popular format for storing and sharing data across heterogeneous platforms. The
XML format is neutral, flexible and interoperable [5]. It is widely used in applications as it can allow
applications to have communication though they are built in different platforms. The XML documents are
plenty in enterprises and the data retrieval can be done in two ways. The first approach is that user gives
keywords and the program searches for relevant documents. The second approach is give XML queries that are
answered. The first approach is done using conventional information retrieval technique [4] that works on the
search process based on the given search word. With respect to query answering, it is not easy to process such
request. To make this searching easy this paper presents data mining for XML query answering support. XML
documents are validated by either DTD or schema. However, schema presence is not mandatory to process
XML file [3].
This paper presents data mining framework for XML query answering support. The XML documents
essence is extracted and kept in another XML file in the form of TARs. With the help of this XML query
answering becomes easy.
II. Proposed Framework
The proposed XML query answering support framework is as shown in fig. 1. The purpose of this
framework is to perform data mining on XML and obtain intentional knowledge. The intentional knowledge is
also in the form of XML. This is nothing but rules with supports and confidence. In other words the result of
data mining is TARs (Tree-based Association Rules).
Fig. 1 – Proposed XML query answering support framework
As can be seen in fig. 1, the framework is to have data mining for XML query answering support.
When XML file is given as input, DOM parser will parse it for wellformedness and validness. If the given XML
document is valid, it is parsed and loaded into a DOM object which can be navigated easily. The parsed XML
file is given to data mining sub system which is responsible for sub tree generation and also TAR extraction.
Data mining for XML query-answering support
www.iosrjournals.org 26 | Page
The generated TARs are used by Query Processor Sub System. This module takes XML query from end user
and makes use of mined knowledge to answer the query quickly.
Tar Extraction
Extracting TARs through data mining is a process with two steps. In the first step frequent subtrees that
satisfy given support are mined are mined. In the second step interesting rules that have confidence above given
threshold are calculated from the frequent subtrees. Finding frequent sub trees is described in [1], [2], [6], [7],
[8], [9]. Algorithm 1 finds frequent sub trees and calculates interesting rules.
The rules obtained from algorithm 1 are written to an XML file. Then indexing is made. Afterwards when XML
queries are made, the proposed system uses index and TARs and quickly answers the query.
III. Experiments And Results
Environment
The environment used to develop the prototype application includes JSE (Java Standard Edition) 6.0,
Net Beans IDE that run in Windows 7 OS. A PC with 2 GB RAM and 2.9x GHz processor is used. The Java
SWING API is used to build graphical user interface while IO and JAXP (Java API for XML Parsing) are used
for implementing functionality. The main application GUI is as shown in fig.
Fig. 2 – The GUI of the prototype application
As can be seen the GUI has provision to choose an XML file as input. It also allows choosing a file for
storing extracted rules. A text area is provided to show the XML file content. View Tree button shows the XML
file with graphical tree representation. Generate Large ItemSet button generates frequent sub treesthat will be
used for further processing. On clicking the GenerateRuleFile button, it extracts TARs from the given XML file
and finally extracted rules are saved into the given TAR file. The QueryAnswering button invokes a form where
user can enter queries. The query interface is shown in fig. 3.
Data mining for XML query-answering support
www.iosrjournals.org 27 | Page
Fig. 3 – Query interface and results
As can be seen the fig. 3 (a) shows interface for making queries. The queries given here are processed
faster using rule files extracted from XML files. The result of given query in fig. 3 (a) is shown in fig. 3 (b) with
the results that satisfied given support and confidence.
IV. Results
We have performed four types of experiments. They are based on time required to extract intentional
knowledge from XML; time required to answer intentional and extensional queries; monitoring extraction time
with given support and confidence; and study of accuracy of intentional answers.
As can be seen in fig. 4, the TAR extraction time is more when number of nodes in XML document is
more. In other words, the time taken to extract TARs is directly proportional to the number of nodes in given
XML document.
Data mining for XML query-answering support
www.iosrjournals.org 28 | Page
As can be seen in fig. 5, the TAR extraction time is more when number of nodes in XML document is
more. In other words, the time taken to extract TARs is directly proportional to the number of nodes in given
XML document which is generated using XMark.
As can be seen in fig. 6, the TAR extraction time is more when number of nodes in XML document is
more. In other words, the time taken to extract TARs is directly proportional to the number of nodes in given
XML document with fixed depth.
Fig. 7 – Extraction time growth using CMTreeMiner with respect to number of nodes
As can be seen in fig. 7, the TAR extraction time is more when number of nodes in XML document is
more. In other words, the time taken to extract TARs is directly proportional to the number of nodes in given
XML document when CMTreeMiner is used.
As can be seen in fig. 8, the time taken for intensional and extensional query answering are plotted.
However, the intentional query answering takes very less time when compared with that of extensional
answering.
V. Conclusion
In this paper we presented a framework for extracting TARs from given XML file so as to support
XML queries. Towards this end, the aim of this paper is to mine frequent association rules and store the mined
content in XML format; use the TARs to support query answering or to gain information from XML databases.
A prototype application is built to test the efficiency of the proposed framework. The application takes XML file
as input and generates TARs and then finally index file that helps in query processing. The experimental results
revealed that the proposed application is useful and can be used in real time applications.
0
10
20
30
40
50
0 2 4 6 8 10
S
e
c
o
n
d
s
numberof nodes
Series1
Data mining for XML query-answering support
www.iosrjournals.org 29 | Page
References
[1] R. Agrawal and R. Srikant. Fast algorithms for mining association rulesin large databases. In Proc. of the 20th Int. Conf. on Very
Large DataBases, pages 487–499. Morgan Kaufmann Publishers Inc., 1994.
[2] T. Asai, H. Arimura, T. Uno, and S. Nakano. Discovering frequentsubstructures in large unordered trees. In Technical Report DOI-
TR216, Department of Informatics, Kyushu University. http://www.i.kyushuu.ac.jp/doitr/trcs216.pdf, 2003.
[3] D. Barbosa, L. Mignet, and P. Veltri. Studying the xml web: Gatheringstatistics from an xml sample. World Wide Web, 8(4):413–
438, 2005.
[4] Gary Marchionini. Exploratory search: from finding to understanding.Communications of the ACM, 49(4):41–46, 2006.
[5] World Wide Web Consortium. Extensible Markup Language (XML) 1.0,1998. http://www.w3C.org/TR/REC-xml/.
[6] K. Wang and H. Liu. Discovering typical structures of documents: aroad map approach. In Proc. of the 21st Int. Conf. on Research
andDevelopment in Information Retrieval, pages 146–154, 1998.
[7] Y. Xiao, J. F. Yao, Z. Li, and M. H. Dunham. Efficient data mining formaximal frequent subtrees. In Proc. of the 3rd IEEE Int.
Conf. on DataMining, page 379. IEEE Computer Society, 2003.
[8] X. Yan and J. Han. Closegraph: mining closed frequent graph patterns.In Proc. of the 9th ACM Int. Conf. on Knowledge Discovery
and DataMining, pages 286–295. ACM Press, 2003.
[9] M. J. Zaki. Efficiently mining frequent trees in a forest: algorithms and applications. IEEE Transactions on Knowledge and Data
Engineering,17(8):1021–1035, 2005.Mirjana
About Authors:
Mr.K.C Ravi Kumar M.Tech CSE from JNTU Hderabad currently he is the head of
department for M.Tech CSE programme in Sridevi Women’s Engineering College having 17
years of Academic Experience. He is life member of IEEE & IST areas of research include
Data Mining & Data Warehousing Information Retrival Systems Information Security.
Mrs. E. Krishnaveni Reddy B.Tech(C.S.E),
M.Tech(S.E) Asst.Prof
Ms. Ramadevi. G, B.E in C.S.E from Muffakham Jah College of Engineering and
technology
M.Tech in C.S.E from Sridevi women’s Engineering college

Mais conteúdo relacionado

Mais procurados

Algoithems and data structures
Algoithems and data structuresAlgoithems and data structures
Algoithems and data structures
adamlongs1983
 
Indexing based Genetic Programming Approach to Record Deduplication
Indexing based Genetic Programming Approach to Record DeduplicationIndexing based Genetic Programming Approach to Record Deduplication
Indexing based Genetic Programming Approach to Record Deduplication
idescitation
 
Enabling Fine-grained Multi-keyword Search Supporting Classified Sub-dictiona...
Enabling Fine-grained Multi-keyword Search Supporting Classified Sub-dictiona...Enabling Fine-grained Multi-keyword Search Supporting Classified Sub-dictiona...
Enabling Fine-grained Multi-keyword Search Supporting Classified Sub-dictiona...
1crore projects
 
Information retrieval-systems notes
Information retrieval-systems notesInformation retrieval-systems notes
Information retrieval-systems notes
BAIRAVI T
 

Mais procurados (17)

Multidimensioal database
Multidimensioal  databaseMultidimensioal  database
Multidimensioal database
 
A Robust Keywords Based Document Retrieval by Utilizing Advanced Encryption S...
A Robust Keywords Based Document Retrieval by Utilizing Advanced Encryption S...A Robust Keywords Based Document Retrieval by Utilizing Advanced Encryption S...
A Robust Keywords Based Document Retrieval by Utilizing Advanced Encryption S...
 
The Survey of Data Mining Applications And Feature Scope
The Survey of Data Mining Applications  And Feature Scope The Survey of Data Mining Applications  And Feature Scope
The Survey of Data Mining Applications And Feature Scope
 
Multikeyword Hunt on Progressive Graphs
Multikeyword Hunt on Progressive GraphsMultikeyword Hunt on Progressive Graphs
Multikeyword Hunt on Progressive Graphs
 
EXECUTION OF ASSOCIATION RULE MINING WITH DATA GRIDS IN WEKA 3.8
EXECUTION OF ASSOCIATION RULE MINING WITH DATA GRIDS IN WEKA 3.8EXECUTION OF ASSOCIATION RULE MINING WITH DATA GRIDS IN WEKA 3.8
EXECUTION OF ASSOCIATION RULE MINING WITH DATA GRIDS IN WEKA 3.8
 
IRJET- Top-K Query Processing using Top Order Preserving Encryption (TOPE)
IRJET- Top-K Query Processing using Top Order Preserving Encryption (TOPE)IRJET- Top-K Query Processing using Top Order Preserving Encryption (TOPE)
IRJET- Top-K Query Processing using Top Order Preserving Encryption (TOPE)
 
Algoithems and data structures
Algoithems and data structuresAlgoithems and data structures
Algoithems and data structures
 
Enhancing the labelling technique of
Enhancing the labelling technique ofEnhancing the labelling technique of
Enhancing the labelling technique of
 
1699 1704
1699 17041699 1704
1699 1704
 
Privacy Preservation and Restoration of Data Using Unrealized Data Sets
Privacy Preservation and Restoration of Data Using Unrealized Data SetsPrivacy Preservation and Restoration of Data Using Unrealized Data Sets
Privacy Preservation and Restoration of Data Using Unrealized Data Sets
 
Privacy Preserving Clustering on Distorted data
Privacy Preserving Clustering on Distorted dataPrivacy Preserving Clustering on Distorted data
Privacy Preserving Clustering on Distorted data
 
Indexing based Genetic Programming Approach to Record Deduplication
Indexing based Genetic Programming Approach to Record DeduplicationIndexing based Genetic Programming Approach to Record Deduplication
Indexing based Genetic Programming Approach to Record Deduplication
 
IRJET- On-AIR Based Information Retrieval System for Semi-Structure Data
IRJET-  	  On-AIR Based Information Retrieval System for Semi-Structure DataIRJET-  	  On-AIR Based Information Retrieval System for Semi-Structure Data
IRJET- On-AIR Based Information Retrieval System for Semi-Structure Data
 
Enabling Fine-grained Multi-keyword Search Supporting Classified Sub-dictiona...
Enabling Fine-grained Multi-keyword Search Supporting Classified Sub-dictiona...Enabling Fine-grained Multi-keyword Search Supporting Classified Sub-dictiona...
Enabling Fine-grained Multi-keyword Search Supporting Classified Sub-dictiona...
 
Document Classification Using Expectation Maximization with Semi Supervised L...
Document Classification Using Expectation Maximization with Semi Supervised L...Document Classification Using Expectation Maximization with Semi Supervised L...
Document Classification Using Expectation Maximization with Semi Supervised L...
 
A unified approach for spatial data query
A unified approach for spatial data queryA unified approach for spatial data query
A unified approach for spatial data query
 
Information retrieval-systems notes
Information retrieval-systems notesInformation retrieval-systems notes
Information retrieval-systems notes
 

Semelhante a Data Mining for XML Query-Answering Support

A novel approach towards developing a statistical dependent and rank
A novel approach towards developing a statistical dependent and rankA novel approach towards developing a statistical dependent and rank
A novel approach towards developing a statistical dependent and rank
IAEME Publication
 
Secure Syntactic key Ranked Search over Encrypted Cloud in Data
Secure Syntactic key Ranked Search over Encrypted Cloud in DataSecure Syntactic key Ranked Search over Encrypted Cloud in Data
Secure Syntactic key Ranked Search over Encrypted Cloud in Data
IJERA Editor
 
Ijarcet vol-2-issue-3-881-883
Ijarcet vol-2-issue-3-881-883Ijarcet vol-2-issue-3-881-883
Ijarcet vol-2-issue-3-881-883
Editor IJARCET
 

Semelhante a Data Mining for XML Query-Answering Support (20)

Optimization of Mining Association Rule from XML Documents
Optimization of Mining Association Rule from XML DocumentsOptimization of Mining Association Rule from XML Documents
Optimization of Mining Association Rule from XML Documents
 
Cl4201593597
Cl4201593597Cl4201593597
Cl4201593597
 
A novel approach towards developing a statistical dependent and rank
A novel approach towards developing a statistical dependent and rankA novel approach towards developing a statistical dependent and rank
A novel approach towards developing a statistical dependent and rank
 
An improvised frequent pattern tree
An improvised frequent pattern treeAn improvised frequent pattern tree
An improvised frequent pattern tree
 
Modern association rule mining methods
Modern association rule mining methodsModern association rule mining methods
Modern association rule mining methods
 
AVL TREE AN EFFICIENT RETRIEVAL ENGINE IN CLASSIFIED FINGERPRINT DATABASE
AVL TREE AN EFFICIENT RETRIEVAL ENGINE IN CLASSIFIED FINGERPRINT DATABASEAVL TREE AN EFFICIENT RETRIEVAL ENGINE IN CLASSIFIED FINGERPRINT DATABASE
AVL TREE AN EFFICIENT RETRIEVAL ENGINE IN CLASSIFIED FINGERPRINT DATABASE
 
Secure Syntactic key Ranked Search over Encrypted Cloud in Data
Secure Syntactic key Ranked Search over Encrypted Cloud in DataSecure Syntactic key Ranked Search over Encrypted Cloud in Data
Secure Syntactic key Ranked Search over Encrypted Cloud in Data
 
IRJET- Proficient Recovery Over Records using Encryption in Cloud Computing
IRJET- Proficient Recovery Over Records using Encryption in Cloud ComputingIRJET- Proficient Recovery Over Records using Encryption in Cloud Computing
IRJET- Proficient Recovery Over Records using Encryption in Cloud Computing
 
A Performance Based Transposition algorithm for Frequent Itemsets Generation
A Performance Based Transposition algorithm for Frequent Itemsets GenerationA Performance Based Transposition algorithm for Frequent Itemsets Generation
A Performance Based Transposition algorithm for Frequent Itemsets Generation
 
Efficient Similarity Search Over Encrypted Data
Efficient Similarity Search Over Encrypted DataEfficient Similarity Search Over Encrypted Data
Efficient Similarity Search Over Encrypted Data
 
Review on: Techniques for Predicting Frequent Items
Review on: Techniques for Predicting Frequent ItemsReview on: Techniques for Predicting Frequent Items
Review on: Techniques for Predicting Frequent Items
 
Design of file system architecture with cluster
Design of file system architecture with clusterDesign of file system architecture with cluster
Design of file system architecture with cluster
 
An improved apriori algorithm for association rules
An improved apriori algorithm for association rulesAn improved apriori algorithm for association rules
An improved apriori algorithm for association rules
 
DEVELOPING A NOVEL MULTIDIMENSIONAL MULTIGRANULARITY DATA MINING APPROACH FOR...
DEVELOPING A NOVEL MULTIDIMENSIONAL MULTIGRANULARITY DATA MINING APPROACH FOR...DEVELOPING A NOVEL MULTIDIMENSIONAL MULTIGRANULARITY DATA MINING APPROACH FOR...
DEVELOPING A NOVEL MULTIDIMENSIONAL MULTIGRANULARITY DATA MINING APPROACH FOR...
 
50120130405014 2-3
50120130405014 2-350120130405014 2-3
50120130405014 2-3
 
Ap26261267
Ap26261267Ap26261267
Ap26261267
 
Review Over Sequential Rule Mining
Review Over Sequential Rule MiningReview Over Sequential Rule Mining
Review Over Sequential Rule Mining
 
Implementation and Review Paper of Secure and Dynamic Multi Keyword Search in...
Implementation and Review Paper of Secure and Dynamic Multi Keyword Search in...Implementation and Review Paper of Secure and Dynamic Multi Keyword Search in...
Implementation and Review Paper of Secure and Dynamic Multi Keyword Search in...
 
International Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentInternational Journal of Engineering Research and Development
International Journal of Engineering Research and Development
 
Ijarcet vol-2-issue-3-881-883
Ijarcet vol-2-issue-3-881-883Ijarcet vol-2-issue-3-881-883
Ijarcet vol-2-issue-3-881-883
 

Mais de IOSR Journals

Mais de IOSR Journals (20)

A011140104
A011140104A011140104
A011140104
 
M0111397100
M0111397100M0111397100
M0111397100
 
L011138596
L011138596L011138596
L011138596
 
K011138084
K011138084K011138084
K011138084
 
J011137479
J011137479J011137479
J011137479
 
I011136673
I011136673I011136673
I011136673
 
G011134454
G011134454G011134454
G011134454
 
H011135565
H011135565H011135565
H011135565
 
F011134043
F011134043F011134043
F011134043
 
E011133639
E011133639E011133639
E011133639
 
D011132635
D011132635D011132635
D011132635
 
C011131925
C011131925C011131925
C011131925
 
B011130918
B011130918B011130918
B011130918
 
A011130108
A011130108A011130108
A011130108
 
I011125160
I011125160I011125160
I011125160
 
H011124050
H011124050H011124050
H011124050
 
G011123539
G011123539G011123539
G011123539
 
F011123134
F011123134F011123134
F011123134
 
E011122530
E011122530E011122530
E011122530
 
D011121524
D011121524D011121524
D011121524
 

Último

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

Último (20)

A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 

Data Mining for XML Query-Answering Support

  • 1. IOSR Journal of Computer Engineering (IOSRJCE) ISSN: 2278-0661, ISBN: 2278-8727 Volume 5, Issue 6 (Sep-Oct. 2012), PP 25-29 www.iosrjournals.org www.iosrjournals.org 25 | Page Data Mining for XML Query-Answering Support KC. Ravi Kumar1 , E. Krishnaveni Reddy2 , Ramadevi.G3 1, 2, 3 (CSE, Sridevi women’s Engineering College, Hyderabad, Andhra Pradesh) Abstract: XML has become a defacto standard for storing, sharing and exchanging information across heterogeneous platforms. The XML content is growing day by day in rapid pace. Enterprises need to make queries on XML databases frequently. As huge XML data is available, it is challenging task to extract required data from XML database. It is computationally expensive to answer queries without any support. Towards this, in this paper we present a technique known as Tree-based Association Rules (TARs) mined rules that provide required information on structure and content of XML file and the TARs are also stored in XML format. The mined knowledge (TARs) used later for XML query answering support. This enables quick and accurate answering. We also developed a prototype application to demonstrate the efficiency of the proposed system. The empirical results are very positive and query answering is expected to be useful in real time applications. Index Terms: XML, query answering support, data mining, tree-based association rules I. Introduction XML has become a popular format for storing and sharing data across heterogeneous platforms. The XML format is neutral, flexible and interoperable [5]. It is widely used in applications as it can allow applications to have communication though they are built in different platforms. The XML documents are plenty in enterprises and the data retrieval can be done in two ways. The first approach is that user gives keywords and the program searches for relevant documents. The second approach is give XML queries that are answered. The first approach is done using conventional information retrieval technique [4] that works on the search process based on the given search word. With respect to query answering, it is not easy to process such request. To make this searching easy this paper presents data mining for XML query answering support. XML documents are validated by either DTD or schema. However, schema presence is not mandatory to process XML file [3]. This paper presents data mining framework for XML query answering support. The XML documents essence is extracted and kept in another XML file in the form of TARs. With the help of this XML query answering becomes easy. II. Proposed Framework The proposed XML query answering support framework is as shown in fig. 1. The purpose of this framework is to perform data mining on XML and obtain intentional knowledge. The intentional knowledge is also in the form of XML. This is nothing but rules with supports and confidence. In other words the result of data mining is TARs (Tree-based Association Rules). Fig. 1 – Proposed XML query answering support framework As can be seen in fig. 1, the framework is to have data mining for XML query answering support. When XML file is given as input, DOM parser will parse it for wellformedness and validness. If the given XML document is valid, it is parsed and loaded into a DOM object which can be navigated easily. The parsed XML file is given to data mining sub system which is responsible for sub tree generation and also TAR extraction.
  • 2. Data mining for XML query-answering support www.iosrjournals.org 26 | Page The generated TARs are used by Query Processor Sub System. This module takes XML query from end user and makes use of mined knowledge to answer the query quickly. Tar Extraction Extracting TARs through data mining is a process with two steps. In the first step frequent subtrees that satisfy given support are mined are mined. In the second step interesting rules that have confidence above given threshold are calculated from the frequent subtrees. Finding frequent sub trees is described in [1], [2], [6], [7], [8], [9]. Algorithm 1 finds frequent sub trees and calculates interesting rules. The rules obtained from algorithm 1 are written to an XML file. Then indexing is made. Afterwards when XML queries are made, the proposed system uses index and TARs and quickly answers the query. III. Experiments And Results Environment The environment used to develop the prototype application includes JSE (Java Standard Edition) 6.0, Net Beans IDE that run in Windows 7 OS. A PC with 2 GB RAM and 2.9x GHz processor is used. The Java SWING API is used to build graphical user interface while IO and JAXP (Java API for XML Parsing) are used for implementing functionality. The main application GUI is as shown in fig. Fig. 2 – The GUI of the prototype application As can be seen the GUI has provision to choose an XML file as input. It also allows choosing a file for storing extracted rules. A text area is provided to show the XML file content. View Tree button shows the XML file with graphical tree representation. Generate Large ItemSet button generates frequent sub treesthat will be used for further processing. On clicking the GenerateRuleFile button, it extracts TARs from the given XML file and finally extracted rules are saved into the given TAR file. The QueryAnswering button invokes a form where user can enter queries. The query interface is shown in fig. 3.
  • 3. Data mining for XML query-answering support www.iosrjournals.org 27 | Page Fig. 3 – Query interface and results As can be seen the fig. 3 (a) shows interface for making queries. The queries given here are processed faster using rule files extracted from XML files. The result of given query in fig. 3 (a) is shown in fig. 3 (b) with the results that satisfied given support and confidence. IV. Results We have performed four types of experiments. They are based on time required to extract intentional knowledge from XML; time required to answer intentional and extensional queries; monitoring extraction time with given support and confidence; and study of accuracy of intentional answers. As can be seen in fig. 4, the TAR extraction time is more when number of nodes in XML document is more. In other words, the time taken to extract TARs is directly proportional to the number of nodes in given XML document.
  • 4. Data mining for XML query-answering support www.iosrjournals.org 28 | Page As can be seen in fig. 5, the TAR extraction time is more when number of nodes in XML document is more. In other words, the time taken to extract TARs is directly proportional to the number of nodes in given XML document which is generated using XMark. As can be seen in fig. 6, the TAR extraction time is more when number of nodes in XML document is more. In other words, the time taken to extract TARs is directly proportional to the number of nodes in given XML document with fixed depth. Fig. 7 – Extraction time growth using CMTreeMiner with respect to number of nodes As can be seen in fig. 7, the TAR extraction time is more when number of nodes in XML document is more. In other words, the time taken to extract TARs is directly proportional to the number of nodes in given XML document when CMTreeMiner is used. As can be seen in fig. 8, the time taken for intensional and extensional query answering are plotted. However, the intentional query answering takes very less time when compared with that of extensional answering. V. Conclusion In this paper we presented a framework for extracting TARs from given XML file so as to support XML queries. Towards this end, the aim of this paper is to mine frequent association rules and store the mined content in XML format; use the TARs to support query answering or to gain information from XML databases. A prototype application is built to test the efficiency of the proposed framework. The application takes XML file as input and generates TARs and then finally index file that helps in query processing. The experimental results revealed that the proposed application is useful and can be used in real time applications. 0 10 20 30 40 50 0 2 4 6 8 10 S e c o n d s numberof nodes Series1
  • 5. Data mining for XML query-answering support www.iosrjournals.org 29 | Page References [1] R. Agrawal and R. Srikant. Fast algorithms for mining association rulesin large databases. In Proc. of the 20th Int. Conf. on Very Large DataBases, pages 487–499. Morgan Kaufmann Publishers Inc., 1994. [2] T. Asai, H. Arimura, T. Uno, and S. Nakano. Discovering frequentsubstructures in large unordered trees. In Technical Report DOI- TR216, Department of Informatics, Kyushu University. http://www.i.kyushuu.ac.jp/doitr/trcs216.pdf, 2003. [3] D. Barbosa, L. Mignet, and P. Veltri. Studying the xml web: Gatheringstatistics from an xml sample. World Wide Web, 8(4):413– 438, 2005. [4] Gary Marchionini. Exploratory search: from finding to understanding.Communications of the ACM, 49(4):41–46, 2006. [5] World Wide Web Consortium. Extensible Markup Language (XML) 1.0,1998. http://www.w3C.org/TR/REC-xml/. [6] K. Wang and H. Liu. Discovering typical structures of documents: aroad map approach. In Proc. of the 21st Int. Conf. on Research andDevelopment in Information Retrieval, pages 146–154, 1998. [7] Y. Xiao, J. F. Yao, Z. Li, and M. H. Dunham. Efficient data mining formaximal frequent subtrees. In Proc. of the 3rd IEEE Int. Conf. on DataMining, page 379. IEEE Computer Society, 2003. [8] X. Yan and J. Han. Closegraph: mining closed frequent graph patterns.In Proc. of the 9th ACM Int. Conf. on Knowledge Discovery and DataMining, pages 286–295. ACM Press, 2003. [9] M. J. Zaki. Efficiently mining frequent trees in a forest: algorithms and applications. IEEE Transactions on Knowledge and Data Engineering,17(8):1021–1035, 2005.Mirjana About Authors: Mr.K.C Ravi Kumar M.Tech CSE from JNTU Hderabad currently he is the head of department for M.Tech CSE programme in Sridevi Women’s Engineering College having 17 years of Academic Experience. He is life member of IEEE & IST areas of research include Data Mining & Data Warehousing Information Retrival Systems Information Security. Mrs. E. Krishnaveni Reddy B.Tech(C.S.E), M.Tech(S.E) Asst.Prof Ms. Ramadevi. G, B.E in C.S.E from Muffakham Jah College of Engineering and technology M.Tech in C.S.E from Sridevi women’s Engineering college