The International Journal of Information Technology, Control and Automation (IJITCA) is a quarterly, open-access, peer-reviewed journal that publishes articles contributing new results in all areas of Information Technology (IT), Control Systems and Automation Engineering. The journal focuses on all technical and practical aspects of IT, Control Systems and Automation, with applications to real-world engineering and scientific problems. Its goal is to bring together researchers and practitioners from academia and industry to focus on information technology, control engineering, automation and modeling concepts, and to establish new collaborations in these areas.
Authors are invited to contribute by submitting articles that present research results, projects, survey work and industrial experiences describing significant advances in Information Technology, Control Systems and Automation.
XCLS++: A new algorithm to improve XCLS+ for clustering XML documents (IJITCA Journal)
The purpose of this paper is to offer a method for clustering XML documents. Many approaches have been proposed for clustering documents; they can be divided into structure-based, content-based, and combined structure-and-content methods. One method proposed for this problem is XCLS, which was later improved as XCLS+. XCLS+ is an efficient algorithm for clustering XML documents based on their structure. Our study showed that XCLS+ has a problem that keeps it from reaching its optimal value. This paper presents a method named XCLS++ that does not have this problem, and its efficiency is shown to be higher than that of XCLS+.
A Hybrid Algorithm for Improvement of XML Documents Clustering
Somayeh Ghazanfari and Hassan Naderi
A Novel Collaborative Filtering Friendship Recommendation Based on Smartphones
Dhananjaya G. M., Sachin C. Raykar and Mushtaq Ahmed D. M.
Efficient Computational Tools for Nonlinear Flight Dynamic Analysis in the Full Envelope
P. Lathasree and Abhay A. Pashilkar
Quantitative Aspects of Knowledge: Knowledge Potential and Utility
Syed V. Ahamed
A Novel Approach for Recommending Items based on Association Rule Mining
Vasundhara M. S. and Gururaj K. S.
Cluster Integrated Self Forming Wireless Sensor Based System for Intrusion Detection and Perimeter Defense Applications
A. Inigo Mathew, M. Raj Kumar, S. R. Boselin Prabhu and Dr. S. Sophia
A large volume of information on the Web is stored in XML format, and clustering is one way to manage these documents. Most current methods for clustering XML documents consider only one of two aspects: structure or content. In this paper, we propose SCEM (Expectation Maximization Structure and Content), which clusters XML documents effectively by combining content and structural features. A further contribution is that we use probabilistic distributions whose parameters each correspond to one cluster. Because of this generality, we obtained better effectiveness than other clustering methods. Experimental results on real datasets show the effectiveness of the proposed method, particularly on large XML documents without a schema. The method can also be used to improve the accuracy and effectiveness of XML information retrieval.
The document discusses fuzzy type-ahead search techniques for XML data. It describes how traditional XML query techniques like XPath and XQuery can be complex for users. It then discusses fuzzy search methods like the minimum cost tree approach and LCA-based interactive search that allow for approximate keyword matching. The paper also proposes using exclusive LCA and effective indexing and ranking algorithms to efficiently identify the top-k most relevant answers to a fuzzy keyword query in an XML document.
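The summary does not give the paper's indexing or ranking algorithms. The sketch below shows only the matching criterion commonly used in fuzzy type-ahead search, prefix edit distance: a keyword matches a partially typed query term if some prefix of the keyword is within `tau` edits of it. It is a brute-force scan with hypothetical names (`prefix_edit_distance`, `fuzzy_prefix_matches`) and an assumed threshold `tau=1`; a practical system would use a trie, and the LCA-based steps that combine keyword matches into XML answers are not shown.

```python
def prefix_edit_distance(query: str, word: str) -> int:
    """Smallest edit distance between `query` and any prefix of `word`."""
    # prev[j] holds the edit distance between the current query prefix and word[:j].
    prev = list(range(len(word) + 1))
    for i, qc in enumerate(query, start=1):
        cur = [i] + [0] * len(word)
        for j, wc in enumerate(word, start=1):
            cur[j] = min(
                prev[j] + 1,               # delete a query character
                cur[j - 1] + 1,            # insert a word character
                prev[j - 1] + (qc != wc),  # match (cost 0) or substitute (cost 1)
            )
        prev = cur
    # The last row compares the whole query with every prefix word[:j].
    return min(prev)


def fuzzy_prefix_matches(query, vocabulary, tau=1):
    """Vocabulary words having a prefix within `tau` edits of `query`."""
    return sorted(w for w in vocabulary if prefix_edit_distance(query, w) <= tau)


vocab = ["xml", "xpath", "xquery", "schema"]
print(fuzzy_prefix_matches("xmk", vocab))  # the typo "xmk" still reaches "xml"
```

Because the minimum is taken over the whole last row, a query such as `xq` matches `xquery` at distance 0 even though only two characters have been typed, which is what makes the criterion suitable for search-as-you-type.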
This document proposes a technique called XML duplication using XPath to detect duplicate records in hierarchical XML data. It uses a Bayesian network to calculate the probability that two XML elements are duplicates based on their attributes and structure. The technique aims to increase the efficiency of detecting and removing duplicate records from XML documents. It is evaluated on several XML datasets and is able to achieve high accuracy and efficiency compared to other duplicate detection techniques for hierarchical data.
Duplicate Detection in Hierarchical Data Using XPath (iosrjce)
There are many techniques for identifying duplicates in relational data, but only a few solutions focus on identifying duplicates in data with a complex hierarchical structure, such as XML. In this paper, we present a new technique for identifying XML duplicates, called XML duplication using XPath. The technique uses a Bayesian network to estimate the probability that two XML elements are duplicates, based on the information within the elements and on how that information is structured in the XML. In addition, to improve the efficiency of the network evaluation, a new pruning strategy was devised, which yields substantial gains over the unpruned algorithm. The technique can be used to identify and remove duplicates more efficiently, so that no duplicate records remain. In many experiments, our algorithm achieves high accuracy and recall on several XML datasets, and it outperforms another technique for identifying duplicates in both efficiency and effectiveness.
XML COMPACTION IMPROVEMENTS BASED ON BINARY STRING ENCODINGS (ijdms)
Because of its flexibility and ease of use, XML is now widely used in a vast number of application areas, and new information is increasingly being encoded as XML documents. It is therefore important to provide a repository for XML documents that supports efficient management and storage of XML data. Many proposals have been made for this purpose; the most common are node labeling schemes. On the other hand, XML repeatedly uses tags to describe the data itself. This self-describing nature makes XML verbose, so its storage requirements are often inflated and can be excessive. The increased size also raises the cost of data manipulation. It therefore seems natural to use compression techniques to increase the efficiency of storing and querying XML data.
In our previous work, we aimed to combine the advantages of both areas (labeling and compaction technologies). Specifically, we exploited structural peculiarities of XML to reduce storage space requirements and to improve the efficiency of XML query processing using labeling schemes. In this paper, we continue our investigation of variations of binary string encoding forms to decrease the label size. We also report experimental results examining the impact of binary string encoding on query performance and on the storage size needed for the compacted XML documents.
The document provides an overview of approaches for clustering XML data based on structure and content. It first outlines applications where XML clustering is useful, including XML query processing and data integration. It then presents a generic framework for XML clustering with three phases: data representation, similarity computation, and clustering/grouping. The document surveys current approaches and aims to classify them and identify common features. It also discusses challenges in XML clustering and future research directions.
HOLISTIC EVALUATION OF XML QUERIES WITH STRUCTURAL PREFERENCES ON AN ANNOTATE... (ijseajournal)
With the emergence of XML as the de facto format for storing and exchanging information over the Internet, the search for ever more innovative and effective querying techniques is a major, current concern of the XML database community. Most studies addressing this problem focus on evaluating so-called exact queries, which, unfortunately, are likely (especially for semi-structured documents) to yield abundant results for vague queries or empty results for very precise ones. Since users are not necessarily interested in all possible solutions but rather in those closest to their needs, an important field of research has opened on the evaluation of preference queries. In this paper, we propose an approach for evaluating such queries when the preferences concern the structure of the document. The proposed solution is a three-phase evaluation plan: rewriting, evaluation and merging. The rewriting phase applies a partitioning transformation to the initial query to obtain a hierarchical set of preference path queries, which are evaluated holistically in the second phase by an instrumented version of the TwigStack algorithm. The merge phase synthesizes the best results.
Development of a new indexing technique for XML document retrieval (Amjad Ali)
The document proposes a new indexing technique for XML document retrieval that addresses issues with existing techniques. It represents an XML document as a tree structure with nodes corresponding to elements, attributes, and content. Nodes are labeled with start/end positions and level to allow efficient updates by leaving gaps between labels. The technique permits fast retrieval of ancestor-descendant and parent-child relationships without recomputing the index on updates. Future work could include indexing comments and handling two separate indices for updates and queries.
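A minimal sketch of the labeling idea described above, assuming Python's `xml.etree.ElementTree` for parsing and a fixed gap of 10 between consecutive positions (the gap size and the names `Label`, `label_tree`, `is_ancestor` and `is_parent` are illustrative, not the technique's actual parameters). Ancestor-descendant tests reduce to interval containment, and parent-child tests add the level check.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    start: int  # position when the element opens
    end: int    # position when the element closes
    level: int  # depth, with the root at level 0


def label_tree(root, gap=10):
    """Label every element with (start, end, level), spacing positions `gap` apart."""
    labels = {}
    position = 0

    def visit(node, level):
        nonlocal position
        position += gap
        start = position
        for child in node:
            visit(child, level + 1)
        position += gap
        labels[node] = Label(start, position, level)

    visit(root, 0)
    return labels


def is_ancestor(a: Label, d: Label) -> bool:
    # d's interval lies strictly inside a's interval.
    return a.start < d.start and d.end < a.end


def is_parent(p: Label, c: Label) -> bool:
    # Parent-child is ancestor-descendant exactly one level apart.
    return is_ancestor(p, c) and c.level == p.level + 1


root = ET.fromstring("<book><chapter><title/><section/></chapter><appendix/></book>")
labels = label_tree(root)
chapter = root.find("chapter")
title = chapter.find("title")
appendix = root.find("appendix")
print(labels[title], is_parent(labels[chapter], labels[title]))
```

Here `title` receives (30, 40, 2) and `section` (50, 60, 2), so an element inserted between them could take, say, (43, 47) without relabeling any existing node, until the gap is used up.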
A Novel Multi-Viewpoint Based Similarity Measure for Document Clustering (IJMER)
International Journal of Modern Engineering Research (IJMER) is a peer-reviewed online journal. It serves as an international archival forum for scholarly research related to engineering and science education.
International Journal of Modern Engineering Research (IJMER) covers all fields of engineering and science: Electrical Engineering, Mechanical Engineering, Civil Engineering, Chemical Engineering, Computer Engineering, Agricultural Engineering, Aerospace Engineering, Thermodynamics, Structural Engineering, Control Engineering, Robotics, Mechatronics, Fluid Mechanics, Nanotechnology, Simulators, Web-based Learning, Remote Laboratories, Engineering Design Methods, Education Research, Students' Satisfaction and Motivation, Global Projects, and Assessment, among many others.
Effective Data Retrieval in XML using TreeMatch Algorithm (IRJET Journal)
This document summarizes research on effective data retrieval from XML documents using the TreeMatch algorithm. It begins with an abstract that introduces the TreeMatch algorithm and its ability to provide fast data retrieval from XML documents by matching tree-shaped patterns. It then reviews related work on XML tree matching algorithms and their issues, such as suboptimality. The document proposes using the TreeMatch algorithm to overcome issues with wildcards, negation and siblings when querying XML documents with XPath or XQuery. It provides details on the TreeMatch algorithm and its ability to process different types of XML tree pattern queries efficiently while avoiding intermediate results. In conclusion, it states that the TreeMatch algorithm can efficiently handle three types of XML tree pattern queries and overcome the problem of suboptimality.
INVESTIGATING BINARY STRING ENCODING FOR COMPACT REPRESENTATION OF XML DOCUMENTS (csandit)
This document summarizes an investigation into using binary string encoding to compactly represent XML documents. The study explores prefix-free encoding schemes that map XML node labels to binary strings to minimize storage requirements. Experimental results show that a proposed prefix-free encoding scheme where each bitstring division observes byte boundaries achieved the most efficient storage sizes for various real and synthetic XML datasets compacted using different labeling and compaction approaches. Future work could explore additional encoding formats and their impact on storage costs and query/update performance for compacted XML data.
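The summary does not specify the proposed encoding, so this sketch swaps in Elias gamma coding, a standard prefix-free code, purely to illustrate why prefix-freeness matters: the components of a Dewey-style node label (for example 1.3.2) can be concatenated without separators and still be decoded unambiguously. The function names are hypothetical, and `padded_bytes` only shows the cost of storing a label in whole bytes; it is not the byte-aligned scheme the study evaluates.

```python
def gamma_encode(n: int) -> str:
    """Elias gamma code: (bit length - 1) zeros followed by n in binary."""
    if n < 1:
        raise ValueError("Elias gamma encodes positive integers only")
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary


def encode_label(components):
    # No separators are needed because no codeword is a prefix of another.
    return "".join(gamma_encode(c) for c in components)


def decode_label(bits):
    """Split a concatenation of gamma codewords back into integers (assumes valid input)."""
    components, i = [], 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":  # count the unary length prefix
            zeros += 1
            i += 1
        components.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return components


def padded_bytes(bits):
    # Storage cost if each label is rounded up to a whole number of bytes.
    return (len(bits) + 7) // 8


bits = encode_label([1, 3, 2])
print(bits, decode_label(bits), padded_bytes(bits))
```

The label 1.3.2 becomes `1` + `011` + `010` = `1011010`, seven bits that fit in one byte, and decoding recovers `[1, 3, 2]` without any delimiter.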
Expression of Query in XML object-oriented database (Editor IJCATR)
This document discusses expressing queries in an XML object-oriented database. It proposes a method for querying an object-oriented database where the user can assign weights to restrictions in conjunctive or disjunctive queries based on importance. The queries and resulting objects are represented using XML labels to simplify them and make them closer to user needs. It also presents a case study of a book information registration system to evaluate the proposed query method on an XML object-oriented database.
With the advent of object-oriented databases, the concept of behavior in a database was introduced. Previously, relational databases provided only a logical model of data and paid no attention to the operations applied to the data in the system. In this paper, a method is presented for querying an object-oriented database. The method produces appropriate results when the user expresses restrictions in combination (disjunctive and conjunctive) and assigns each restriction a weight based on its importance. The obtained results are then sorted by their degree of membership in the response set. Next, queries are expressed using XML labels. The purpose is to simplify queries and their resulting objects so that they are very close to the user's needs and meet the user's expectations.
ONTOLOGY BASED DOCUMENT CLUSTERING USING MAPREDUCE (ijdms)
Nowadays, document clustering is considered a data-intensive task due to the dramatic, fast increase in the number of available documents. Moreover, the features that represent those documents are also too numerous. The most common method for representing documents is the vector space model, which represents document features as a bag of words and does not represent semantic relations between words. In this paper, we introduce a distributed implementation of bisecting k-means using the MapReduce programming model. The aim of our implementation is to solve the problem of clustering data-intensive document collections. In addition, we propose integrating the WordNet ontology with bisecting k-means in order to utilize the semantic relations between words to enhance document clustering results. Our experimental results show that using only the lexical categories of nouns enhances internal evaluation measures of document clustering and reduces the number of document features from thousands to tens. Our experiments were conducted using Amazon Elastic MapReduce to deploy the bisecting k-means algorithm.
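A sequential, in-memory sketch of bisecting k-means on dense vectors with squared Euclidean distance, assuming the common variant that repeatedly bisects the cluster with the largest within-cluster SSE, keeping the best of several 2-means trials. It does not show the MapReduce distribution or the WordNet integration; in a MapReduce version, the assignment step of each 2-means iteration would typically run in mappers and the centroid sums in reducers. All names and defaults (`trials=5`, `iters=20`, `seed=0`) are illustrative assumptions.

```python
import random


def _dist2(a, b):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points):
    # Component-wise mean (centroid) of a non-empty list of vectors.
    return [sum(col) / len(points) for col in zip(*points)]


def _sse(points):
    # Within-cluster sum of squared errors around the centroid.
    c = _mean(points)
    return sum(_dist2(p, c) for p in points)


def _two_means(points, rng, iters=20):
    # Plain Lloyd's 2-means on one cluster, seeded with two random members.
    centers = rng.sample(points, 2)
    groups = ([], [])
    for _ in range(iters):
        groups = ([], [])
        for p in points:
            nearest = 0 if _dist2(p, centers[0]) <= _dist2(p, centers[1]) else 1
            groups[nearest].append(p)
        if not groups[0] or not groups[1]:
            break  # degenerate split; the caller discards it
        new_centers = [_mean(groups[0]), _mean(groups[1])]
        if new_centers == [list(c) for c in centers]:
            break  # assignments are stable
        centers = new_centers
    return groups


def bisecting_kmeans(points, k, trials=5, seed=0):
    rng = random.Random(seed)
    clusters = [list(points)]
    while len(clusters) < k:
        # Assumed split criterion: bisect the splittable cluster with the largest SSE.
        target = max((c for c in clusters if len(c) >= 2), key=_sse, default=None)
        if target is None:
            break  # nothing left to split
        best = None
        for _ in range(trials):
            a, b = _two_means(target, rng)
            if a and b:
                score = _sse(a) + _sse(b)
                if best is None or score < best[0]:
                    best = (score, a, b)
        if best is None:
            break  # e.g. every point in the target cluster is identical
        clusters.remove(target)
        clusters.extend([best[1], best[2]])
    return clusters


A = [(0, 0), (0, 1), (1, 0)]
B = [(10, 10), (10, 11), (11, 10)]
C = [(20, 0), (21, 0), (20, 1)]
result = bisecting_kmeans(A + B + C, k=3)
```

For document clustering, the vectors would be (for example) normalized tf-idf rows; the structure of the loop stays the same.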
Very little research has been done on XML security over relational databases, even though XML has become the de facto standard for data representation and exchange on the Internet and many XML documents are stored in RDBMSs. In [14], the author proposed an access control model for schema-based storage of XML documents in relational storage, together with a translation of XML access control rules into relational access control rules. However, the proposed algorithms had performance drawbacks. In this paper, we use the same access control model as [14] and try to overcome its drawbacks by proposing an efficient technique to store XML access control rules in a relational storage of the XML DTD. The mapping of the XML DTD to a relational schema is proposed in [7]. We also propose an algorithm to translate XPath queries into SQL queries based on the mapping algorithm in [7].
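The mapping algorithm of [7] is not described here, so this sketch assumes a hypothetical inlined schema in which each element name maps to a table with `id` and `parent_id` columns, and the document root has a NULL `parent_id`. Under that assumption, a child-axis-only XPath translates into a chain of equi-joins. The function name `xpath_to_sql` is illustrative; descendant axes, wildcards, predicates and the access-control rules are out of scope.

```python
def xpath_to_sql(path: str) -> str:
    """Translate a simple absolute child path such as /book/chapter/title into SQL.

    Hypothetical mapping: each element name E is stored in a table E(id, parent_id, ...),
    and the document root has parent_id NULL.
    """
    if not path.startswith("/"):
        raise ValueError("expected an absolute path")
    steps = path[1:].split("/")
    # Rejects '//', '*', predicates, attributes and empty steps; this also keeps
    # arbitrary text out of the generated SQL identifiers.
    if not all(step.isidentifier() for step in steps):
        raise ValueError("only child-axis steps with plain element names are supported")
    aliases = [f"t{i}" for i in range(len(steps))]
    tables = ", ".join(f"{step} {alias}" for step, alias in zip(steps, aliases))
    conditions = [f"{aliases[0]}.parent_id IS NULL"]
    # Each child step joins the child table's parent_id to the previous step's id.
    conditions += [
        f"{aliases[i]}.id = {aliases[i + 1]}.parent_id" for i in range(len(steps) - 1)
    ]
    return f"SELECT {aliases[-1]}.* FROM {tables} WHERE {' AND '.join(conditions)}"


print(xpath_to_sql("/book/chapter/title"))
```

Element names that are SQL reserved words (for example `order`) would additionally need quoting in the generated query.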
This document provides a survey of XML tree patterns (TPs), which are used to query tree-structured XML data. It outlines various TP models and features. It also reviews two main approaches for optimizing TP matching: TP minimization and holistic matching. The document aims to provide a global overview of over 10 years of research on TPs and related issues.
Clustering Algorithm with a Novel Similarity Measure (IOSR Journals)
This document proposes a new multi-viewpoint based similarity measure for clustering text documents that aims to overcome limitations of existing measures. Existing measures use a single viewpoint to measure similarity between documents, but the proposed measure uses multiple viewpoints to ensure clusters exhibit all relationships between documents. The empirical study found that using a multi-viewpoint similarity measure forms more meaningful clusters by capturing more informative relationships between documents.
ONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATION (IJDKP)
This article introduces several approaches for improving text categorization models by integrating previously imported ontologies. From the Reuters Corpus Volume I (RCV1) dataset, several categories that are very similar in content and related to the telecommunications, Internet and computer domains were selected for the model experiments. Several domain ontologies covering these areas were built and integrated into the categorization models to improve them.
This document summarizes a research paper that introduces a novel multi-viewpoint similarity measure for clustering text documents. The paper begins with background on commonly used similarity measures like Euclidean distance and cosine similarity. It then presents the novel multi-viewpoint measure, which considers multiple viewpoints (objects not assumed to be in the same cluster) rather than a single viewpoint. The paper proposes two new clustering criterion functions based on this measure and compares them to other algorithms on benchmark datasets. The goal is to develop a similarity measure and clustering methods that provide high-quality, consistent performance like k-means but can better handle sparse, high-dimensional text data.
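A minimal sketch of a multi-viewpoint similarity, assuming the formulation in which the similarity of documents d_i and d_j is the average, over viewpoint documents d_h taken from outside their cluster, of the dot product (d_i - d_h) · (d_j - d_h), with documents as unit-length vectors. The function and helper names are hypothetical. With the origin as the sole viewpoint, the measure reduces to the dot product d_i · d_j, which is cosine similarity for unit vectors; this is the single-viewpoint case the summary contrasts against.

```python
def _dot(u, v):
    # Dot product of two equal-length vectors.
    return sum(a * b for a, b in zip(u, v))


def _diff(u, v):
    # Component-wise difference u - v.
    return [a - b for a, b in zip(u, v)]


def mvs(di, dj, viewpoints):
    """Average of (di - dh) . (dj - dh) over the viewpoint vectors dh."""
    if not viewpoints:
        raise ValueError("need at least one viewpoint")
    total = sum(_dot(_diff(di, dh), _diff(dj, dh)) for dh in viewpoints)
    return total / len(viewpoints)


di, dj = [1.0, 0.0], [0.0, 1.0]
print(mvs(di, dj, [[0.0, 0.0]]))                # origin only: plain dot product, 0.0
print(mvs(di, dj, [[0.0, -1.0], [-1.0, 0.0]]))  # viewpoints outside the pair's cluster
```

Expanding the product gives d_i · d_j - d_i · d_h - d_j · d_h + ||d_h||², so two orthogonal documents can still score as similar when both lie far from the same outside viewpoints.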
The world is witnessing an unprecedented information revolution, with databases growing rapidly in all areas. Databases are interconnected through their content and schemas but use different elements and structures to express the same concepts and relations, which may cause semantic and structural conflicts. This paper proposes a new technique, named XDEHD, for integrating heterogeneous eXtensible Markup Language (XML) schemas. The returned mediated schema contains all concepts and relations of the sources without duplication. The technique consists of three steps. First, all subschemas are extracted by decomposing the source schemas; each subschema contains three levels: ancestor, root and leaf. Second, the technique matches and compares the subschemas and returns the related candidate subschemas; a semantic closeness function measures how similarly the concepts of the subschemas are modelled in the sources. Finally, the mediated schema is created by integrating the candidate subschemas, yielding a minimal and complete unified schema; an association strength function computes the closeness of each pair in a candidate subschema across all data sources, and an element repetition function counts how many times each element is repeated among the candidate subschemas.
Textual Data Partitioning with Relationship and Discriminative Analysis (Editor IJMTER)
Data partitioning methods group data values by similarity. Similarity measures are used to estimate transaction relationships. Hierarchical clustering models produce tree-structured results, while partitional clustering produces results in grid format. Text documents are unstructured data with high-dimensional attributes. Document clustering groups unlabeled text documents into meaningful clusters. Traditional clustering methods require the cluster count (K) as input for the document grouping process, and clustering accuracy degrades drastically when an unsuitable cluster count is chosen.
Textual data elements are divided into two types: discriminative words and nondiscriminative words. Only discriminative words are useful for grouping documents; including nondiscriminative words confuses the clustering process and leads to poor clustering solutions. A variational inference algorithm is used to infer the document collection structure and the partition of document words at the same time. A Dirichlet Process Mixture (DPM) model is used to partition documents. The DPM clustering model uses both the data likelihood and the clustering property of the Dirichlet Process (DP). The Dirichlet Process Mixture Model for Feature Partition (DPMFP) is used to discover the latent cluster structure based on the DPM model. DPMFP clustering is performed without requiring the number of clusters as input.
Document labels are used to guide the identification of discriminative words. Concept relationships are analyzed with ontology support. A semantic weight model is used for document similarity analysis. The system improves scalability by using labels and concept relations for dimensionality reduction.
This document discusses hierarchical clustering and similarity measures for document clustering. It explains that hierarchical clustering creates a hierarchical decomposition of data objects through either agglomerative or divisive approaches. The success of clustering depends on the similarity measure used: traditional measures use a single viewpoint, while multiviewpoint measures use different viewpoints to increase accuracy. The paper then focuses on applying a multiviewpoint similarity measure to hierarchical clustering of documents.
This document summarizes a research paper on applying a multiviewpoint-based similarity measure to hierarchical document clustering. It begins by introducing document clustering and hierarchical clustering. It then discusses traditional similarity measures used for clustering and introduces a new multiviewpoint-based similarity measure (MVS) that uses multiple reference points to more accurately assess similarity. The paper applies MVS to both hierarchical and k-means clustering algorithms and evaluates the accuracy, precision, and recall of the resulting clusters. It finds that hierarchical clustering with MVS achieves better performance than k-means clustering with MVS based on these evaluation metrics.
The International Journal of Information Technology, Control and Automation (...IJITCA Journal
The International Journal of Information Technology, Control and Automation (IJITCA) is a quarterly open access peer-reviewed journal that publishes articles contributing new results in all areas of Information Technology (IT), Control Systems and Automation Engineering. The journal focuses on all technical and practical aspects of IT, Control Systems and Automation, with applications to real-world engineering and scientific problems. The goal of this journal is to bring together researchers and practitioners from academia and industry to focus on information technology, control engineering, automation and modeling concepts, and to establish new collaborations in these areas.
Authors are invited to contribute to this journal by submitting articles that illustrate research results, projects, survey works and industrial experiences that describe significant advances in Information Technology, Control Systems and Automation.
More related content
Similar to The International Journal of Information Technology, Control and Automation (IJITCA)
HOLISTIC EVALUATION OF XML QUERIES WITH STRUCTURAL PREFERENCES ON AN ANNOTATE... ijseajournal
With the emergence of XML as the de facto format for storing and exchanging information over the Internet, the search for ever more innovative and effective querying techniques is a major and current concern of the XML database community. Most of the studies carried out to help solve this problem focus on the evaluation of so-called exact queries which, unfortunately, are likely (especially for semi-structured documents) to yield abundant results (for vague queries) or empty results (for very precise queries). Since users who issue queries are not necessarily interested in all possible solutions, but rather in those closest to their needs, an important field of research has opened up around the evaluation of preference queries. In this paper, we propose an approach for evaluating such queries when the preferences concern the structure of the document. The solution centers on an evaluation plan in three phases: rewriting, evaluation and merge. The rewriting phase applies a partitioning-transformation operation to the initial query to obtain a hierarchical set of preference path queries, which are evaluated holistically in the second phase by an instrumented version of the TwigStack algorithm. The merge phase synthesizes the best results.
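TwigStack itself matches whole twig patterns holistically and is too long to reproduce here. As a hedged illustration of the building block it generalizes, the sketch below is a binary, stack-based ancestor-descendant join over region labels (start, end): an element a is an ancestor of d exactly when a.start < d.start and d.end < a.end. The Node type and the sample labels are hypothetical, and the preference rewriting and merge phases are not modeled.

```python
from typing import NamedTuple

class Node(NamedTuple):
    tag: str
    start: int  # position where the element opens
    end: int    # position where the element closes (start < end)

def stack_tree_desc(ancestors, descendants):
    """Return every (a, d) pair in which a's region strictly contains d's.

    Both lists must be sorted by `start`. Output is grouped by descendant.
    """
    stack, out = [], []
    ai = di = 0
    while di < len(descendants):
        take_anc = ai < len(ancestors) and ancestors[ai].start < descendants[di].start
        cur = ancestors[ai] if take_anc else descendants[di]
        # Pop ancestors whose region closed before `cur` begins; they cannot contain it.
        while stack and stack[-1].end < cur.start:
            stack.pop()
        if take_anc:
            stack.append(cur)  # stacked entries stay nested, innermost on top
            ai += 1
        else:
            out.extend((a, cur) for a in stack)  # every stacked node contains `cur`
            di += 1
    return out

# <a><b><c/></b><c/></a> labelled in document order.
a, b = Node("a", 1, 8), Node("b", 2, 5)
c1, c2 = Node("c", 3, 4), Node("c", 6, 7)
print(stack_tree_desc([a], [c1, c2]))  # pattern a//c
print(stack_tree_desc([b], [c1, c2]))  # pattern b//c
```

For a//c both c elements match, while b//c matches only the first c, because the second one starts after b has closed and b is popped from the stack.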
Development of a new indexing technique for XML document retrieval Amjad Ali
The document proposes a new indexing technique for XML document retrieval that addresses issues with existing techniques. It represents an XML document as a tree structure with nodes corresponding to elements, attributes, and content. Nodes are labeled with start/end positions and level to allow efficient updates by leaving gaps between labels. The technique permits fast retrieval of ancestor-descendant and parent-child relationships without recomputing the index on updates. Future work could include indexing comments and handling two separate indices for updates and queries.
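A minimal sketch of gap-based (start, end, level) labeling, using Python's xml.etree.ElementTree for parsing. The gap size of 10, the helper names, and the rule for fitting a new leaf into a gap are assumptions for illustration rather than the paper's exact scheme. The point is that ancestor-descendant and parent-child tests reduce to integer comparisons, and a node can be placed in an unused gap without relabeling anything else.

```python
import xml.etree.ElementTree as ET

GAP = 10  # hypothetical spacing left between consecutive label values

def label(root, gap=GAP):
    """Assign (start, end, level) to every element, in document order, with gaps."""
    labels = {}
    counter = 0

    def visit(elem, level):
        nonlocal counter
        counter += gap
        start = counter
        for child in elem:
            visit(child, level + 1)
        counter += gap
        labels[elem] = (start, counter, level)

    visit(root, 0)
    return labels

def is_ancestor(a, d):
    """a contains d if a's region strictly encloses d's region."""
    return a[0] < d[0] and d[1] < a[1]

def is_parent(a, d):
    """Parent-child is ancestor-descendant plus a level difference of one."""
    return is_ancestor(a, d) and d[2] == a[2] + 1

def label_new_leaf(lo, hi, level):
    """Fit a new leaf strictly between label values lo and hi without touching
    existing labels; return None when the gap is exhausted (relabel then)."""
    if hi - lo < 3:
        return None  # need two free integers strictly between lo and hi
    step = (hi - lo) // 3
    return (lo + step, lo + 2 * step, level)

root = ET.fromstring("<book><title/><chapter><section/></chapter></book>")
labels = label(root)
chapter = root[1]
section = chapter[0]
# Append a second leaf under <chapter>, after <section>: between section's end and chapter's end.
new = label_new_leaf(labels[section][1], labels[chapter][1], labels[chapter][2] + 1)
print(labels[section], new, is_parent(labels[chapter], new))
```

Only when a gap runs out does this scheme need to relabel; how often that happens depends on the gap size and the update pattern.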
A Novel Multi-Viewpoint Based Similarity Measure for Document Clustering IJMER
International Journal of Modern Engineering Research (IJMER) is a peer-reviewed online journal. It serves as an international archival forum for scholarly research related to engineering and science education.
International Journal of Modern Engineering Research (IJMER) covers all fields of engineering and science: Electrical Engineering, Mechanical Engineering, Civil Engineering, Chemical Engineering, Computer Engineering, Agricultural Engineering, Aerospace Engineering, Thermodynamics, Structural Engineering, Control Engineering, Robotics, Mechatronics, Fluid Mechanics, Nanotechnology, Simulators, Web-based Learning, Remote Laboratories, Engineering Design Methods, Education Research, Students' Satisfaction and Motivation, Global Projects, Assessment, and many more.
Effective Data Retrieval in XML using TreeMatch Algorithm IRJET Journal
This document summarizes research on effective data retrieval from XML documents using the TreeMatch algorithm. It begins with an abstract that introduces the TreeMatch algorithm and its ability to provide fast data retrieval from XML documents by matching tree-shaped patterns. It then reviews related work on XML tree matching algorithms and their issues like suboptimality. The document proposes using the TreeMatch algorithm to overcome issues with wildcards, negation, and siblings when querying XML documents with XPath or XQuery. It provides details on the TreeMatch algorithm and its ability to process different types of XML tree pattern queries efficiently while avoiding intermediate results. In conclusion, it states that the TreeMatch algorithm can efficiently handle three types of XML tree pattern queries and overcome the problem of sub
INVESTIGATING BINARY STRING ENCODING FOR COMPACT REPRESENTATION OF XML DOCUMENTS csandit
This document summarizes an investigation into using binary string encoding to compactly represent XML documents. The study explores prefix-free encoding schemes that map XML node labels to binary strings in order to minimize storage requirements. Experimental results show that a proposed prefix-free encoding scheme, in which each bitstring division observes byte boundaries, achieved the smallest storage sizes for various real and synthetic XML datasets compacted using different labeling and compaction approaches. Future work could explore additional encoding formats and their impact on storage costs and on query and update performance for compacted XML data.
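The abstract does not spell out the encoding, so the sketch below only illustrates the property it relies on: in a prefix-free code no codeword is a prefix of another, so a concatenation of codewords can be split back unambiguously without separators. The codebook is hypothetical, and the study's byte-boundary variant is not reproduced.

```python
def is_prefix_free(codes):
    """True if no codeword is a prefix of another (duplicates also fail).

    After sorting, any codeword that prefixes another also prefixes its
    immediate successor, so checking neighbours is enough.
    """
    ordered = sorted(codes)
    return all(not nxt.startswith(cur) for cur, nxt in zip(ordered, ordered[1:]))

def encode(symbols, codebook):
    return "".join(codebook[s] for s in symbols)

def decode(stream, codebook):
    """Split a concatenated bitstring back into symbols; unambiguous because the code is prefix-free."""
    inverse = {code: sym for sym, code in codebook.items()}
    out, buf = [], ""
    for bit in stream:
        buf += bit
        if buf in inverse:  # the first match is the only possible one
            out.append(inverse[buf])
            buf = ""
    if buf:
        raise ValueError("trailing bits do not form a codeword")
    return out

codebook = {"a": "0", "b": "10", "c": "110", "d": "111"}  # hypothetical labels -> bitstrings
stream = encode("abcd", codebook)
print(is_prefix_free(codebook.values()), stream, decode(stream, codebook))
```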
Expression of Query in XML object-oriented database Editor IJCATR
This document discusses expressing queries in an XML object-oriented database. It proposes a method for querying an object-oriented database where the user can assign weights to restrictions in conjunctive or disjunctive queries based on importance. The queries and resulting objects are represented using XML labels to simplify them and make them closer to user needs. It also presents a case study of a book information registration system to evaluate the proposed query method on an XML object-oriented database.
Expression of Query in XML object-oriented database Editor IJCATR
With the advent of object-oriented databases, the concept of behavior in databases was introduced. Previously, relational databases only provided a logical model of data and paid no attention to the operations applied to data in the system. In this paper, a method is presented for querying an object-oriented database. The method yields appropriate results when the user expresses restrictions in combination (disjunctive and conjunctive) and assigns a weight to each restriction based on its importance. The obtained results are then sorted by their degree of membership in the response set. Queries are expressed using XML labels, with the aim of simplifying the queries and the objects they return so that the results are very close to the user's needs and meet his expectations.
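The abstract does not give the aggregation formulas, so this sketch swaps in the standard fuzzy weighted-minimum (conjunctive) and weighted-maximum (disjunctive) operators; the paper may combine weights differently. The book records, the membership functions `cheap` and `recent`, and the XML element names are hypothetical. Each restriction returns a satisfaction degree in [0, 1], objects with zero membership are dropped, and the rest are sorted by membership and emitted as XML.

```python
import xml.etree.ElementTree as ET

def weighted_and(degrees, weights):
    """Weighted conjunction (weighted min): a restriction with weight w
    can pull the score down no further than 1 - w."""
    return min(max(1 - w, d) for d, w in zip(degrees, weights))

def weighted_or(degrees, weights):
    """Weighted disjunction (weighted max): a restriction with weight w
    can raise the score to at most w."""
    return max(min(w, d) for d, w in zip(degrees, weights))

def rank(objects, restrictions, weights, mode="and"):
    """Score each object, drop those with zero membership, sort best first."""
    combine = weighted_and if mode == "and" else weighted_or
    scored = []
    for obj in objects:
        score = combine([r(obj) for r in restrictions], weights)
        if score > 0:
            scored.append((score, obj))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable, so ties keep input order
    return scored

def to_xml(scored, tag="book"):
    """Wrap the ranked answer in XML, one element per object with its membership degree."""
    root = ET.Element("result")
    for score, obj in scored:
        item = ET.SubElement(root, tag, membership=f"{score:.2f}")
        for field, value in obj.items():
            ET.SubElement(item, field).text = str(value)
    return ET.tostring(root, encoding="unicode")

# Hypothetical book records and fuzzy restrictions with degrees in [0, 1].
books = [{"title": "A", "price": 15, "year": 2010},
         {"title": "B", "price": 35, "year": 2020},
         {"title": "C", "price": 60, "year": 2021}]

def cheap(book):
    return max(0.0, min(1.0, (50 - book["price"]) / 30))

def recent(book):
    return max(0.0, min(1.0, (book["year"] - 2005) / 15))

print(to_xml(rank(books, [cheap, recent], weights=[1.0, 0.8], mode="and")))
```

With price weighted above recency, the conjunctive query ranks B ahead of A and excludes C entirely, while the disjunctive query keeps all three books.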
ONTOLOGY BASED DOCUMENT CLUSTERING USING MAPREDUCE ijdms
Nowadays, document clustering is considered a data-intensive task due to the dramatic, fast increase in the number of available documents. Moreover, the number of features that represent those documents is also very large. The most common method for representing documents is the vector space model, which represents document features as a bag of words and does not represent semantic relations between words. In this paper we introduce a distributed implementation of bisecting k-means using the MapReduce programming model. The aim of our proposed implementation is to solve the problem of clustering data-intensive document collections. In addition, we propose integrating the WordNet ontology with bisecting k-means in order to utilize the semantic relations between words to enhance document clustering results. Our experimental results show that using lexical categories for nouns only enhances internal evaluation measures of document clustering and reduces the number of document features from thousands to tens. Our experiments were conducted using Amazon Elastic MapReduce to deploy the bisecting k-means algorithm.
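A single-machine sketch of bisecting k-means, the algorithm the paper distributes with MapReduce. Assumptions not taken from the paper: the cluster with the largest sum of squared errors (SSE) is split next, each split is one run of 2-means started deterministically from far-apart points, and documents are already dense feature vectors (for example, per-document counts of noun lexical categories). The MapReduce distribution and the WordNet mapping itself are not shown.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def sse(cluster):
    """Within-cluster sum of squared distances to the centroid."""
    c = mean(cluster)
    return sum(sq_dist(p, c) for p in cluster)

def two_means(points, iters=20):
    """Split `points` in two with plain k-means (k = 2).

    Deterministic start (an assumption): the point farthest from the centroid,
    then the point farthest from that one.
    """
    c = mean(points)
    first = max(points, key=lambda p: sq_dist(p, c))
    second = max(points, key=lambda p: sq_dist(p, first))
    centers = [list(first), list(second)]
    groups = ([], [])
    for _ in range(iters):
        groups = ([], [])
        for p in points:
            nearest = 0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1
            groups[nearest].append(p)
        if not groups[0] or not groups[1]:
            break  # degenerate split, e.g. all points identical
        new_centers = [mean(g) for g in groups]
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return groups

def bisecting_kmeans(points, k):
    """Repeatedly bisect the cluster with the largest SSE until k clusters exist."""
    clusters = [list(points)]
    while len(clusters) < k:
        idx = max(range(len(clusters)), key=lambda i: sse(clusters[i]))
        left, right = two_means(clusters[idx])
        if not left or not right:
            break  # the chosen cluster cannot be split further
        clusters[idx:idx + 1] = [left, right]
    return clusters

pts = [[0, 0], [0, 1], [10, 0], [10, 1], [30, 0], [30, 1]]
print(bisecting_kmeans(pts, 3))
```

Practical implementations often run several trial bisections per step and keep the best one; a single deterministic trial keeps the sketch short and reproducible.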
Very few research works have addressed XML security over relational databases, even though XML has become the de facto standard for data representation and exchange on the Internet and many XML documents are stored in RDBMSs. In [14], the author proposed an access control model for the schema-based relational storage of XML documents and for translating XML access control rules into relational access control rules. However, the proposed algorithms had performance drawbacks. In this paper, we use the same access control model as [14] and try to overcome its drawbacks by proposing an efficient technique to store XML access control rules in a relational storage of the XML DTD. The mapping of the XML DTD to a relational schema is proposed in [7]. We also propose an algorithm to translate XPath queries into SQL queries based on the mapping algorithm in [7].
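The mapping in [7] is DTD-based and is not reproduced here. As a hedged stand-in, the sketch shreds a document into a generic edge table node(id, parent_id, tag, text), which is a hypothetical schema, and translates simple absolute child-axis paths such as /library/book/title into SQL self-joins, running them with Python's built-in sqlite3. Access control rules are not modeled, and descendant axes, predicates and wildcards are rejected.

```python
import sqlite3
import xml.etree.ElementTree as ET

def shred(xml_text, conn):
    """Store every element as a row of the hypothetical edge table node(id, parent_id, tag, text)."""
    conn.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER, tag TEXT, text TEXT)")
    next_id = 0

    def insert(elem, parent_id):
        nonlocal next_id
        next_id += 1
        my_id = next_id  # ids follow document order
        text = (elem.text or "").strip() or None
        conn.execute("INSERT INTO node VALUES (?, ?, ?, ?)", (my_id, parent_id, elem.tag, text))
        for child in elem:
            insert(child, my_id)

    insert(ET.fromstring(xml_text), None)

def xpath_to_sql(path):
    """Translate an absolute child-axis path like /a/b/c into a parameterized SQL self-join."""
    steps = [s for s in path.split("/") if s]
    if not path.startswith("/") or "//" in path or not steps or not all(s.isidentifier() for s in steps):
        raise ValueError("only simple absolute child paths such as /a/b/c are supported")
    aliases = [f"n{i}" for i in range(len(steps))]
    conditions = [f"{aliases[0]}.parent_id IS NULL"]  # first step must be the document root
    conditions += [f"{cur}.parent_id = {prev}.id" for prev, cur in zip(aliases, aliases[1:])]
    conditions += [f"{a}.tag = ?" for a in aliases]  # one parameter per step, in order
    last = aliases[-1]
    sql = (f"SELECT {last}.id, {last}.text FROM " + ", ".join(f"node {a}" for a in aliases)
           + " WHERE " + " AND ".join(conditions) + f" ORDER BY {last}.id")
    return sql, steps

conn = sqlite3.connect(":memory:")
shred("<library><book><title>XML</title></book><book><title>SQL</title></book></library>", conn)
sql, params = xpath_to_sql("/library/book/title")
print(sql)
print(conn.execute(sql, params).fetchall())
```

Tag names are passed as bound parameters rather than spliced into the SQL string, so only the generated aliases and structure appear in the query text.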
This document provides a survey of XML tree patterns (TPs), which are used to query tree-structured XML data. It outlines various TP models and features. It also reviews two main approaches for optimizing TP matching: TP minimization and holistic matching. The document aims to provide a global overview of over 10 years of research on TPs and related issues.
Clustering Algorithm with a Novel Similarity Measure IOSR Journals
This document proposes a new multi-viewpoint based similarity measure for clustering text documents that aims to overcome limitations of existing measures. Existing measures use a single viewpoint to measure similarity between documents, but the proposed measure uses multiple viewpoints to ensure clusters exhibit all relationships between documents. The empirical study found that using a multi-viewpoint similarity measure forms more meaningful clusters by capturing more informative relationships between documents.
ONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATION IJDKP
This article introduces some approaches for improving text categorization models by integrating previously imported ontologies. From the Reuters Corpus Volume I (RCV1) dataset, some categories that are very similar in content and related to the telecommunications, Internet and computer areas were selected for the model experiments. Several domain ontologies covering these areas were built and integrated into the categorization models to improve them.
This document summarizes a research paper that introduces a novel multi-viewpoint similarity measure for clustering text documents. The paper begins with background on commonly used similarity measures like Euclidean distance and cosine similarity. It then presents the novel multi-viewpoint measure, which considers multiple viewpoints (objects not assumed to be in the same cluster) rather than a single viewpoint. The paper proposes two new clustering criterion functions based on this measure and compares them to other algorithms on benchmark datasets. The goal is to develop a similarity measure and clustering methods that provide high-quality, consistent performance like k-means but can better handle sparse, high-dimensional text data.
The world is witnessing an unprecedented information revolution, with databases growing rapidly in every respect. Databases are interconnected through their content and schemas but use different elements and structures to express the same concepts and relations, which may cause semantic and structural conflicts. This paper proposes a new technique for integrating heterogeneous eXtensible Markup Language (XML) schemas, named XDEHD. The resulting mediated schema contains all concepts and relations of the sources without duplication. The technique consists of three steps. First, all subschemas are extracted by decomposing the source schemas; each subschema contains three levels: ancestor, root and leaf. Second, the technique matches and compares the subschemas and returns the related candidate subschemas; a semantic closeness function measures how similarly the concepts of the subschemas are modelled in the sources. Finally, the mediated schema is created by integrating the candidate subschemas, yielding a minimal and complete unified schema; an association strength function computes the closeness of each pair in a candidate subschema across all data sources, and an element repetition function counts how many times each element is repeated among the candidate subschemas.
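The abstract gives no formulas, so this sketch only illustrates two mechanical pieces under stated assumptions: each source schema is represented by an element skeleton, a subschema is taken to be an (ancestor, root, leaf names) triple for every element that has both a parent and children, and element repetition is counted as the number of sources in which an element name occurs. The semantic closeness and association strength functions are not reproduced, and all names here are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def subschemas(schema_root):
    """Hypothetical decomposition: one (ancestor, root, child names) triple for
    every element that has both a parent and at least one child."""
    out = []

    def walk(elem, parent):
        children = list(elem)
        if parent is not None and children:
            out.append((parent.tag, elem.tag, tuple(sorted(c.tag for c in children))))
        for child in children:
            walk(child, elem)

    walk(schema_root, None)
    return out

def element_repetition(sources):
    """Count in how many source schemas each element name occurs."""
    counts = Counter()
    for root in sources:
        counts.update({e.tag for e in root.iter()})  # a set, so each source counts once
    return counts

s1 = ET.fromstring("<library><book><title/><author/></book></library>")
s2 = ET.fromstring("<store><book><title/><price/></book></store>")
print(subschemas(s1), subschemas(s2))
print(element_repetition([s1, s2]).most_common(2))
```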
Textual Data Partitioning with Relationship and Discriminative Analysis Editor IJMTER
Data partitioning methods are used to partition data values by similarity, and similarity measures are used to estimate transaction relationships. Hierarchical clustering models produce tree-structured results, while partitional clustering produces results in grid format. Text documents are unstructured data with high-dimensional attributes, and document clustering groups unlabeled text documents into meaningful clusters. Traditional clustering methods require the cluster count (K) for the document grouping process, and clustering accuracy degrades drastically when an unsuitable cluster count is chosen.
Textual data elements are divided into two types: discriminative words and non-discriminative words. Only discriminative words are useful for grouping documents; including non-discriminative words confuses the clustering process and leads to a poor clustering solution. A variational inference algorithm is used to infer the document collection structure and the partition of document words at the same time. The Dirichlet Process Mixture (DPM) model is used to partition documents; it uses both the data likelihood and the clustering property of the Dirichlet Process (DP). The Dirichlet Process Mixture Model for Feature Partition (DPMFP) is used to discover the latent cluster structure based on the DPM model, and DPMFP clustering is performed without requiring the number of clusters as input.
Document labels are used to guide the identification of discriminative words. Concept relationships are analyzed with ontology support, and a semantic weight model is used for document similarity analysis. The system improves scalability by using labels and concept relations in the dimensionality reduction process.
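The property the DPM-based approach relies on, clustering without fixing K in advance, comes from the Dirichlet Process prior. The sketch samples a partition from the Chinese restaurant process, the partition distribution induced by a DP with concentration alpha: item i joins an existing cluster with probability proportional to its size or opens a new one with probability proportional to alpha. This is only the prior; the variational inference and the DPMFP feature partition are not shown, and the function name is hypothetical.

```python
import random

def crp_partition(n, alpha, seed=0):
    """Sample cluster assignments for n items from a Chinese restaurant process.

    Item i joins existing cluster c with probability size_c / (i + alpha) or
    opens a new cluster with probability alpha / (i + alpha).
    """
    if n < 0 or alpha <= 0:
        raise ValueError("need n >= 0 and alpha > 0")
    rng = random.Random(seed)
    sizes, assignment = [], []
    for i in range(n):
        r = rng.random() * (i + alpha)  # the i items seated so far plus alpha
        acc = 0.0
        for c, size in enumerate(sizes):
            acc += size
            if r < acc:
                sizes[c] += 1
                assignment.append(c)
                break
        else:  # r fell in the alpha-sized slice: open a new cluster
            sizes.append(1)
            assignment.append(len(sizes) - 1)
    return assignment, sizes

for alpha in (0.5, 5.0):
    _, sizes = crp_partition(200, alpha, seed=1)
    print(alpha, len(sizes), "clusters")
```

Larger alpha values tend to produce more clusters; the number of clusters is an outcome of the sampling, not an input.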
Information Technology Convergence Services & AI (ITCAI 2024) IJITCA Journal
Welcome to ITCAI 2024
** Registration is currently open **
Submit Your Research Articles...!!!
International Conference on Information Technology Convergence Services & AI (ITCAI 2024)
September 14 ~ 15, 2024, Virtual Conference
https://itca2024.org/
Submission Deadline: June 01, 2024
Contact us:
Here's where you can reach us: itca@itca2024.org or itcaiconf@gmail.com
Submission Link:
https://itca2024.org/submission/index.php
2nd International Conference on Soft Computing, Data mining and Data Scienc... IJITCA Journal
The 2nd International Conference on Soft Computing, Data mining and Data Science (SCDD 2024) will provide an excellent international forum for sharing knowledge and results in the theory, methodology and applications of Soft Computing, Data mining, and Data Science.
The Conference looks for significant contributions to all major fields of Soft Computing, Data mining, and Data Science in both theoretical and practical aspects. The aim of the Conference is to provide a platform for researchers and practitioners from both academia and industry to meet and share cutting-edge developments in the field.
Authors are solicited to contribute to the Conference by submitting articles that illustrate research results, projects, survey works and industrial experiences that describe significant advances in areas including, but not limited to, the following.
Authors are invited to contribute to this journal by submitting articles that illustrate research results, projects, surveying works and industrial experiences that describe significant advances in Information Technology, Control Systems and Automation.
The International Journal of Information Technology, Control and Automation (...IJITCA Journal
The International Journal of Information Technology, Control and Automation (IJITCA) is a Quarterly open access peer-reviewed journal that publishes articles which contribute new results in all areas of Information Technology (IT), Control Systems and Automation Engineering. The journal focuses on all technical and practical aspects of IT, Control Systems and Automation with applications in real-world engineering and scientific problems. The goal of this journal is to bring together researchers and practitioners from academia and industry to focus on information technology, control engineering, automation, modeling concepts and establishing new collaborations in these areas.
Authors are invited to contribute to this journal by submitting articles that illustrate research results, projects, surveying works and industrial experiences that describe significant advances in Information Technology, Control Systems and Automation.
The International Journal of Information Technology, Control and Automation (...IJITCA Journal
The International Journal of Information Technology, Control and Automation (IJITCA) is a Quarterly open access peer-reviewed journal that publishes articles which contribute new results in all areas of Information Technology (IT), Control Systems and Automation Engineering. The journal focuses on all technical and practical aspects of IT, Control Systems and Automation with applications in real-world engineering and scientific problems. The goal of this journal is to bring together researchers and practitioners from academia and industry to focus on information technology, control engineering, automation, modeling concepts and establishing new collaborations in these areas.
Authors are invited to contribute to this journal by submitting articles that illustrate research results, projects, surveying works and industrial experiences that describe significant advances in Information Technology, Control Systems and Automation.
The International Journal of Information Technology, Control and Automation (...IJITCA Journal
The International Journal of Information Technology, Control and Automation (IJITCA) is a Quarterly open access peer-reviewed journal that publishes articles which contribute new results in all areas of Information Technology (IT), Control Systems and Automation Engineering. The journal focuses on all technical and practical aspects of IT, Control Systems and Automation with applications in real-world engineering and scientific problems. The goal of this journal is to bring together researchers and practitioners from academia and industry to focus on information technology, control engineering, automation, modeling concepts and establishing new collaborations in these areas.
Authors are invited to contribute to this journal by submitting articles that illustrate research results, projects, surveying works and industrial experiences that describe significant advances in Information Technology, Control Systems and Automation.
The International Journal of Information Technology, Control and Automation (...IJITCA Journal
The International Journal of Information Technology, Control and Automation (IJITCA) is a Quarterly open access peer-reviewed journal that publishes articles which contribute new results in all areas of Information Technology (IT), Control Systems and Automation Engineering. The journal focuses on all technical and practical aspects of IT, Control Systems and Automation with applications in real-world engineering and scientific problems. The goal of this journal is to bring together researchers and practitioners from academia and industry to focus on information technology, control engineering, automation, modeling concepts and establishing new collaborations in these areas.
Authors are invited to contribute to this journal by submitting articles that illustrate research results, projects, surveying works and industrial experiences that describe significant advances in Information Technology, Control Systems and Automation.
The International Journal of Information Technology, Control and Automation (IJITCA)
International Journal of Information Technology, Control and Automation (IJITCA) Vol.2, No.4, October 2012
DOI:10.5121/ijitca.2012.2401
XCLS++: A new algorithm to improve XCLS+ for clustering XML documents
Ahmad Khodayar¹ and Hassan Naderi²
¹ Department of Computer Engineering, Shabestar Islamic Azad University of Science and Technology, Shabestar, Iran
ahmad_kho2000@yahoo.com
² Department of Computer Engineering, Iran University of Science and Technology, Resalat st., Tehran, Iran
naderi@iust.ac.ir
Abstract
The purpose of this paper is to offer a method for clustering XML documents. Many approaches to clustering documents have been discussed. They can be divided into structure-based methods, content-based methods, and methods that combine structure and content. One method proposed for this problem is XCLS, which was later improved by XCLS+. XCLS+ is an efficient algorithm for clustering XML documents and belongs to the structure-based clustering category. Our study showed that XCLS+ has a problem that moves it away from its optimal value. This paper presents a method named XCLS++ that does not have this problem, and its efficiency is shown to be higher than that of XCLS+.
Keywords
Clustering, XML documents, XCLS+ algorithm, Content similarity of document, Structure similarity of document
1. Introduction
XML documents currently carry the largest volume of textual information exchanged on the internet. XML has several useful characteristics. It can describe information, which allows that information to be stored on any computer. It is textual, so any operating system and software can process it. It can be understood by both machines and humans. Together these characteristics have made the format particularly popular in the IT community. On the other hand, because the technology is relatively new, high-performance systems for processing it are still being developed. XML makes content easier to manage and access by giving it structure. An XML document is composed of both content and structure, so both parts should be considered when processing XML documents.
As information spreads, it needs systematic management for storage, retrieval, and transmission over the internet. Clustering is very useful in these processes, and it also helps summarize a document collection and reduce its size. Clustering XML documents is no exception: many applications face the problem of clustering. For example, if an efficient clustering algorithm supplies a representative for each cluster, a very large set of documents can be stored more compactly. The first tool required for clustering a set of documents is a formula for calculating similarity.
Methods for determining the similarity of two XML documents generally fall into three categories: content [10], structure [2] [5] [11], and a combination of the two [1] [3] [4] [8] [9]. Clustering XML documents by content is very similar to clustering text documents, a field in which much work has been done. In this kind of clustering, document tags are either omitted or treated as content, and the documents are then easily clustered with existing algorithms. The clustering criterion is content, so documents with similar content are placed in the same cluster. Content-based clustering has been used for searching text documents in search engines such as Google and AltaVista. Structure-based clustering tries to find documents with similar structure and puts them in the same cluster. Combined clustering uses the advantages of both content-based and structural clustering.
One structural method for clustering XML documents with good efficiency is XCLS [5]. It finds nodes with similar tag names in the levels of the trees corresponding to the documents and then calculates a similarity number. The fundamental problem of XCLS is that it neglects the FATHER-NODE relationship, so the order of nodes is not preserved. XCLS+ was created to improve XCLS and solve this problem [11]. It adds a factor for the FATHER-NODE relationship to the XCLS formula, which solves the problem and improves on the previous method. On careful study, however, we observed that XCLS+ does not consider duplicate nodes, and this keeps it away from optimality. This article therefore improves the XCLS+ similarity formula and establishes a new method called XCLS++.

The paper is organized as follows. Section 2 reviews the tree model of XML documents. Section 3 discusses previous work, and Section 4 analyzes the XCLS+ method in detail. Section 5 presents the related bug in XCLS+. Section 6 proposes a method for fixing that bug. The next section presents the results of the proposed method and compares them with XCLS+, showing that the proposed method performs well in comparison. The final section gives a summary and conclusions.
2. Tree model of XML documents
XML documents have a tree format, so they can be modeled as trees. The problem of clustering XML documents then reduces to clustering trees. Structural methods make use of these trees. XCLS+ is one example: it uses the tree model, clusters XML documents successfully, and optimized the XCLS method. Despite its improvements over XCLS, XCLS+ also has a bug. From now on we discuss only XCLS+. The main reason this method is not optimal is that it neglects duplicate nodes in the tree levels, and this article tries to fix that.
3. Previous works
The goal of clustering is to find similar documents. Document similarity must be measured with a specific similarity measure, and clustering is then based on that measure. As mentioned above, three approaches to finding similarity have been suggested: 1- content [10], 2- structure [2] [5] [11], and 3- content with structure [1] [3] [4] [8] [9]. Each XML document can be converted to a tree, and clustering operations can be performed on these trees. Structural methods consider only the tree structure and do not work with content. XCLS+ is a structural approach, and this article focuses on this type of clustering. As mentioned previously, XCLS+ was created to optimize XCLS. Experiments show that XCLS+ performs well compared with XCLS but has a basic problem that decreases its efficiency in some circumstances. Further study showed that the main cause of this low efficiency is that the method neglects duplicate similar nodes in the trees. This paper presents a solution that fixes this bug and improves the XCLS+ algorithm. More details are given in the following sections. The paper first studies XCLS+ and then offers the proposed algorithm. Finally, it compares the results of the proposed method with XCLS+ and shows that the proposed method is closer to optimal.
4. XCLS+ method
This method uses structural similarity and works in finer detail, based on tag similarity between the levels of the tree nodes of the documents. Each incoming XML document is compared with the clustered documents. If the similarity number between them is acceptable, the new document is placed beside the relevant clustered document. An accepted incoming document is combined with the document in the cluster to form a new tree, so each cluster holds a single tree. If the similarity value is not acceptable, a new cluster is created and the incoming document is placed in it. This continues until the final document has arrived. The similarity calculation is based on Formula 1, which was created to improve the XCLS method.

When a document arrives, the operations proceed as follows. First, the root node of the incoming document is compared with the root node of the clustered documents. If they are similar, the factors of Formula 1 are calculated and the comparison continues one level down in each document. If they are not similar, the next comparison is made between the nodes one level down in the clustered document and the same level of the incoming document. These comparisons continue down to the leaf levels of the documents. As noted previously, the similarity value is calculated with:
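$$\mathrm{sim}_{\mathrm{XCLS+}}=\frac{0.5\sum_{i=0}^{l-1}\left(CN_i+CP_i\right)r^{l-i-1}+0.5\sum_{j=0}^{l-1}\left(CN_j+CP_j\right)r^{l-j-1}}{\sum_{k=0}^{l-1}N_k\,r^{l-k-1}+0.5\left(\sum_{i=0}^{l-1}CP_i\,r^{l-i-1}+\sum_{j=0}^{l-1}CP_j\,r^{l-j-1}\right)}$$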
Formula 1. Similarity formula of the XCLS+ method
The value of Formula 1 lies between zero and one and increases with the similarity of the two documents.
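The incremental procedure described at the start of this section can be sketched in Python. Each arriving document is compared with every cluster, joins a cluster when the score is acceptable, and otherwise opens a new cluster. This is a minimal sketch under stated assumptions. The acceptance `threshold`, the rule of picking the best-scoring cluster, and the `similarity` callback are all hypothetical. Each cluster is also kept as a plain list of its members rather than the single merged tree the method maintains. The stand-in similarity `tag_jaccard` is only a placeholder and is not Formula 1.

```python
from typing import Callable, List, Set

# Hypothetical document representation for this sketch: the set of tag names.
Doc = Set[str]


def incremental_cluster(
    docs: List[Doc],
    similarity: Callable[[Doc, List[Doc]], float],
    threshold: float = 0.5,  # hypothetical acceptance threshold in [0, 1]
) -> List[List[Doc]]:
    """Place each arriving document in its most similar cluster, or open a new one."""
    clusters: List[List[Doc]] = []
    for doc in docs:
        best_index, best_score = -1, -1.0
        for index, members in enumerate(clusters):
            score = similarity(doc, members)
            if score > best_score:
                best_index, best_score = index, score
        if best_index >= 0 and best_score >= threshold:
            clusters[best_index].append(doc)  # acceptable similarity: join the cluster
        else:
            clusters.append([doc])  # no acceptable cluster: start a new one
    return clusters


def tag_jaccard(doc: Doc, members: List[Doc]) -> float:
    """Stand-in similarity (NOT Formula 1): Jaccard overlap with the cluster's tags."""
    cluster_tags = set().union(*members)
    return len(doc & cluster_tags) / len(doc | cluster_tags)


if __name__ == "__main__":
    docs = [{"a", "b"}, {"a", "b", "c"}, {"x", "y"}]
    print(incremental_cluster(docs, tag_jaccard))
```

Because the similarity is passed in as a function, a Formula 1 implementation could replace `tag_jaccard` without changing the loop.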
The variables in the above formula are:
1- Z is the cluster size, in other words the number of documents in the cluster.
2- CN_i is the total number of similar nodes between level i of the new incoming document and level j of the clustered documents.
3- CN_j is the total number of similar nodes between level j of the clustered documents and level i of the new incoming document.
4- CP_i is the total number of similar nodes between level i of the new incoming document and level j of the clustered documents that have the same father.
5- CP_j is the total number of similar nodes between level j of the clustered documents and level i of the new incoming document that have the same father.
6- N_k is the number of elements in level k of the incoming tree.
7- l is the height of the tree of each document.
8- i and j are the level numbers being compared.
9- r is the increment factor, which is set to 2.
10- k is equal to 2.
The formula is applied at every jump between document levels, and the calculated similarity value is the sum of the obtained factors. This continues until the levels of one of the documents are exhausted. The overall XCLS+ algorithm is given in Section 4.1.
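Formula 1 and the level-by-level accumulation described above can be written as a small Python function. This is a sketch under assumptions. The factor lists (`cn_i`, `cp_i`, `cn_j`, `cp_j`, `n_k`) are taken as already counted and indexed by level 0 to l-1, with zero for levels where no match was recorded. The function names are hypothetical.

```python
from typing import Sequence


def weighted_level_sum(values: Sequence[float], r: float = 2.0) -> float:
    """Sum of values[level] * r**(l - level - 1) over levels 0..l-1, with l = len(values)."""
    height = len(values)
    return sum(v * r ** (height - level - 1) for level, v in enumerate(values))


def xcls_plus_similarity(
    cn_i: Sequence[int], cp_i: Sequence[int],
    cn_j: Sequence[int], cp_j: Sequence[int],
    n_k: Sequence[int], r: float = 2.0,
) -> float:
    """Evaluate the XCLS+ similarity from per-level factor lists (index = level)."""
    height = len(n_k)
    if any(len(seq) != height for seq in (cn_i, cp_i, cn_j, cp_j)):
        raise ValueError("all factor lists must have one entry per level")
    incoming = [cn + cp for cn, cp in zip(cn_i, cp_i)]
    clustered = [cn + cp for cn, cp in zip(cn_j, cp_j)]
    numerator = 0.5 * weighted_level_sum(incoming, r) + 0.5 * weighted_level_sum(clustered, r)
    # Parent-aware part of the denominator keeps a perfect match at exactly 1.
    denominator = weighted_level_sum(n_k, r) + 0.5 * (
        weighted_level_sum(cp_i, r) + weighted_level_sum(cp_j, r)
    )
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical three-level factors, not taken from the paper's figures.
    print(xcls_plus_similarity([1, 1, 0], [0, 1, 0], [1, 1, 0], [0, 1, 0], [1, 2, 1]))  # 8/11, about 0.727
```

Each level is weighted by a power of r, so matches near the root count more than matches near the leaves.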
4.1. XCLS+ method algorithm
1- Start by looking for the same node in the two trees. If such a node is found, calculate the formula and go to step 2; otherwise go to step 3.
2- If both trees have more depth, move to the next lower level in both trees and search for the same node as in step 1. If such a node is found, calculate the formula and repeat step 2; otherwise go to step 3.
3- If one tree has more depth (usually the clustered document), move down one level in the clustered document and stay at the same level of the new incoming document. Search again for the same node. If such a node is found, calculate the formula and then repeat step 2; otherwise repeat step 3.
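Steps 1 to 3 only decide which pair of levels (i, j) is compared next. A minimal Python sketch of that walk follows. It assumes each document is given as a list of tag-name sets, one set per level, with level 0 as the root. The representation and the function name are hypothetical. Stopping as soon as either document runs out of levels is also an assumption about the end condition.

```python
from typing import List, Sequence, Set, Tuple


def align_levels(
    incoming: Sequence[Set[str]],
    clustered: Sequence[Set[str]],
) -> List[Tuple[int, int]]:
    """Return the (i, j) level pairs at which the similarity factors would be computed.

    incoming[i] and clustered[j] hold the tag names found at level i (incoming
    document) and level j (clustered document); level 0 is the root.
    """
    pairs: List[Tuple[int, int]] = []
    i = j = 0
    # Assumed end condition: stop as soon as either document has no levels left.
    while i < len(incoming) and j < len(clustered):
        if incoming[i] & clustered[j]:
            # Steps 1 and 2: a shared tag was found, so the factors are computed
            # for (i, j) and both trees move one level down.
            pairs.append((i, j))
            i += 1
            j += 1
        else:
            # Step 3: no shared tag, so move down in the clustered document only.
            j += 1
    return pairs


if __name__ == "__main__":
    incoming_doc = [{"a"}, {"b", "c"}, {"d"}]
    clustered_doc = [{"a"}, {"x"}, {"b"}, {"d"}]
    print(align_levels(incoming_doc, clustered_doc))  # [(0, 0), (1, 2), (2, 3)]
```

The walk never moves back up. An incoming level with no counterpart below the current clustered level therefore ends the comparison early, and the steps above do not say how that case should be handled.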
4.2. Example of XCLS+ method
This section gives an example to illustrate the algorithm. The goal is to find the similarity between tree1 and tree2 with the XCLS+ method. Tree1 is the incoming document and tree2 is the clustered document. The top-to-bottom arrows show the order in which the algorithm executes. For convenience, Figure 1 shows the calculated variable values on the arrows.
Figure 1. An example showing how the XCLS+ method operates
[Figure 1: tree1 (incoming document) and tree2 (clustered document), levels L = 0 to 2; factor values on the arrows: cn=1, cp=0; cn=2, cp=2; cn=0, cp=0]
After these factors are obtained and substituted into Formula 1, the XCLS+ similarity is:
$$\mathrm{sim}_{\mathrm{XCLS+}}(\text{tree1},\text{tree2})=0.85$$
5. Problem for XCLS+ method
A more detailed study of the algorithm, along with tests on various examples, showed that the XCLS+ formula has a problem: it neglects repeated nodes. Figures 2 and 3 present an example that makes this clear.
Figure 2. The same nodes of the two trees are equal in their hierarchies
After calculating the above factors, the similarity number is:
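$$\mathrm{sim}_{\mathrm{XCLS+}}=\frac{0.5\sum_{i=0}^{l-1}\left(CN_i+CP_i\right)r^{l-i-1}+0.5\sum_{j=0}^{l-1}\left(CN_j+CP_j\right)r^{l-j-1}}{\sum_{k=0}^{l-1}N_k\,r^{l-k-1}+0.5\left(\sum_{i=0}^{l-1}CP_i\,r^{l-i-1}+\sum_{j=0}^{l-1}CP_j\,r^{l-j-1}\right)}=0.85$$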
For the other example, in Figure 3, the similarity number calculated with the XCLS+ method is:
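$$\mathrm{sim}_{\mathrm{XCLS+}}=\frac{0.5\sum_{i=0}^{l-1}\left(CN_i+CP_i\right)r^{l-i-1}+0.5\sum_{j=0}^{l-1}\left(CN_j+CP_j\right)r^{l-j-1}}{\sum_{k=0}^{l-1}N_k\,r^{l-k-1}+0.5\left(\sum_{i=0}^{l-1}CP_i\,r^{l-i-1}+\sum_{j=0}^{l-1}CP_j\,r^{l-j-1}\right)}=0.85$$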
The similarity numbers calculated for the trees in Figure 2 and in Figure 3 are therefore equal. The factors for Figure 3 are written on its arrows, and all of them match the factors in Figure 2. The two figures differ significantly, yet their variable values are equal, so XCLS+ gives them the same similarity number. Comparing the two examples, we conclude that the first example should have a higher similarity number than the second. The XCLS+ results are therefore illogical, since the two cases should score differently.

The primary cause of this problem is that the original XCLS+ formula ignores repeated nodes. When the cp factor is counted without regard to repeated nodes, the order of the matched nodes in the levels changes, so different trees receive the same similarity number. A method is therefore needed that also takes repeated cases into account. This paper proposes such a method, called XCLS++. It adds a new factor to Formula 1 to solve the XCLS+ problem and to find the similarity number more effectively. The proposed method is described next.
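The sketch below makes the effect of repeated nodes concrete by counting a parent-aware factor in two ways. It assumes each node at a level is represented as a hypothetical (tag, parent_tag) pair. When only distinct pairs are counted, an incoming level that repeats a node scores the same as one that does not, while a multiset count separates them. This only illustrates the problem described above. Neither count is claimed to be the exact XCLS+ definition, and the multiset count is not the XCLS++ fix, which instead adds the cc factor described in Section 6.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical node representation: (tag, parent_tag).
Node = Tuple[str, str]


def parent_matches_distinct(level_a: List[Node], level_b: List[Node]) -> int:
    """Count shared (tag, parent) pairs; repeated nodes collapse into one."""
    return len(set(level_a) & set(level_b))


def parent_matches_multiset(level_a: List[Node], level_b: List[Node]) -> int:
    """Count shared (tag, parent) pairs; each repetition present in both levels counts."""
    return sum((Counter(level_a) & Counter(level_b)).values())


clustered_level = [("b", "a"), ("b", "a"), ("c", "a")]  # "b" repeated under "a"
incoming_once = [("b", "a"), ("c", "a")]
incoming_twice = [("b", "a"), ("b", "a"), ("c", "a")]

# Distinct counting cannot tell the two incoming levels apart ...
print(parent_matches_distinct(incoming_once, clustered_level),
      parent_matches_distinct(incoming_twice, clustered_level))  # 2 2
# ... while multiset counting can.
print(parent_matches_multiset(incoming_once, clustered_level),
      parent_matches_multiset(incoming_twice, clustered_level))  # 2 3
```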
[Figure 2: tree1 and tree2, levels L = 0 to 3; factor values on the arrows: cn=0, cp=0; cn=1, cp=1; cn=1, cp=0; cn=2, cp=2]
Figure 3. The same nodes of the two trees are not equal in their hierarchies
6. XCLS++: proposed method
As mentioned above, the XCLS+ method has a fundamental problem in calculating the similarity number: it gives the trees in Figures 2 and 3 the same value. A method is therefore needed that yields different similarity numbers for these cases. This article proposes a method that solves the problem and obtains better results than XCLS+.

The solution has two steps. The trees in Figures 2 and 3 are scored unrealistically because of duplicate nodes, which disturb the hierarchy of the tree nodes. A factor that preserves the order of nodes must therefore be added to the formula. In the first step, an incremental change is added to the original formula. In the second step, a replacement change is applied. The result is the optimized formula of this paper. As will be seen, the final formula is more reasonable and realistic and performs well compared with XCLS+. The steps are described below.
6-1. step1
In step 1 of the proposed method, a FATHER_CHILD_NODE factor is added to Formula 1 to solve the problem. This factor is denoted cc, and the resulting formula, Formula 2, is:
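\[
\mathrm{Sim} = \frac{0.5\sum_{i=0}^{L-1}\left(CN_i^{1}+CP_i^{1}+CC_i^{1}\right)r^{L-i-1} + 0.5\sum_{j=0}^{L-1}\left(CN_j^{2}+CP_j^{2}+CC_j^{2}\right)r^{L-j-1}}{\sum_{k=0}^{L-1}N_k\,r^{L-k-1} + 0.5\left[\sum_{i=0}^{L-1}\left(CP_i^{1}+CC_i^{1}\right)r^{L-i-1} + \sum_{j=0}^{L-1}\left(CP_j^{2}+CC_j^{2}\right)r^{L-j-1}\right]}
\]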
Formula2. Step 1 formula of the proposed XCLS++ method
In the above formula, all variables are the same as in Formula1; only cc is new. The cc factor is the number of identical nodes that have the same father and the same children in both trees. Neglecting this factor, as XCLS+ does, lets a tree grow through duplicate child nodes unnoticed and produces unreasonable results. Taking cc into account solves this problem, because growth of the tree through repeated nodes is detected and a proportionate value of cc is assigned.
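To make the structure of Formula2 concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' C implementation: each tree is assumed to be already summarised as per-level counts (index 0 is the root level), the level totals N_k from Formula1 and the base r are passed in by the caller, and the names `LevelCounts`, `weighted` and `sim_step1` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class LevelCounts:
    """Hypothetical per-level counts for one tree (index 0 = root level)."""
    cn: int  # common nodes at this level
    cp: int  # common nodes that also have the same parent (father)
    cc: int  # common nodes with the same father and the same children


def weighted(values: Sequence[float], r: float) -> float:
    """Return sum(values[i] * r**(L - i - 1)); the root level gets the largest weight."""
    L = len(values)
    return sum(v * r ** (L - i - 1) for i, v in enumerate(values))


def sim_step1(t1: Sequence[LevelCounts], t2: Sequence[LevelCounts],
              n: Sequence[int], r: float) -> float:
    """Step 1 XCLS++ similarity (Formula2) computed from per-level counts.

    n[k] plays the role of N_k from Formula1 and is supplied by the caller.
    """
    if not len(t1) == len(t2) == len(n):
        raise ValueError("t1, t2 and n must all cover the same L levels")
    numerator = (0.5 * weighted([c.cn + c.cp + c.cc for c in t1], r)
                 + 0.5 * weighted([c.cn + c.cp + c.cc for c in t2], r))
    denominator = weighted(n, r) + 0.5 * (
        weighted([c.cp + c.cc for c in t1], r)
        + weighted([c.cp + c.cc for c in t2], r))
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical two-level example (not taken from the paper's figures).
    t1 = [LevelCounts(1, 1, 0), LevelCounts(2, 2, 1)]
    t2 = [LevelCounts(1, 1, 0), LevelCounts(1, 1, 0)]
    print(sim_step1(t1, t2, n=[1, 2], r=2))  # 0.9375
```

Because the cp and cc terms appear in both the numerator and the denominator, the sketch returns 1.0 whenever the common-node counts equal the level totals, and the structural terms only shift the result when the trees do not match completely.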
To demonstrate the optimality of the proposed method, the similarity numbers for the examples in Figures 2 and 3 were recalculated with Formula2. For these two examples, which differ significantly, the proposed method gives better results than the XCLS+ method. The similarities of the trees in Figure2 and Figure3 are recalculated in Figures 4 and 5, respectively, and each result is given after its figure. Factors written without a (tree1) or (tree2) qualifier apply to both trees.
Figure4. Calculating the similarity number for the trees of Figure2 based on the step 1 optimization
\[ \mathrm{Sim} = 0.85 \]
Figure5. Calculating the similarity number for the trees of Figure3 based on the step 1 optimization
\[ \mathrm{Sim} = 0.83 \]
Comparing the two numbers shows that, unlike the XCLS+ method, the proposed method distinguishes between the two cases, so the cc factor solves the problem of XCLS+. However, the following examples show that this factor alone does not solve the problem completely.
6-2. Problem with step 1
Although step 1 solves the problem of XCLS+, it still fails for some examples. For the trees in Figures 6 and 7, for example, step 1 gives similarity numbers of 0.96 and 0.95, respectively; each result is given after its figure.
Figure6. Calculating the similarity number for trees based on the step 1 optimization
\[ \mathrm{Sim} = 0.96 \]
Figure7. Calculating the similarity number for other trees based on the step 1 optimization
\[ \mathrm{Sim} = 0.95 \]
As these two examples show, step 1 alone cannot solve the XCLS+ problem completely: it calculates unreasonable similarity numbers for both pairs of trees. Another factor in the formula must therefore be changed to obtain optimal similarity numbers; this is done in step 2.
[Tree diagrams for Figures 6 and 7 (tree1, tree2; levels L=0 to L=2). First pair: cn=1 cp=1 cc=0; cn=2 cp=2 cc=2; cn(tree1)=2 cn(tree2)=1 cp(tree1)=2 cp(tree2)=1 cc=1. Second pair: cn=1 cp=1 cc=0; cn(tree1)=2 cn(tree2)=1 cp(tree1)=2 cp(tree2)=1 cc=0; cn=2 cp=2 cc=2.]
6-3. Step 2
For the final optimization of the proposed method, one more factor must be changed. The factor that causes the unreasonable results is the same-father factor (cp). In this step it is replaced by the same-brother factor (cb). The final optimized formula is:
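\[
\mathrm{Sim} = \frac{0.5\sum_{i=0}^{L-1}\left(CN_i^{1}+CB_i^{1}+CC_i^{1}\right)r^{L-i-1} + 0.5\sum_{j=0}^{L-1}\left(CN_j^{2}+CB_j^{2}+CC_j^{2}\right)r^{L-j-1}}{\sum_{k=0}^{L-1}N_k\,r^{L-k-1} + 0.5\left[\sum_{i=0}^{L-1}\left(CB_i^{1}+CC_i^{1}\right)r^{L-i-1} + \sum_{j=0}^{L-1}\left(CB_j^{2}+CC_j^{2}\right)r^{L-j-1}\right]}
\]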
Formula3. Final formula of the proposed XCLS++ method
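Formula3 has the same shape as Formula2 and differs only in using cb where Formula2 uses cp, so a sketch can take the name of that pair factor as a parameter. This is again a hypothetical illustration rather than the authors' implementation: it assumes per-level counts stored in dictionaries (index 0 is the root level), with the level totals N_k and the base r supplied by the caller; the function name `xcls_pp_sim` and its `pair` argument are inventions for this example.

```python
from typing import Mapping, Sequence

# One dict per level, e.g. {"cn": 2, "cb": 2, "cc": 1}; index 0 is the root level.
Levels = Sequence[Mapping[str, int]]


def _weighted(values: Sequence[float], r: float) -> float:
    """Sum values[i] * r**(L - i - 1) over the L levels."""
    L = len(values)
    return sum(v * r ** (L - i - 1) for i, v in enumerate(values))


def xcls_pp_sim(t1: Levels, t2: Levels, n: Sequence[int], r: float,
                pair: str = "cb") -> float:
    """Formula3 when pair == "cb"; the step 1 Formula2 when pair == "cp"."""
    if not len(t1) == len(t2) == len(n):
        raise ValueError("t1, t2 and n must all cover the same L levels")

    def common(t: Levels) -> float:
        # Weighted common-node counts cn.
        return _weighted([lv["cn"] for lv in t], r)

    def structural(t: Levels) -> float:
        # Weighted structural factors (cb or cp, plus cc); these appear in
        # both the numerator and the denominator.
        return _weighted([lv[pair] + lv["cc"] for lv in t], r)

    numerator = (0.5 * (common(t1) + structural(t1))
                 + 0.5 * (common(t2) + structural(t2)))
    denominator = _weighted(n, r) + 0.5 * (structural(t1) + structural(t2))
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical counts, not read off the paper's figures.
    t1 = [{"cn": 1, "cb": 0, "cc": 0}, {"cn": 2, "cb": 2, "cc": 1}]
    t2 = [{"cn": 1, "cb": 0, "cc": 0}, {"cn": 1, "cb": 0, "cc": 0}]
    print(xcls_pp_sim(t1, t2, n=[1, 2], r=2))
```

Splitting each sum into a common part and a structural part is only for readability; since the level weighting is linear, it gives the same value as weighting (cn + cb + cc) directly.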
The similarity numbers for the trees in Figures 6 and 7 are now recalculated with Formula3 in Figures 8 and 9; each result is given after its figure:
Figure8. Calculating the similarity number for the Figure6 trees based on the step 2 optimization
\[ \mathrm{Sim} = 0.86 \]
Figure9. Calculating the similarity number for the Figure7 trees based on the step 2 optimization
[Tree diagrams for Figures 8 and 9 (tree1, tree2; levels L=0 to L=2). First pair: cn=1 cb=0 cc=0; cn=2 cb=2 cc=2; cn(tree1)=2 cn(tree2)=1 cb(tree1)=2 cb(tree2)=0 cc=1. Second pair: cn=1 cb=0 cc=0; cn(tree1)=2 cn(tree2)=1 cb(tree1)=2 cb(tree2)=0 cc=0; cn(tree1)=2 cn(tree2)=2 cb(tree1)=0 cb(tree2)=2 cc=2.]
\[ \mathrm{Sim} = 0.94 \]
As can be seen, the results calculated with the final optimized formula are reasonable. To show the effectiveness of these factors, the XCLS++ and XCLS+ algorithms were implemented and compared, showing how fundamental a role the factors play in practice. This is described in the next section.
7. Evaluating algorithms and comparing methods
As the above examples show, the XCLS++ approach performs well compared with the XCLS+ method. To verify this, the XCLS+ and XCLS++ algorithms were implemented in C under DOS on a machine with a 2.4 GHz Intel Celeron CPU and 512 MB of RAM. The evaluation criteria were implemented under the same conditions to evaluate XML files of different types. Like the examples above, the experimental results confirm the optimality of the proposed algorithm: XCLS++ is more efficient than XCLS+.
To evaluate the similarity algorithms and the clustering of XML documents, the set of input files must first be specified; after clustering, the accuracy of the algorithms is calculated with existing criteria.
7.1. Data set
For a better analysis, the files were divided into two categories, and both algorithms were run on the same files under the same conditions to compare their efficiency. The two categories are homogeneous files (conforming to a single DTD) [6] and heterogeneous files (conforming to multiple DTDs) [7]. The results for the two categories are shown separately.
7.2. Evaluation criteria
Three criteria are used to measure the accuracy of the clustering algorithms: 1- entropy, 2- purity, 3- Fscore.
7.2.1 Entropy
Entropy measures how the documents located in each cluster i are distributed over the classes r. The entropy formula is:
\[
\mathrm{Entropy} = \sum_{i=1}^{k} \frac{n_i}{N}\, E(C_i), \qquad
E(C_i) = -\frac{1}{\log k} \sum_{r} \frac{n_i^r}{n_i} \log \frac{n_i^r}{n_i}
\]
In the above formulas, C_i, N, k, n_i and n_i^r are, respectively, the ith cluster, the total number of input documents, the number of clusters, the number of documents in cluster i, and the number of documents in cluster i that belong to class r. The closer the entropy value is to zero, the better.
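The entropy criterion can be computed directly from two parallel label lists. The sketch below assumes each clustered document is described by its cluster id and its true class, and follows the entropy formula above, including normalisation by log k; the function name `clustering_entropy` is hypothetical, and the fallback used when there is only one cluster (where log k = 0) is a choice made here, not something the paper specifies.

```python
import math
from collections import Counter, defaultdict
from typing import Hashable, Sequence


def clustering_entropy(clusters: Sequence[Hashable],
                       classes: Sequence[Hashable]) -> float:
    """Weighted entropy of a clustering; 0 means every cluster is pure."""
    if len(clusters) != len(classes) or not clusters:
        raise ValueError("need one class label per clustered document")
    N = len(clusters)
    members = defaultdict(Counter)          # cluster i -> Counter of n_i^r
    for cluster, label in zip(clusters, classes):
        members[cluster][label] += 1
    k = len(members)
    # Assumption: with a single cluster log k = 0, so skip the normalisation.
    norm = math.log(k) if k > 1 else 1.0
    total = 0.0
    for counts in members.values():
        n_i = sum(counts.values())
        # E(C_i) = -(1/log k) * sum_r (n_i^r/n_i) * log(n_i^r/n_i);
        # classes absent from the cluster contribute 0 and are not iterated.
        e_i = -sum((c / n_i) * math.log(c / n_i) for c in counts.values()) / norm
        total += (n_i / N) * e_i
    return total


if __name__ == "__main__":
    # Two clusters, each an even mix of classes "a" and "b": maximal entropy.
    print(clustering_entropy([0, 0, 1, 1], ["a", "b", "a", "b"]))
```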
7.2.2 Purity
Purity sums, over all clusters, the largest number of documents in cluster i that belong to a single class r, relative to the total number of documents. The purity formula is:
\[
\mathrm{Purity} = \sum_{i=1}^{k} \frac{n_i}{N}\, P(C_i), \qquad
P(C_i) = \frac{1}{n_i} \max_{r} n_i^r
\]
The closer the purity value is to one, the better.
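Because n_i cancels in each term, purity reduces to summing the largest class count of every cluster and dividing by N. The following sketch, using the same assumed input format as a pair of parallel label lists and a hypothetical function name `clustering_purity`, computes it that way.

```python
from collections import Counter, defaultdict
from typing import Hashable, Sequence


def clustering_purity(clusters: Sequence[Hashable],
                      classes: Sequence[Hashable]) -> float:
    """Purity = sum_i (n_i/N) * (1/n_i) * max_r n_i^r = (1/N) * sum_i max_r n_i^r."""
    if len(clusters) != len(classes) or not clusters:
        raise ValueError("need one class label per clustered document")
    members = defaultdict(Counter)          # cluster i -> Counter of n_i^r
    for cluster, label in zip(clusters, classes):
        members[cluster][label] += 1
    return sum(max(counts.values()) for counts in members.values()) / len(clusters)


if __name__ == "__main__":
    # Cluster 0 holds a, a, b and cluster 1 holds b, b: (2 + 2) / 5 = 0.8
    print(clustering_purity([0, 0, 0, 1, 1], ["a", "a", "b", "b", "b"]))
```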
7.2.3 Fscore
Fscore is another criterion; it combines the precision P and recall R of each class r with respect to each cluster C_i and is defined as:
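\[
\mathrm{Fscore} = \sum_{r} \frac{n_r}{N} \max_{1 \le i \le k} F(r, C_i)
\]
where
\[
F(r, C_i) = \frac{2\,P(r, C_i)\,R(r, C_i)}{P(r, C_i) + R(r, C_i)} = \frac{2\,n_i^r}{n_i + n_r}, \qquad
R(r, C_i) = \frac{n_i^r}{n_r}, \qquad
P(r, C_i) = \frac{n_i^r}{n_i}
\]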
Here n_r is the number of documents in class r. The closer the Fscore value is to one, the better.
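A sketch of the Fscore computation follows, again assuming parallel lists of cluster ids and true classes and using a hypothetical function name `clustering_fscore`. It relies on the simplification F(r, C_i) = 2 n_i^r / (n_i + n_r), which also gives F = 0 when a cluster contains no documents of class r (the case where 2PR/(P+R) would otherwise be 0/0).

```python
from collections import Counter
from typing import Hashable, Sequence


def clustering_fscore(clusters: Sequence[Hashable],
                      classes: Sequence[Hashable]) -> float:
    """Fscore = sum_r (n_r/N) * max_i F(r, C_i), with F = 2 n_i^r / (n_i + n_r)."""
    if len(clusters) != len(classes) or not clusters:
        raise ValueError("need one class label per clustered document")
    N = len(clusters)
    n_i = Counter(clusters)                  # documents per cluster
    n_r = Counter(classes)                   # documents per class
    n_ir = Counter(zip(clusters, classes))   # documents of class r in cluster i
    total = 0.0
    for r, size_r in n_r.items():
        # Counter returns 0 for a missing (cluster, class) pair, so F = 0 there.
        best = max(2 * n_ir[(i, r)] / (n_i[i] + size_r) for i in n_i)
        total += (size_r / N) * best
    return total


if __name__ == "__main__":
    # Class "a" fits cluster 0 best (F = 0.8), class "b" fits cluster 1 (F = 2/3).
    print(clustering_fscore([0, 0, 0, 1], ["a", "a", "b", "b"]))
```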
After implementing the algorithms and the above criteria, the results for both categories were calculated and compared. To test the implemented program, XML files from 100 different classes, such as medical files, colleges, shops, cars and insurance, were used. As mentioned above, the input files were divided into two parts, and the XCLS+ and XCLS++ algorithms were evaluated on each part separately. The results of the algorithms on the homogeneous files are given in Table1.
Table1. Results of algorithms executed on homogeneous files
The results of the algorithms on the heterogeneous files are given in Table2.
Table2. Results of algorithms executed on heterogeneous files
Like the results of the previous section, these results show that the XCLS++ method is more efficient than the XCLS+ method. They also show that the proposed method performs better on homogeneous files than on heterogeneous files, probably because repeated nodes occur more often in homogeneous files.
8. Results and conclusion
The purpose of this paper is the clustering of XML documents, which speeds up search and brings the other benefits of classification. The criterion for grouping documents can be structure, content, or structure combined with content. XCLS+ is one such method, and its criterion is structure. Although XCLS+ performs well compared with XCLS, ignoring repeated nodes in some documents makes it inefficient. This paper therefore added the FATHER_CHILD_NODE factor (cc) to the XCLS+ formula and replaced its same-father factor with the BROTHER-NODE factor (cb) to improve efficiency. The entropy, purity and Fscore results show that the proposed method works better than the previous method. In future work, the level weights will be changed to obtain more accurate similarity numbers, and the new method will be tested on many more documents.
9. References
[1] Ilwan Choi, Bongki Moon and Hyoung-Joo Kim, (2006) "A clustering method based on path similarities of XML data", Data & Knowledge Engineering.
[2] Andrewdn, jag, (2007) "Evaluating Structural Similarity in XML Document", WISE'07 Proceedings of the 8th International Conference on Web Information Systems Engineering.
[3] Tien Tran, Richi, Peter, (2008) "Combining Structure and Content Similarities for XML Document Clustering", Data Mining Conference, 27-28 November, Glenelg, South Australia.
[4] Woosaeng Kim, (2008) "XML document similarity measure in terms of the structure and contents", CEA'08 Proceedings of the 2nd WSEAS International Conference on Computer Engineering and Applications.
[5] G. R. Nayak, (2008) "Fast and effective clustering of XML data using structural information", Knowledge and Information Systems.
[6] The XML data repository. Accessed from: http://www.infomatik.uni-trire.de
[7] The XML data repository. Accessed from: http://www.cs.washington.edu
[8] Waraporn Viyanon, Sanjay K. Madria and Sourav S. Bhowmick, (2008) "XML Data Integration Based on Content and Structure Similarity Using Keys", Management of Data.
[9] aptarshi Ghosh and Pabitra Mitra, (2008) "Combining Content and Similarity for XML Document Classification using Composite SVM Kernels", ICPR, 19th International Conference on Pattern Recognition.
[10] Jing Peng, Dong Qing Yang, Shi Wei Tang et al., (2008) "A New Similarity Computing Method Based on Concept Similarity in Chinese Text Processing", Series F: Information Science, 51(9): 1212-1230, 9.
[11] Mohamad Alishahi, Mahmoud Naghibzadeh and Baharak Shakeri Aski, (2010) "Tag Name Structure-based Clustering of XML Documents", International Journal of Computer and Electrical Engineering, Vol. 2, No. 1, February.
10. Authors
Ahmad Khodayar received his master's degree in computer engineering in 2010 from the Islamic Azad University of Shabestar, Iran. His current research area is data mining, especially clustering; he is now working on a project on a new clustering approach.
Hassan Naderi received his PhD degree in 2006 from INSA-Lyon, France. His current research areas are text mining, search engines and massive data processing.