International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-
6367(Print), ISSN 0976 – 6375(Online) Volume 4, Issue 3, May – June (2013), © IAEME
A NOVEL APPROACH TOWARDS DEVELOPING A STATISTICAL
DEPENDENT AND RANKING MEASURE FOR KEYWORD SEARCH
OVER XML DATA
Dayananda P¹, Dr. Rajashree Shettar²
¹Assistant Professor, Department of Information Science and Engg, MSRIT, Bangalore-54
²Professor, Department of Computer Science and Engg, RVCE, Bangalore-59
ABSTRACT
Extensible Markup Language (XML) defines a set of conventions for encoding documents in a
format that is both human-readable and machine-readable, and it is widely used to represent
arbitrary data structures. Because XML is broadly accepted as a standard for data
representation, it is a natural target for keyword search. In this paper, a statistical
dependent and ranking measure for keyword search over XML data is proposed. The proposed
method consists of the following steps: 1) Indexing, 2) Selecting the exact T-type node,
3) Data search and Ranking of search results. A node type T is considered the desired
search-for node type if XML nodes of type T are informative enough to contain the relevant
information and T relates to every keyword in the query. First, the input XML data is passed
to an indexing process that converts it into an indexed format to make search easier. Then,
the corresponding T-type node is selected through our proposed statistical dependent formulae.
Once the T-type node has been selected, the relevant data is obtained by sorting the node type
paths. Finally, the search results obtained in the previous steps are ranked with our designed
ranking measure. This work addresses the two challenges addressed by the TF*IDF strategy and
improves the effectiveness of the search for node type and of the ranking of search results.
Keywords: XML Keyword search, Indexing, search for node type, Data search and Ranking
Measure.
INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING
& TECHNOLOGY (IJCET)
ISSN 0976 – 6367(Print)
ISSN 0976 – 6375(Online)
Volume 4, Issue 3, May-June (2013), pp. 229-247
© IAEME: www.iaeme.com/ijcet.asp
Journal Impact Factor (2013): 6.1302 (Calculated by GISI)
www.jifactor.com
1. INTRODUCTION
The Internet serves as a repository for very large amounts of information, and the quantity
of XML information shared over the World Wide Web is growing rapidly. Text-centric XML
document collections are becoming more and more common, although the large majority of XML
data is still data-centric. As a result, it has become useful to provide means to manage these
collections. One way is document clustering, which automatically arranges very large
collections into smaller sub-collections. Unfortunately, the majority of the research on
structured document processing [1], [3] is still focused on data-centric XML. The processing
and management of XML documents [4] have already become popular research issues, the major
difficulty in this area being the need to index the documents optimally for storage and
retrieval. The IR research community has developed several search methods that depend on a
set of weighted keywords in a search query to decide the proximity of the query and a document
in the feature space. However, searching XML documents departs from the conventional
retrieval strategy, because XML documents have nested elements and the semantics of
information values are indicated by tags. As a result, the notion of keyword proximity used
in IR [13] is too simple to be effective in XML searching.
Keyword search is a convenient way to query XML documents, since it permits users to issue
keyword queries without knowing complex query languages or the structure of the underlying
information. Most research efforts in XML keyword search focus on keyword proximity search
in either the tree model or the general digraph model. Both approaches commonly assume that a
smaller sub-structure of the XML document that contains all query keywords indicates a better
result. Smallest Lowest Common Ancestor (SLCA) is a simple and effective semantics in the tree
model for XML keyword proximity search [15, 8]. Every SLCA result of a keyword query is a
smallest XML node that 1) covers all keywords in its descendants and 2) has no single proper
descendant that covers all query keywords. Being based on the tree model, however, the SLCA
semantics does not capture ID reference data, which is commonly available and significant in
XML databases. As a result, it may return a large tree containing irrelevant data. XML
documents may, on the other hand, be modeled as digraphs to take ID reference edges into
account. The main concept in the digraph model is the reduced subtree [14], a minimal
connected subtree of the graph. However, the problem of finding all reduced subtrees and
enumerating results by increasing size of the reduced subtrees is NP-hard [17, 10].
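The two SLCA conditions stated above can be illustrated with a small sketch. It assumes that every XML node is identified by a Dewey label (a tuple of child positions from the root), that a node also covers a keyword it matches itself, and that the keyword matches are already known; the function name `slca` and the sample labels are hypothetical and not taken from [15, 8].

```python
from typing import Dict, List, Tuple

Dewey = Tuple[int, ...]  # assumed node identifier: child positions from the root


def slca(matches: Dict[str, List[Dewey]]) -> List[Dewey]:
    """Return the SLCA nodes for keyword -> list of matching node labels."""
    covering = None
    for labels in matches.values():
        # Every prefix of a match label is an ancestor-or-self that covers the keyword.
        ancestors = {label[:i] for label in labels for i in range(1, len(label) + 1)}
        covering = ancestors if covering is None else covering & ancestors
    covering = covering or set()
    # Condition 2: keep a node only if no proper descendant also covers all keywords.
    return sorted(
        v for v in covering
        if not any(len(u) > len(v) and u[:len(v)] == v for u in covering)
    )


# Hypothetical matches: node (0, 0) is the smallest node covering both keywords.
example = {"xml": [(0, 0, 1), (0, 1, 0)], "twig": [(0, 0, 2), (0, 2)]}
result = slca(example)
```

The root (0,) also covers both keywords here, but it is discarded because its descendant (0, 0) already covers them.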
Current XML keyword and natural language query answering approaches depend on heuristics
that assume certain properties of the DB schema. Though these heuristics are intuitively
reasonable, they are ad hoc enough that they are often violated in practice, even in the
highest-quality XML schemas. Present approaches thus suffer from low precision, low recall,
or both [19]. Attention is now turning to the end-user effectiveness of such search systems.
Traditional IR similarity metrics have been ported to the new domain and combined with
domain-specific structural features. There is also evidence of significant improvements in
effectiveness, both through developing new methods and through tuning existing ones [20].
The motivation of our research is to design and develop a technique for keyword search
over XML data. Our starting point is the XML search technique presented in [10], which uses a
TF*IDF strategy and addresses two challenges. On analyzing the existing work [10], we found
that the term frequency-based score computation is not very effective in selecting the exact
T-type node. Incorporating other features along with frequency can lead to a more effective
T-type search in XML data. In addition, when the search output for a user is large, the
ranking of the search results becomes more important. This problem can be addressed with an
effective ranking mechanism.
The two challenges mentioned above are addressed by the proposed methodology. Along with
this, the work addresses effectiveness and efficiency in terms of result relevance by
tackling the challenges addressed in [10], namely identifying the user's search intention,
resolving keyword ambiguity, and ranking the search results effectively. The proposed method
consists of the following steps:
1) Indexing: The input XML data is passed to an indexing process that converts it into two
indices (data index and node index), which make search easier.
2) Selecting the exact T-type node: The corresponding T-type nodes are selected through our
designed statistical dependent formulae, Dscore and Tscore.
3) Data search and Ranking of search results: Once the T-type nodes have been selected, the
relevant data are obtained by sorting the node type paths. Finally, the search results
obtained in the previous steps are ranked with our designed ranking measure, which uses a
correlation measure.
The rest of the paper is organized as follows. The literature on keyword search over XML
data is reviewed in Section 2, and the research methodology is presented in Section 3.
Section 4 describes the proposed method, and Section 5 discusses the results and experiments.
Section 6 concludes the paper.
2. RELATED WORK
Jianhua Feng and Guoliang Li et al. [5] presented fuzzy type-ahead search in XML data, an
information-access paradigm in which the system searches XML data on the fly as the user
types in query keywords. It allows users to explore data as they type, even in the presence of
minor errors in their keywords. Their approach has the following features: 1) Search as you
type: it extends autocomplete by supporting queries with multiple keywords in XML data.
2) Fuzzy: it can find high-quality answers that have keywords matching the query keywords
approximately. 3) Efficient: effective index structures and top-k searching algorithms
achieve a very high interactive speed. They also examined effective ranking functions and
early termination techniques to progressively identify the top-k relevant answers. Their
implementation achieved high search efficiency and result quality.
Wei Wa et al. [6] presented a multidimensional search approach that allows users to perform
fuzzy searches on structure and metadata conditions in addition to keyword conditions. Their
techniques score each dimension individually and integrate the three dimension scores into a
meaningful unified score. They also designed indexes and algorithms to efficiently identify
the most relevant files that match multidimensional queries. Experimental evaluation showed
that their relaxation and scoring framework for fuzzy query conditions in non-content
dimensions can significantly improve ranking accuracy.
Ziyang Liu et al. [7] presented an XML search engine, Target Search, that addresses an open
problem in XML keyword search: given relevant matches to keywords, how to compose query
results properly so that they can be effectively ranked and easily digested by users.
Intuitively, each query has a search target, and each result should contain exactly one
instance of the search target along with its evidence. Target Search composes atomic and
intact query results driven by users' search targets.
Chunxiao Liu et al. [8] presented a user-friendly top-k keyword search approach based on
the relationship of keywords. The SLCA of a keyword search is first obtained by the LISA II
algorithm. Then the structure of the SLCA is leveraged to infer the relationship of the
keywords, i.e., the keyword search is translated into twig queries. Next, the relationship of
the keywords is estimated from the structure of the twig queries, and the twig queries are
ranked according to these relationships. Finally, all results of the ordered twig queries are
obtained by the TJFast algorithm.
Yiqun Chen and Jinyin Cao [9] presented an approach to type-ahead keyword search in XML
data, called Take XIR. The IR-style approach uses the statistics of the underlying XML data to
address the following challenges in an XML IR system: (1) identifying the user search
intention, i.e., identifying the keywords that express user interests and the nodes the user
wants to search for and search via; (2) resolving keyword ambiguity problems: synonyms and
polysemy exist in natural language, and a keyword can appear as the text value or tag value
of different XML nodes and carry different meanings. They modeled XML data as a graph,
analyzed the identification of user search intention and result ranking in the presence of
keyword ambiguities, and used the related definitions and formulae to build a query
prediction technique that improves search efficiency.
Jiang Li and Junhu Wang [11] observed that XML keyword search provides a simple and
user-friendly way of retrieving data from XML databases, but that the ambiguities of keywords
make it difficult to answer keyword queries effectively. XReal utilizes the statistics of the
underlying data to resolve keyword ambiguity problems. However, they found that its formula
for inferring the search-for node type suffers from inconsistency and abnormality problems.
They proposed a dynamic reduction factor scheme as well as an algorithm, Dynamic Infer, to
resolve these two problems, and provided experimental results to verify its effectiveness.
Liang Jeff Chen and Yannis Papakonstantinou [12] presented a series of algorithms that
incorporate both efficient semantic pruning and top-K processing to support top-K keyword
search [23]. They presented a join-based algorithm that processes nodes bottom-up and reduces
keyword query evaluation to relational joins. Several optimizations were presented to further
improve its efficiency. They then incorporated the idea of the top-K join from relational
databases and presented a join-based top-K algorithm to compute the top-K results. Extensive
experimental results confirmed the advantages of their algorithms over previous algorithms in
both efficiency and top-K processing.
Zhifeng Bao et al. [10] studied the problem of effective XML keyword search, which includes
the identification of user search intention and result ranking in the presence of keyword
ambiguities. They utilized statistics to infer the user search intention and rank the query
results. In particular, they defined XML TF and XML DF, based on which they designed formulae
to compute the confidence level of each candidate node type to be a search-for or search-via
node, and further proposed an XML TF*IDF similarity ranking scheme to capture the
hierarchical structure of XML data. Finally, the popularity of a query result (captured by
ID Ref relationships) was considered to handle the case in which multiple results have
comparable relevance scores.
As an extension of [10], this work makes several major updates: 1) our ranking framework
uses the correlation concept described in Section 4, which outperforms the ranking concepts
in [10]; 2) selection of the exact T-type node is taken into consideration in Section 4;
3) a new index and algorithm are designed in Section 4.
3. RESEARCH METHODOLOGY
Definition 3.1 (Structural Node) An XML node labeled with a tag name is called a structural
node. A structural node that has children is called an internal node; otherwise, it is called
a leaf node.
Definition 3.2 (T-type node) A node type T is considered the desired search-for node type if
T is intuitively related to every query keyword, XML nodes of type T are informative enough
to contain enough relevant information, and XML nodes of type T are not so overwhelming that
they contain too much irrelevant information.
Definition 3.3 (Data Node) A leaf node of the XML data that contains text values and has no
tag name is called a data node.
The primary intention of our research is to design and develop a technique for keyword
search over XML data. The work is motivated by the XML search technique given in [10], which
uses a TF*IDF strategy and addresses two challenges. On analyzing the existing work [10], we
found that the term frequency-based score computation is not very effective in selecting the
exact T-type node. Incorporating other features along with frequency can lead to a more
effective T-type search in XML data. Also, the ranking of the search results is important for
users when the search output is large. This problem can be addressed with an effective
ranking mechanism.
These two challenges are addressed by the proposed methodology. The proposed method consists
of three major steps: 1) Indexing, 2) Selecting the exact T-type node, 3) Data search and
Ranking of search results. First, the input XML data is passed to an indexing process that
converts it into an indexed format to make search easier. Then, the corresponding T-type
nodes are selected through our designed statistical dependent formulae. Once the T-type nodes
have been selected, the relevant data are obtained by similarity matching with the input
query. Finally, the search results obtained in the previous steps are ranked with our
designed ranking measure. The proposed algorithm will be implemented in Java, and its
performance will be compared with the existing algorithm in terms of precision, recall and
ranking measure with two different datasets.
4. PROPOSED METHOD
1. Indexing
The approach presented in [10] builds two indices for data processing: a keyword inverted
list and a frequency table. For an input keyword, the keyword inverted list retrieves a list
of data nodes, in document order, whose values contain that keyword. A B+-tree index is built
on top of each inverted list. The second index, the frequency table, stores only the
frequency (the number of T-typed nodes that contain keyword k in their subtrees in the XML
data) for each combination of keyword k and node type T in the XML document. When a query
keyword is searched, the approach presented in [10] does not identify whether the keyword is
a node or data, which leads to more complex query processing.
To overcome these drawbacks, a specific indexing method is proposed that builds two indices,
the Node index and the Data index, for structural nodes and data nodes respectively. These
two indices are shown in Table 1 and Table 2 for the DBLP XML document. In contrast to the
indices presented in [10], the node index stores the node name of each structural node, the
frequency of occurrence of each structural node either in T-typed nodes or in their subtrees,
and the prefix path of the corresponding T-typed nodes. The data index stores the value of
each data node, the corresponding node name, and the frequency of occurrence of each data
node in the XML document. The data index is linked to the node index through the node name.
Scores computed with reference to the two indices are used to determine the exact T-typed
node for a given keyword query. Thus, the proposed indexing approach addresses each node and
each data value separately in the XML database, which results in effective query processing.
Fig. 1 shows the partial structure of the DBLP XML database, and Fig. 2 shows a partial data
subtree of the DBLP XML database.
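The two indices can be sketched as follows. This is a minimal illustration, assuming the document fits in memory and is parsed with Python's standard `xml.etree.ElementTree`; the function name `build_indices`, the key layout of the two counters, and the sample document are hypothetical, and the paper's own Java implementation may differ.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import List, Tuple


def build_indices(xml_text: str) -> Tuple[Counter, Counter]:
    """Build a node index and a data index from an XML string.

    node_index: (node name, prefix path) -> frequency
    data_index: (text value, node name)  -> frequency
    """
    node_index: Counter = Counter()
    data_index: Counter = Counter()

    def walk(elem: ET.Element, ancestors: List[str]) -> None:
        if ancestors:  # the root has no prefix path and is skipped
            node_index[(elem.tag, ",".join(ancestors))] += 1
        text = (elem.text or "").strip()
        if text and len(elem) == 0:  # leaf element carrying a text value
            data_index[(text, elem.tag)] += 1
        for child in elem:
            walk(child, ancestors + [elem.tag])

    walk(ET.fromstring(xml_text), [])
    return node_index, data_index


# Hypothetical miniature document in the shape of DBLP.
SAMPLE = (
    "<dblp>"
    "<article><author>A</author><author>B</author><year>1993</year></article>"
    "<phdthesis><author>A</author><year>1993</year></phdthesis>"
    "</dblp>"
)
node_index, data_index = build_indices(SAMPLE)
```

With this layout, `node_index[("author", "dblp,article")]` corresponds to one row of Table 1 and `data_index[("1993", "year")]` to one row of Table 2.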
Fig. 1. Partial data tree structure for the ‘DBLP’ XML database
[Figure: the root node dblp has the children inproceedings, phdthesis, article and
mastersthesis; these carry child nodes such as author, title, year, school, booktitle, pages,
url, ee, cdrom and month, with sample text values at the leaves.]
Sr. no.  Node       Frequency  Path
300      author     212898     dblp,article
302      url        106805     dblp,article
303      publisher  4          dblp,article
307      year       72         dblp,phdthesis
311      publisher  3          dblp,phdthesis
319      author     14         dblp,www
320      editor     21         dblp,www
321      booktitle  1          dblp,www
324      title      2609       dblp,proceedings
326      series     1955       dblp,proceedings
Table 1: Node index
Sr. no.  Data                                      Node    Frequency
30       db/labs/gte/index.html#TR-0169-12-91-165  url     1
32       db/labs/gte/TR-0231-08-93-165.html        ee      1
33       Sandra Heiler                             author  7
35       TR-0231-08-93-165                         volume  8
36       1993                                      year    4144
38       GTE/MANO93c.pdf                           cdrom   1
42       June                                      month   5
44       db/labs/gte/index.html#TM-0014-06-88-165  url     1
45       GTE/MANO88.pdf                            cdrom   1
46       db/labs/gte/TM-0332-11-90-165.html        ee      1
Table 2: Data index
3. SEARCH FOR NODE TYPE-T
When the exact T-type node is selected for a given keyword query, a tag matching a keyword
may occur many times in different T-type nodes and their subtrees, which makes the search for
node type more complex. To overcome this drawback, we propose a pair of mathematical scores
by which the optimal T-type nodes are selected: 1) Dscore and 2) Tscore. Dscore is the ratio
of the depths of the ancestor nodes of the keywords in a given query, and Tscore gives the
percentage score of each node type that has the best depth score (Dscore).
a) Dscore
For a given input query ‘q’, the depth of the lowest common ancestor (LCA) node of all the
keywords in the query and the depth of the highest common ancestor (HCA) node of the same
keywords are computed first. The ratio of these two depths is the Dscore.
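Fig. 2. Partial data subtree structure for the ‘DBLP’ XML database
[Figure: dblp > article, with the child nodes ee, author, cdrom and month and their sample
text values.]
$$D_{score} = \frac{\text{depth of LCA node}}{\text{depth of HCA node}} \qquad (1)$$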
The LCA nodes with the lowest Dscore values are selected as the probable node types for
the given query ‘q’. From this set of candidates, the best node is selected as the T-type
node for the given query keywords. To do so, a Tscore percentage is estimated.
b) Tscore
The Tscore percentage is estimated by asking, for a keyword query, what the chance is that
keyword ‘k’ occurs at node type T. This is expressed through conditional probability: if ‘q’
and ‘T’ are events, the probability of ‘q’ given ‘T’ is denoted by P(q/T). With these
definitions and notations, the conditional probability is expressed as:
$$P(q/T) = \frac{P(q \cap T)}{P(T)} \qquad (2)$$
where P(q/T) is the chance of event ‘q’ given that event ‘T’ has occurred, P(q ∩ T) is the
probability of occurrence of event ‘q’ in event ‘T’, and P(T) is the probability of
occurrence of event ‘T’.
With reference to the conditional probability P(q/T) of ‘q’ given ‘T’, equation (2) can be
written as the sum of the probabilities of occurrence of the keywords at node type T:
$$P(q/T) = \sum_{k \in q \cap T} \frac{P(k)}{P(T)} \qquad (3)$$
$$P(q/T) = \frac{1}{P(T)} \times \sum_{k \in (q \cap T)} P(k) \qquad (4)$$
P(T) is constant over the keywords (‘k’ = 1 to n) in the query, so
$$P(q/T) = \alpha \times \sum_{k=1}^{n} P(k), \qquad \alpha = \frac{1}{P(T)} \qquad (5)$$
Thus, to estimate the best node type T, the percentage of the frequency of occurrence of
‘k’ at that node type is important. It is therefore taken as the Tscore% of a particular
node, and the node with the highest Tscore% is the relevant node type. The score is defined as
$$T_{score} = \alpha \times \sum_{k=1}^{n} P(k) \qquad (6)$$
P(k) can also be defined as the frequency of occurrence of ‘k’ at node type ‘T’, and P(T)
as the frequency of node type T. Equation (6) then becomes
$$T_{score} = \alpha \times \sum_{k=1}^{n} f(k), \quad \text{for } \alpha = \frac{1}{f(T)} \qquad (7)$$
Thus the Tscore percentage is defined as
$$T_{score\%} = \alpha \times \sum_{k=1}^{n} f(k) \times 100 \qquad (8)$$
The percentage score of the optimal node type, Tscore%, is thus the frequency of occurrence
of the query keywords at a particular node type, expressed as a percentage of the frequency
of occurrence of that node type, as defined in equation (8).
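Equation (8) can be written out as a short function. This is a minimal sketch, assuming that the per-keyword frequencies f(k) at the node type and the node type frequency f(T) have already been looked up in the indices; the name `tscore_percent`, its parameters and the sample numbers are hypothetical.

```python
from typing import Mapping, Sequence


def tscore_percent(keyword_freq: Mapping[str, int],
                   query: Sequence[str],
                   type_freq: int) -> float:
    """Equation (8): alpha * sum of f(k) over the query keywords * 100, alpha = 1 / f(T)."""
    if type_freq <= 0:
        raise ValueError("f(T) must be positive")
    # Keywords that never occur at this node type contribute a frequency of 0.
    total = sum(keyword_freq.get(k, 0) for k in query)
    return 100.0 * total / type_freq


# Hypothetical frequencies: f(java) = 30 and f(book) = 20 at a node type with f(T) = 200.
score = tscore_percent({"java": 30, "book": 20}, ["java", "book"], 200)
```

With these assumed numbers the sum of f(k) is 50, so the score is 50 / 200 × 100 = 25.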
4. DATA SEARCH AND RANKING
Consider an input keyword query containing ‘n’ keywords. After the XML document has been
pre-processed with the proposed indexing technique, two indices are consulted for each
keyword in the query: the data index and the node index. The data index holds the frequency
and node type information of each data value, whereas the node index holds the frequency and
path information of each node. The proposed XML keyword search is carried out in the
following steps:
1. Identify the search intent of the user. To identify the desired search-for node type,
first estimate the Dscore of the LCA nodes in the XML document using equation (1) and
choose the nodes with the least Dscore.
2. For each node type with a valid Dscore, evaluate its Tscore% using equation (8) and
choose the node type with the maximum Tscore% as the best search-for node type.
3. For the search-for node type T computed from the valid Tscore%, sort the prefix paths of
the node type. The sorted prefix paths are then ranked by computing the correlation
between them.
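Steps 1 and 2 above amount to a two-stage selection, which can be sketched as follows. The sketch assumes that Dscore and Tscore% have already been computed for every candidate; the `Candidate` record, the tolerance used to compare Dscore values and the sample candidates are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Candidate:
    tag: str           # node name
    path: str          # prefix path, e.g. "dblp,book"
    dscore: float      # equation (1), assumed precomputed
    tscore_pct: float  # equation (8), assumed precomputed


def select_search_for_type(candidates: Sequence[Candidate],
                           tol: float = 1e-9) -> Candidate:
    """Step 1: keep candidates with the least Dscore. Step 2: take the maximum Tscore%."""
    if not candidates:
        raise ValueError("no candidates")
    best_d = min(c.dscore for c in candidates)
    shortlisted = [c for c in candidates if abs(c.dscore - best_d) <= tol]
    return max(shortlisted, key=lambda c: c.tscore_pct)


# Hypothetical candidates; the third has the highest Tscore% but not the least Dscore.
chosen = select_search_for_type([
    Candidate("title", "dblp,book", 1.0, 10.0),
    Candidate("book", "dblp,book", 1.0, 40.0),
    Candidate("author", "dblp,article", 2.0, 90.0),
])
```

Filtering on Dscore before comparing Tscore% keeps a candidate with a high percentage but a poorer depth ratio from being selected.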
Algorithm 1:
Input: Query; Node_index; Data_index;
// keyword matching //
Keyword_Matching = index( )
{
    Query = "q";
    if (q == node && Node_index != null)
        for (each entry of Node_index)
        {
            q = keyword[Node_index];
            f = get_nodefrequency(q);
        }
    else if (q == data && Data_index != null)
        for (each entry of Data_index)
        {
            q = keyword[Data_index];
            f = get_datafrequency(q);
        }
}
// search for node type //
Score = get_Dscore( )
{
    if (Dscore( ) == min) then
        get_Tscore( );
    node_type = max[Tscore( )];
}
// ranking //
Rank = get_corr( )
{
    if (sum_corr( ) == max) then
        Ry = max[sum_corr( )];
    Check_threshold( )
    {
        if (Rank1 - Rank2) < Threshold
            then select lowest Tscore
        else Rank1;
    }
}
In Algorithm 1, the function get_nodefrequency calculates the frequency of T-type nodes
containing all the query keywords, and the function get_datafrequency retrieves the number of
data nodes present under each T-type node. get_Dscore retrieves the list of paths with the
lowest Dscore value; based on its output, the path with the highest Tscore is selected.
Finally, ranking is done through the get_corr function, which finds the correlation between
all paths.
Generally, any statistical relationship between two random variables or two sets of data
is referred to as dependence, and correlation refers to any of a broad class of statistical
relationships involving dependence. Several correlation coefficients measure the degree of
correlation; the most commonly used is Pearson’s correlation coefficient, which is obtained
by dividing the covariance of the two variables by the product of their standard deviations.
Given two series X and Y of n values for the sorted paths, written x_i and y_i where
i = 1, 2, ..., n, the sample correlation coefficient is used to estimate the population
Pearson correlation ‘r’ between X and Y. The sample correlation coefficient used for ranking
is written as:
$$r_{xy} = \frac{\sum_{i=1}^{n} (x_i - x')(y_i - y')}{\sqrt{\sum_{i=1}^{n} (x_i - x')^2 \times \sum_{i=1}^{n} (y_i - y')^2}} \qquad (9)$$
$$r_{xy} = \frac{1}{(n-1)} \times \sum_{i=1}^{n} \left( \frac{(x_i - x')}{S_x} \right) \times \left( \frac{(y_i - y')}{S_y} \right) \qquad (10)$$
Here (x_i - x')/S_x is the standard score, x' is the sample mean and S_x is the sample
standard deviation of X (and analogously y' and S_y for Y) in equations (9) and (10). After
the correlation has been determined for each combination of paths of the search-for node
type, each path is ranked by the sum of its correlations with itself and with the other paths
related to the node type.
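The two forms of the sample correlation coefficient can be checked against each other with a few lines of code. This is a minimal sketch of equations (9) and (10) for two numeric series of equal length; how a sorted path is turned into such a series is not specified in this section, so the inputs are simply assumed to be numbers, and the function names are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def _check(xs: Sequence[float], ys: Sequence[float]) -> None:
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 values")


def pearson_eq9(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Equation (9): sum of co-deviations over the root of the product of squared deviations."""
    _check(xs, ys)
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    if den == 0:
        raise ValueError("correlation is undefined for a constant series")
    return num / den


def pearson_eq10(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Equation (10): mean product of standard scores, with divisor n - 1."""
    _check(xs, ys)
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)  # sample standard deviations S_x and S_y
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / (len(xs) - 1)


r9 = pearson_eq9([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 9.0])
r10 = pearson_eq10([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 9.0])
```

Both functions return the same value up to rounding, since the factor (n - 1) in equation (10) cancels against the sample standard deviations.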
Correlation map
Px \ Py  P1                      P2                      P3                      P4                      P5
P1       Corr(P1,P1)             Corr(P1,P2)             Corr(P1,P3)             Corr(P1,P4)             Corr(P1,P5)
P2       Corr(P2,P1)             Corr(P2,P2)             Corr(P2,P3)             Corr(P2,P4)             Corr(P2,P5)
P3       Corr(P3,P1)             Corr(P3,P2)             Corr(P3,P3)             Corr(P3,P4)             Corr(P3,P5)
P4       Corr(P4,P1)             Corr(P4,P2)             Corr(P4,P3)             Corr(P4,P4)             Corr(P4,P5)
P5       Corr(P5,P1)             Corr(P5,P2)             Corr(P5,P3)             Corr(P5,P4)             Corr(P5,P5)
Rank     Σ(x=1..5) corr(Px,P1)   Σ(x=1..5) corr(Px,P2)   Σ(x=1..5) corr(Px,P3)   Σ(x=1..5) corr(Px,P4)   Σ(x=1..5) corr(Px,P5)
The correlation map shows that the correlation of each pair of paths contributes to the
ranking. The ranking is defined as:
$$R_y = \sum_{x=1}^{5} corr(P_x, P_y) \qquad (11)$$
The path of the search-for node type whose ‘Ry’ value (equation 11) is the highest sum is
ranked as the best search intention if the difference between the two top-ranked correlation
sums of the paths is greater than or equal to the threshold value. If the difference is less
than the threshold, the lowest Tscore% is selected as the desired search-for node type, as
given in equation 12.
$$R = \begin{cases} \text{node type with the lowest } T_{score\%}, & \text{if } (Rank_1 - Rank_2) < \text{threshold} \\ \max(Rank) = Rank_1, & \text{otherwise} \end{cases} \qquad (12)$$
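The ranking rule of equations (11) and (12) can be sketched as follows. The sketch assumes that the correlation map is given as a square matrix `corr[x][y]` and that, when the two best sums are closer than the threshold, the lowest Tscore% is looked for among those two near-tied paths only (the text does not say which paths the tie-break ranges over). The function name and the sample values are hypothetical.

```python
from typing import List, Sequence, Tuple


def rank_paths(corr: Sequence[Sequence[float]],
               tscore_pct: Sequence[float],
               threshold: float) -> Tuple[int, List[float]]:
    """Return (index of the selected path, list of R_y sums)."""
    n = len(corr)
    if n == 0 or any(len(row) != n for row in corr) or len(tscore_pct) != n:
        raise ValueError("corr must be a non-empty square matrix matching tscore_pct")
    # Equation (11): R_y is the sum over x of corr(P_x, P_y), i.e. a column sum.
    r = [sum(corr[x][y] for x in range(n)) for y in range(n)]
    order = sorted(range(n), key=lambda y: r[y], reverse=True)
    if n == 1:
        return order[0], r
    first, second = order[0], order[1]
    if r[first] - r[second] < threshold:
        # Equation (12), first case. Assumption: tie-break among the two near-tied paths.
        return min((first, second), key=lambda y: tscore_pct[y]), r
    # Equation (12), second case: keep the path with the highest sum (Rank1).
    return first, r


# Hypothetical 3-path correlation map and Tscore% values.
CORR = [[1.0, 0.8, 0.1],
        [0.8, 1.0, 0.2],
        [0.1, 0.2, 1.0]]
TSCORES = [10.0, 30.0, 5.0]
clear_winner, sums = rank_paths(CORR, TSCORES, threshold=0.05)
tie_broken, _ = rank_paths(CORR, TSCORES, threshold=0.5)
```

Here the column sums are about 1.9, 2.0 and 1.3. With a threshold of 0.05 the second path wins outright; with a threshold of 0.5 the two best paths count as tied and the one with the lower Tscore% is returned.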
5. RESULTS AND COMPARISON
Our proposed statistical dependent and ranking measure for keyword search over XML data
was evaluated by implementing the approach in Java (JDK 1.6) on a 3.20 GHz Intel(R)
Pentium(R) D machine with 1.00 GB RAM running 32-bit Windows 7 Professional. The
experimental results are tabulated and compared with the existing method XReal. The results
were generated on three real datasets, DBLP, WSU, and eBay [10, 2], and are discussed in
terms of effectiveness and efficiency.
Effectiveness test: This comprises two tests: 1.1) inferring the desired search-for node type
and 1.2) quality measurement using the metrics Precision, Recall and F-measure.
Efficiency test: This test compares the query response time of the proposed method with that
of XReal on all three real datasets.
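The three quality metrics can be computed as in the sketch below. It assumes the usual set-based definitions (precision as the share of returned results that are relevant, recall as the share of relevant results that are returned, F-measure as their harmonic mean); this section does not spell out the formulas, so the function and the sample result identifiers are illustrative only.

```python
from typing import Hashable, Iterable, Tuple


def precision_recall_f(retrieved: Iterable[Hashable],
                       relevant: Iterable[Hashable]) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure) for one query."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical result identifiers: 2 of the 4 returned results are among the 3 relevant ones.
p, r, f = precision_recall_f(["a", "b", "c", "d"], ["a", "b", "e"])
```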
Note: Query under test
Notation Query
DBLP dataset
QD1 “Java book”
QD2 “author Chen Lei”
QD3 “Jim Gray article”
QD4 “XML twig”
QD5 “Ling tokwang twig”
QD6 “vldb 2000”
QD7 “Philip Bernstein”
QD8 “WISE”
QD9 “ER 2005”
QD10 “LATIN 2006”
WSU dataset
QW1 “230”
QW2 “CAC 101”
QW3 “ECON”
QW4 “Biology”
QW5 “place TODD”
QW6 “days TU TH”
eBay dataset
QE1 “2 days”
QE2 “cpu 933”
QE3 “Hard drive CA”
5.1 Effectiveness test
The effectiveness of our approach for a statistical dependent and ranking measure for
keyword search over XML data is assessed by how well it identifies the user search intention
and resolves the ambiguity issues. The accuracy of our approach is tested by evaluating the
inferred search-for node type for the queries tabulated in Table 3, which include a couple of
queries with both ambiguity 1 and ambiguity 2 and a few with ambiguity 2 only.
5.1.1 Inferring the desired search for node type
Among the queries in Table 3, QD1 and QD3 exhibit both ambiguity 1 (a keyword
appears as both an XML tag name and a text value) and ambiguity 2 (a keyword appears as a
text value of different types of XML nodes), whereas QD2, QD6 and QW1 exhibit only
ambiguity 2. As Table 3 shows, on the DBLP dataset the inferred node type matches the
user's search intention for both our method and XReal, in contrast to SLCA/XSeek. On the
WSU and eBay datasets our method almost always infers the desired search-for node type;
because these datasets are small, the root node occurs alongside the search intention in
the output. For example, for query QE1 the search intention is auction_info and our
approach outputs auction_info; listing.
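The two ambiguities described above can be illustrated with a small sketch. It is not the authors' implementation: the toy document, the function names and the choice to identify a node type by its root-to-node tag path are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Toy document; the content is illustrative and not taken from the real DBLP data.
SAMPLE = """
<dblp>
  <book><title>Java in practice</title><author>A. Writer</author></book>
  <article><title>A book review of Java tools</title></article>
</dblp>
""".strip()


def walk(elem, path=()):
    """Yield (element, node type), where node type is the tag path from the root."""
    here = path + (elem.tag,)
    yield elem, "/".join(here)
    for child in elem:
        yield from walk(child, here)


def ambiguities(root, keyword):
    """Report whether a keyword shows ambiguity 1 and/or ambiguity 2."""
    kw = keyword.lower()
    is_tag_name = False
    text_node_types = set()
    for elem, node_type in walk(root):
        if elem.tag.lower() == kw:
            is_tag_name = True
        if elem.text and kw in elem.text.lower().split():
            text_node_types.add(node_type)
    return {
        # Ambiguity 1: keyword is both a tag name and a text value.
        "ambiguity1": is_tag_name and bool(text_node_types),
        # Ambiguity 2: keyword is a text value under more than one node type.
        "ambiguity2": len(text_node_types) > 1,
    }


root = ET.fromstring(SAMPLE)
print("book:", ambiguities(root, "book"))
print("java:", ambiguities(root, "java"))
```

In this toy document "book" is both a tag name and a word inside an article title (ambiguity 1), while "java" occurs as text under two different node types (ambiguity 2).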
An example of inferring the desired search-for node type with our proposed method follows.
We consider one query and present its complete search-for node type computation.
Input Query: “java book”
==========================================
1) Dscore
Tag frequency path Dscore
author 413010 dblp,inproceedings 1.0
author 212898 dblp,article 1.0
title 179060 dblp,inproceedings 1.0
url 179058 dblp,inproceedings 1.0
booktitle 179058 dblp,inproceedings 1.0
title 106834 dblp,article 1.0
url 106805 dblp,article 1.0
ee 73560 dblp,inproceedings 1.0
ee 23442 dblp,article 1.0
title 2609 dblp,proceedings 1.0
url 2491 dblp,proceedings 1.0
booktitle 2293 dblp,proceedings 1.0
author 1996 dblp,incollection 1.0
author 1153 dblp,book 1.0
title 1009 dblp,incollection 1.0
booktitle 1009 dblp,incollection 1.0
url 1006 dblp,incollection 1.0
title 845 dblp,book 1.0
book 845 dblp,book 1.0
url 128 dblp,book 1.0
ee 107 dblp,incollection 1.0
title 72 dblp,phdthesis 1.0
author 72 dblp,phdthesis 1.0
url 38 dblp,www 1.0
title 38 dblp,www 1.0
author 14 dblp,www 1.0
ee 6 dblp,proceedings 1.0
title 5 dblp,mastersthesis 1.0
ee 5 dblp,book 1.0
author 5 dblp,mastersthesis 1.0
ee 1 dblp,phdthesis 1.0
booktitle 1 dblp,www 1.0
2) Tscore
Tag Name Tscore path
booktitle 182361.0 dblp,www
author 125829.6 dblp,mastersthesis
ee 97121.0 dblp,phdthesis
title 58094.4 dblp,mastersthesis
author 44939.142857142855 dblp,www
ee 19424.2 dblp,book
ee 16186.833333333332 dblp,proceedings
author 8738.166666666666 dblp,phdthesis
title 7644.0 dblp,www
url 7619.105263157894 dblp,www
title 4034.333333333333 dblp,phdthesis
url 2261.921875 dblp,book
ee 907.6728971962616 dblp,incollection
author 545.661751951431 dblp,book
title 343.75384615384615 dblp,book
author 315.2044088176352 dblp,incollection
title 287.8810703666997 dblp,incollection
url 287.79920477137176 dblp,incollection
booktitle 180.73439048562932 dblp,incollection
url 116.228823765556 dblp,proceedings
title 111.33461096205444 dblp,proceedings
booktitle 79.52943741822939 dblp,proceedings
ee 4.143033870830134 dblp,article
author 2.9551616266944736 dblp,article
title 2.7189097103918227 dblp,article
url 2.710790693319601 dblp,article
title 1.6222048475371385 dblp,inproceedings
url 1.6169397625350446 dblp,inproceedings
author 1.5233238904627007 dblp,inproceedings
ee 1.3202963567156063 dblp,inproceedings
booktitle 1.0184465368763194 dblp,inproceedings
book 0.0 dblp,book
=================================================
3) Ranking
Example: correlation of dblp,proceedings and dblp,incollection
corr(dblp,proceedings; dblp,incollection) = 0.1221784083384564
Ranked Sum of correlation:
Path Rank
P1=dblp,book 3.2727014742218543
P2=dblp,phdthesis 3.1869696826431175
P3=dblp,incollection 3.0431260287060002
P4=dblp,www 2.0916351992181195
P5=dblp,article 1.8924147256281627
P6=dblp,inproceedings 1.8924147256281627
P7=dblp,proceedings 0.13822919060961375
P8=dblp,mastersthesis 0.0
\[
R =
\begin{cases}
\text{node type with the lowest } Tscore\,\%, & \text{if } (Rank_1 - Rank_2) < d \ (\text{threshold}),\\
Rank_1 = \max(Rank), & \text{otherwise.}
\end{cases}
\]
Selected Path is dblp, book
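The ranking step for this query can be retraced with a short sketch. The rank values are copied from the "Ranked Sum of correlation" listing above; the function names are hypothetical, and the threshold d is not given in this section, so the sketch only reports the gap between the two highest ranks rather than applying a threshold.

```python
# Summed correlation per path, copied from the ranked listing for "java book".
RANKS = {
    "dblp,book": 3.2727014742218543,
    "dblp,phdthesis": 3.1869696826431175,
    "dblp,incollection": 3.0431260287060002,
    "dblp,www": 2.0916351992181195,
    "dblp,article": 1.8924147256281627,
    "dblp,inproceedings": 1.8924147256281627,
    "dblp,proceedings": 0.13822919060961375,
    "dblp,mastersthesis": 0.0,
}


def rank_paths(ranks):
    """Sort (path, rank) pairs by rank, highest first."""
    return sorted(ranks.items(), key=lambda item: item[1], reverse=True)


def top_path_and_gap(ranks):
    """Return the highest-ranked path and the gap Rank1 - Rank2."""
    ordered = rank_paths(ranks)
    (first_path, rank1), (_, rank2) = ordered[0], ordered[1]
    return first_path, rank1 - rank2


top_path, gap = top_path_and_gap(RANKS)
print("highest-ranked path:", top_path)
print("Rank1 - Rank2 =", gap)
```

The highest-ranked path agrees with the selected path reported for this query, dblp,book.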
Table 3. Effectiveness test on inferring the desired search-for node type
Query | Keywords | Intention | XReal | SLCA/XSeek | Our
DBLP (370MB)
QD1 | Java, book | book | book | book; title/book; article | book
QD2 | author, Chen, Lei | inproceedings | inproceedings | author | inproceedings
QD3 | Jim, Gray, article | article | article | article | article
QD4 | XML, twig | inproceedings | inproceedings | title/inproceedings | inproceedings
QD5 | Ling, tok, wang, twig | inproceedings | inproceedings | inproceedings | inproceedings
QD6 | vldb, 2000 | inproceedings | inproceedings | inproceedings | inproceedings
WSU (16.5MB)
QW1 | 230 | place | course; place | room; crs/course | place; course
QW2 | CAC, 101 | course | course | course | course
QW3 | ECON | course | course | prefix/course | course
QW4 | Biology | course | course | title/course | course
QW5 | place, TODD | course | course | place/course | place; course
QW6 | days, TU, TH | course | course | days/course | place
eBay (0.36MB)
QE1 | 2, days | auction_info | listing | time_left/listing | auction_info; listing
QE2 | cpu, 933 | listing | listing | cpu/listing | item_info; listing
QE3 | Hard, drive, CA | listing | listing | description/listing | listing
5.1.2 Quality measure (Precision, Recall & F-measure)
The quality measure also addresses the effectiveness of our approach: it evaluates all
the queries under test and summarizes them with three metrics, viz., precision, recall and
F-measure. Precision is the percentage of the output subtrees that are desired; recall is
the percentage of the desired subtrees that are output; and F-measure is the weighted
harmonic mean of precision and recall. Because most of the queries on DBLP have more than
100 results, the precision, recall and F-measure values for XReal are those reported in [10].
The same applies to each query issued on WSU and eBay. The results are shown in Figures 3
and 4.
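The definitions of precision and recall above can be written out as a minimal sketch. It assumes that each result subtree can be identified by a hashable key; the identifiers and the function name below are made up for illustration and do not come from the experiments.

```python
def precision_recall(returned, desired):
    """Return (precision, recall) in percent for one query.

    returned: identifiers of the subtrees output by the search method
    desired:  identifiers of the subtrees the user actually wants
    """
    returned, desired = set(returned), set(desired)
    hits = len(returned & desired)
    # Precision: share of the output subtrees that are desired.
    precision = 100.0 * hits / len(returned) if returned else 0.0
    # Recall: share of the desired subtrees that are output.
    recall = 100.0 * hits / len(desired) if desired else 0.0
    return precision, recall


# Hypothetical subtree identifiers, for illustration only.
p, r = precision_recall(["t1", "t2", "t3", "t4"], ["t1", "t2", "t5"])
print(p, r)
```

Here two of the four returned subtrees are desired, and two of the three desired subtrees are returned.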
Fig. 3. Precision comparison (percent) for XReal and the proposed method: (a) DBLP, (b) WSU, and (c) eBay
Fig. 4. Recall comparison (percent) for XReal and the proposed method: (a) DBLP, (b) WSU, and (c) eBay
Table 4: F-Measure (%)
Dataset | XReal | Proposed
DBLP | 47.48 | 48.48
WSU | 49.67 | 37.5
EBAY | 40.02 | 44.44
Figure 3 shows that the average precision of our proposed approach is higher than that of
XReal for the queries on the DBLP dataset. Figure 4 shows the recall for all three real
datasets; here our approach outperforms XReal. Further, the F-measure is computed with the
formula F = (precision * recall) / (precision + recall), taking precision and recall as
their averages over all the queries under test; the values are given in Table 4. The
F-measure of our method is 48.48% on DBLP and 44.44% on eBay, whereas for XReal it is
47.48% on DBLP and 40.02% on eBay. On WSU, Table 4 reports 37.5% for our method and
49.67% for XReal.
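The F-measure formula quoted in the text can be sketched as below. As written, F = (P * R) / (P + R) is exactly half of the conventional balanced F1 score, 2PR / (P + R), so with P and R in percent it cannot exceed 50; this is consistent with every value in Table 4 lying below 50. The function names and the sample inputs are illustrative assumptions, not figures from the experiments.

```python
def f_as_written(precision, recall):
    """F exactly as given in the text: (P * R) / (P + R), inputs in percent."""
    if precision + recall == 0:
        return 0.0
    return precision * recall / (precision + recall)


def f1_conventional(precision, recall):
    """Conventional balanced F1: 2 * P * R / (P + R)."""
    return 2.0 * f_as_written(precision, recall)


# Hypothetical average precision and recall, for illustration only.
avg_precision, avg_recall = 90.0, 60.0
print(f_as_written(avg_precision, avg_recall))
print(f1_conventional(avg_precision, avg_recall))
```

With these sample inputs the formula from the text gives 36.0, while the conventional F1 gives 72.0.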
5.2 Efficiency test
The efficiency test evaluates the query response time of our proposed method, which uses
the indices for keyword information discussed in section 4. It measures the time taken to
infer the search-for node type of a given query. The response time of the individual
queries under test is shown in Fig. 5, in seconds, for the DBLP, WSU and eBay datasets.
The proposed method is compared with the XReal Dup type norm. On all three real datasets,
our approach is observed to be faster than even Dup type norm (three-level information
indexing).
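A query response time of the kind reported here can be measured with a small timing harness. This is only a sketch under stated assumptions: `dummy_search` is a stand-in for the real node-type inference, and taking the best of several repeated runs is a choice made for this example; the text does not describe the measurement protocol.

```python
import time


def time_query(search_fn, query, repeats=5):
    """Return the best wall-clock time in seconds over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        search_fn(query)
        best = min(best, time.perf_counter() - start)
    return best


def dummy_search(query):
    # Stand-in workload: the real method would infer the search-for node type.
    return sorted(query.lower().split())


elapsed = time_query(dummy_search, "Java book")
print(f"response time: {elapsed:.6f} s")
```

Taking the minimum over repeated runs reduces the influence of unrelated system activity on the measurement; a mean or median would be an equally reasonable choice.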
Fig. 5. Response time in seconds on individual queries for DupTypeNorm and the proposed method: (a) DBLP (QD1 to QD6), (b) WSU (QW1 to QW6), and (c) eBay (QE1 to QE3)
6. CONCLUSION
In this paper, a statistical dependent and ranking measure for keyword search over
XML data is designed and analyzed over various real XML datasets. We have also
performed a broad analysis of the different approaches to keyword search on XML data
available in the literature. We developed representations for identifying the user's
search intention, resolving the keyword ambiguity issues, and ranking the desired
search intention. This was done by introducing a Node index and a Data index; based on
their information, the Dscore and Tscore measures were developed to infer the search-for
node type, along with a Correlation Ranking mechanism to rank the search intention. The
results obtained for the queries under test on the different datasets, in terms of
effectiveness and efficiency, indicate that the proposed approach outperforms the
existing techniques of XML keyword search.
7. REFERENCES
[1] D. Guillaume and F. Murtagh, “Clustering of XML Documents”, Computer Physics
Communications, Vol. 127, pp. 215-227, 2000.
[2] N. Sundaresan, “A classifier for semi-structured documents”, in Proceedings of the
Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data
Mining, pp. 340-344, 2000.
[3] Antoine Doucet and Helena Ahonen-Myka, “Naive clustering of a large XML
document collection”, in Proceedings of the 1st INEX, Germany, 2002.
[4] S. Abiteboul, P. Buneman and D. Suciu, “Data on the Web”, Morgan Kaufmann,
2000.
[5] Jianhua Feng and Guoliang Li, “Efficient Fuzzy Type-Ahead Search in XML
Data”, IEEE Transactions on Knowledge and Data Engineering,
Vol. 24, No. 5, May 2012.
[6] Wei Wang, Christopher Peery, Amélie Marian, and Thu D. Nguyen, “Efficient
Multidimensional Fuzzy Search for Personal Information Management Systems”,
IEEE Transactions on Knowledge and Data Engineering, Vol.
24, No. 9, September 2012.
[7] Ziyang Liu, Yichuan Cai, and Yi Chen, “TargetSearch: A Ranking Friendly XML
Keyword Search Engine”, International Conference on Data Engineering, pp. 1101-1104,
2010.
[8] Chunxiao Liu, Xiangfu Meng and Ke Wei, “A Top-k Keywords Searching Approach
based on the Relationship of Keywords”, IEEE International Conference on Systems,
Man, and Cybernetics, October 2012.
[9] Yiqun Chen and Jinyin Cao, “TakeXIR: a Type-Ahead Keyword Search XML
Information Retrieval System”, I.J. Education and Management Engineering, Vol. 8,
pp. 1-5, 2012.
[10] Zhifeng Bao, Jiaheng Lu, Tok Wang Ling and Bo Chen, “Towards an Effective XML
Keyword Search”, IEEE Transactions on Knowledge and Data Engineering, Vol. 22,
No. 8, pp. 1077-1092, 2010.
[11] Jiang Li and Junhu Wang, “Effectively Inferring the Search-for Node Type in XML
Keyword Search”, Database Systems for Advanced Applications, pp. 110-124, 2010.
[12] Liang Jeff Chen and Yannis Papakonstantinou, “Supporting Top-K Keyword Search in
XML Databases”, Data Mining Workshops (ICDMW), pp. 805-812, 2012.
[13] Wilfred Ng and Lau Ho Lam, “A Co-Training Framework for Searching XML
Documents”, Journal Information Systems, Vol. 32, No. 3, 2007.
[14] B. Kimelfeld and Y. Sagiv, “Efficiently enumerating results of keyword search”, in
Proceedings of DBPL Conference, pp. 58-73, 2005.
[15] Y. Li, C. Yu, and H. V. Jagadish, “Schema-free XQuery”, in VLDB, pp. 72-83, 2004.
[16] A. Schmidt, M. L. Kersten, and M. Windhouwer, “Querying XML documents made
easy: Nearest concept queries”, in ICDE, pp. 321-329, 2001.
[17] Ralf Schenkel and Martin Theobald, “Structural Feedback for Keyword-Based XML
Retrieval”, ECIR, pp. 326-337, 2006.
[18] Bo Chen, Jiaheng Lu, and Tok Wang Ling, “Exploiting ID References for Effective
Keyword Search in XML Documents”, in Proceedings of DASFAA, pp. 529-537,
2008.
[19] Arash Termehchy and Marianne Winslett, “Using Structural Information in XML Keyword
Search Effectively”, ACM Transactions on Database Systems, Vol. 36, No. 1, Month
2011.
[20] William Webber, “Evaluating the Effectiveness of Keyword Search”, IEEE Data Eng.
Bull., Vol. 33, No. 1, pp. 54-59, 2010.
[21] Junfeng Zhou, Zhifeng Bao, Wei Wang, Tok Wang Ling, Ziyang Chen, Xudong Lin
and Jingfeng Guo, “Fast SLCA and ELCA Computation for XML Keyword Queries
based on Set Intersection”, Data Engineering (ICDE), pp. 905-916, April 2012.
[22] Jia-Jian Jiang, Zhi-Hong Deng, Ning Gao, and Sheng-Long Lv, “Guess What I Want:
Inferring the Semantics of Keyword Queries Using Evidence Theory”, Springer-Verlag
Berlin Heidelberg, pp. 388-398, 2012.
[23] Dayananda P and Dr. Rajashree Shettar, “Survey on Information Retrieval in Semi
Structured Data”, International Journal of Computer Applications 32(8):1-5, October
2011.
[24] Y. Swapna and S. Ravi Sankar, “A Frame Work For Clustering Time Evolving Data
Using Sliding Window Technique”, International Journal of Computer Engineering &
Technology (IJCET), Volume 3, Issue 3, 2012, pp. 377-383, ISSN Print: 0976 – 6367,
ISSN Online: 0976 – 6375.

IMPACT OF EMOTIONAL INTELLIGENCE ON HUMAN RESOURCE MANAGEMENT PRACTICES AMONG...IAEME Publication
 
VISUALISING AGING PARENTS & THEIR CLOSE CARERS LIFE JOURNEY IN AGING ECONOMY
VISUALISING AGING PARENTS & THEIR CLOSE CARERS LIFE JOURNEY IN AGING ECONOMYVISUALISING AGING PARENTS & THEIR CLOSE CARERS LIFE JOURNEY IN AGING ECONOMY
VISUALISING AGING PARENTS & THEIR CLOSE CARERS LIFE JOURNEY IN AGING ECONOMYIAEME Publication
 
A STUDY ON THE IMPACT OF ORGANIZATIONAL CULTURE ON THE EFFECTIVENESS OF PERFO...
A STUDY ON THE IMPACT OF ORGANIZATIONAL CULTURE ON THE EFFECTIVENESS OF PERFO...A STUDY ON THE IMPACT OF ORGANIZATIONAL CULTURE ON THE EFFECTIVENESS OF PERFO...
A STUDY ON THE IMPACT OF ORGANIZATIONAL CULTURE ON THE EFFECTIVENESS OF PERFO...IAEME Publication
 
GANDHI ON NON-VIOLENT POLICE
GANDHI ON NON-VIOLENT POLICEGANDHI ON NON-VIOLENT POLICE
GANDHI ON NON-VIOLENT POLICEIAEME Publication
 
A STUDY ON TALENT MANAGEMENT AND ITS IMPACT ON EMPLOYEE RETENTION IN SELECTED...
A STUDY ON TALENT MANAGEMENT AND ITS IMPACT ON EMPLOYEE RETENTION IN SELECTED...A STUDY ON TALENT MANAGEMENT AND ITS IMPACT ON EMPLOYEE RETENTION IN SELECTED...
A STUDY ON TALENT MANAGEMENT AND ITS IMPACT ON EMPLOYEE RETENTION IN SELECTED...IAEME Publication
 
ATTRITION IN THE IT INDUSTRY DURING COVID-19 PANDEMIC: LINKING EMOTIONAL INTE...
ATTRITION IN THE IT INDUSTRY DURING COVID-19 PANDEMIC: LINKING EMOTIONAL INTE...ATTRITION IN THE IT INDUSTRY DURING COVID-19 PANDEMIC: LINKING EMOTIONAL INTE...
ATTRITION IN THE IT INDUSTRY DURING COVID-19 PANDEMIC: LINKING EMOTIONAL INTE...IAEME Publication
 
INFLUENCE OF TALENT MANAGEMENT PRACTICES ON ORGANIZATIONAL PERFORMANCE A STUD...
INFLUENCE OF TALENT MANAGEMENT PRACTICES ON ORGANIZATIONAL PERFORMANCE A STUD...INFLUENCE OF TALENT MANAGEMENT PRACTICES ON ORGANIZATIONAL PERFORMANCE A STUD...
INFLUENCE OF TALENT MANAGEMENT PRACTICES ON ORGANIZATIONAL PERFORMANCE A STUD...IAEME Publication
 
A STUDY OF VARIOUS TYPES OF LOANS OF SELECTED PUBLIC AND PRIVATE SECTOR BANKS...
A STUDY OF VARIOUS TYPES OF LOANS OF SELECTED PUBLIC AND PRIVATE SECTOR BANKS...A STUDY OF VARIOUS TYPES OF LOANS OF SELECTED PUBLIC AND PRIVATE SECTOR BANKS...
A STUDY OF VARIOUS TYPES OF LOANS OF SELECTED PUBLIC AND PRIVATE SECTOR BANKS...IAEME Publication
 
EXPERIMENTAL STUDY OF MECHANICAL AND TRIBOLOGICAL RELATION OF NYLON/BaSO4 POL...
EXPERIMENTAL STUDY OF MECHANICAL AND TRIBOLOGICAL RELATION OF NYLON/BaSO4 POL...EXPERIMENTAL STUDY OF MECHANICAL AND TRIBOLOGICAL RELATION OF NYLON/BaSO4 POL...
EXPERIMENTAL STUDY OF MECHANICAL AND TRIBOLOGICAL RELATION OF NYLON/BaSO4 POL...IAEME Publication
 
ROLE OF SOCIAL ENTREPRENEURSHIP IN RURAL DEVELOPMENT OF INDIA - PROBLEMS AND ...
ROLE OF SOCIAL ENTREPRENEURSHIP IN RURAL DEVELOPMENT OF INDIA - PROBLEMS AND ...ROLE OF SOCIAL ENTREPRENEURSHIP IN RURAL DEVELOPMENT OF INDIA - PROBLEMS AND ...
ROLE OF SOCIAL ENTREPRENEURSHIP IN RURAL DEVELOPMENT OF INDIA - PROBLEMS AND ...IAEME Publication
 
OPTIMAL RECONFIGURATION OF POWER DISTRIBUTION RADIAL NETWORK USING HYBRID MET...
OPTIMAL RECONFIGURATION OF POWER DISTRIBUTION RADIAL NETWORK USING HYBRID MET...OPTIMAL RECONFIGURATION OF POWER DISTRIBUTION RADIAL NETWORK USING HYBRID MET...
OPTIMAL RECONFIGURATION OF POWER DISTRIBUTION RADIAL NETWORK USING HYBRID MET...IAEME Publication
 
APPLICATION OF FRUGAL APPROACH FOR PRODUCTIVITY IMPROVEMENT - A CASE STUDY OF...
APPLICATION OF FRUGAL APPROACH FOR PRODUCTIVITY IMPROVEMENT - A CASE STUDY OF...APPLICATION OF FRUGAL APPROACH FOR PRODUCTIVITY IMPROVEMENT - A CASE STUDY OF...
APPLICATION OF FRUGAL APPROACH FOR PRODUCTIVITY IMPROVEMENT - A CASE STUDY OF...IAEME Publication
 
A MULTIPLE – CHANNEL QUEUING MODELS ON FUZZY ENVIRONMENT
A MULTIPLE – CHANNEL QUEUING MODELS ON FUZZY ENVIRONMENTA MULTIPLE – CHANNEL QUEUING MODELS ON FUZZY ENVIRONMENT
A MULTIPLE – CHANNEL QUEUING MODELS ON FUZZY ENVIRONMENTIAEME Publication
 

Mais de IAEME Publication (20)

IAEME_Publication_Call_for_Paper_September_2022.pdf
IAEME_Publication_Call_for_Paper_September_2022.pdfIAEME_Publication_Call_for_Paper_September_2022.pdf
IAEME_Publication_Call_for_Paper_September_2022.pdf
 
MODELING AND ANALYSIS OF SURFACE ROUGHNESS AND WHITE LATER THICKNESS IN WIRE-...
MODELING AND ANALYSIS OF SURFACE ROUGHNESS AND WHITE LATER THICKNESS IN WIRE-...MODELING AND ANALYSIS OF SURFACE ROUGHNESS AND WHITE LATER THICKNESS IN WIRE-...
MODELING AND ANALYSIS OF SURFACE ROUGHNESS AND WHITE LATER THICKNESS IN WIRE-...
 
A STUDY ON THE REASONS FOR TRANSGENDER TO BECOME ENTREPRENEURS
A STUDY ON THE REASONS FOR TRANSGENDER TO BECOME ENTREPRENEURSA STUDY ON THE REASONS FOR TRANSGENDER TO BECOME ENTREPRENEURS
A STUDY ON THE REASONS FOR TRANSGENDER TO BECOME ENTREPRENEURS
 
BROAD UNEXPOSED SKILLS OF TRANSGENDER ENTREPRENEURS
BROAD UNEXPOSED SKILLS OF TRANSGENDER ENTREPRENEURSBROAD UNEXPOSED SKILLS OF TRANSGENDER ENTREPRENEURS
BROAD UNEXPOSED SKILLS OF TRANSGENDER ENTREPRENEURS
 
DETERMINANTS AFFECTING THE USER'S INTENTION TO USE MOBILE BANKING APPLICATIONS
DETERMINANTS AFFECTING THE USER'S INTENTION TO USE MOBILE BANKING APPLICATIONSDETERMINANTS AFFECTING THE USER'S INTENTION TO USE MOBILE BANKING APPLICATIONS
DETERMINANTS AFFECTING THE USER'S INTENTION TO USE MOBILE BANKING APPLICATIONS
 
ANALYSE THE USER PREDILECTION ON GPAY AND PHONEPE FOR DIGITAL TRANSACTIONS
ANALYSE THE USER PREDILECTION ON GPAY AND PHONEPE FOR DIGITAL TRANSACTIONSANALYSE THE USER PREDILECTION ON GPAY AND PHONEPE FOR DIGITAL TRANSACTIONS
ANALYSE THE USER PREDILECTION ON GPAY AND PHONEPE FOR DIGITAL TRANSACTIONS
 
VOICE BASED ATM FOR VISUALLY IMPAIRED USING ARDUINO
VOICE BASED ATM FOR VISUALLY IMPAIRED USING ARDUINOVOICE BASED ATM FOR VISUALLY IMPAIRED USING ARDUINO
VOICE BASED ATM FOR VISUALLY IMPAIRED USING ARDUINO
 
IMPACT OF EMOTIONAL INTELLIGENCE ON HUMAN RESOURCE MANAGEMENT PRACTICES AMONG...
IMPACT OF EMOTIONAL INTELLIGENCE ON HUMAN RESOURCE MANAGEMENT PRACTICES AMONG...IMPACT OF EMOTIONAL INTELLIGENCE ON HUMAN RESOURCE MANAGEMENT PRACTICES AMONG...
IMPACT OF EMOTIONAL INTELLIGENCE ON HUMAN RESOURCE MANAGEMENT PRACTICES AMONG...
 
VISUALISING AGING PARENTS & THEIR CLOSE CARERS LIFE JOURNEY IN AGING ECONOMY
VISUALISING AGING PARENTS & THEIR CLOSE CARERS LIFE JOURNEY IN AGING ECONOMYVISUALISING AGING PARENTS & THEIR CLOSE CARERS LIFE JOURNEY IN AGING ECONOMY
VISUALISING AGING PARENTS & THEIR CLOSE CARERS LIFE JOURNEY IN AGING ECONOMY
 
A STUDY ON THE IMPACT OF ORGANIZATIONAL CULTURE ON THE EFFECTIVENESS OF PERFO...
A STUDY ON THE IMPACT OF ORGANIZATIONAL CULTURE ON THE EFFECTIVENESS OF PERFO...A STUDY ON THE IMPACT OF ORGANIZATIONAL CULTURE ON THE EFFECTIVENESS OF PERFO...
A STUDY ON THE IMPACT OF ORGANIZATIONAL CULTURE ON THE EFFECTIVENESS OF PERFO...
 
GANDHI ON NON-VIOLENT POLICE
GANDHI ON NON-VIOLENT POLICEGANDHI ON NON-VIOLENT POLICE
GANDHI ON NON-VIOLENT POLICE
 
A STUDY ON TALENT MANAGEMENT AND ITS IMPACT ON EMPLOYEE RETENTION IN SELECTED...
A STUDY ON TALENT MANAGEMENT AND ITS IMPACT ON EMPLOYEE RETENTION IN SELECTED...A STUDY ON TALENT MANAGEMENT AND ITS IMPACT ON EMPLOYEE RETENTION IN SELECTED...
A STUDY ON TALENT MANAGEMENT AND ITS IMPACT ON EMPLOYEE RETENTION IN SELECTED...
 
ATTRITION IN THE IT INDUSTRY DURING COVID-19 PANDEMIC: LINKING EMOTIONAL INTE...
ATTRITION IN THE IT INDUSTRY DURING COVID-19 PANDEMIC: LINKING EMOTIONAL INTE...ATTRITION IN THE IT INDUSTRY DURING COVID-19 PANDEMIC: LINKING EMOTIONAL INTE...
ATTRITION IN THE IT INDUSTRY DURING COVID-19 PANDEMIC: LINKING EMOTIONAL INTE...
 
INFLUENCE OF TALENT MANAGEMENT PRACTICES ON ORGANIZATIONAL PERFORMANCE A STUD...
INFLUENCE OF TALENT MANAGEMENT PRACTICES ON ORGANIZATIONAL PERFORMANCE A STUD...INFLUENCE OF TALENT MANAGEMENT PRACTICES ON ORGANIZATIONAL PERFORMANCE A STUD...
INFLUENCE OF TALENT MANAGEMENT PRACTICES ON ORGANIZATIONAL PERFORMANCE A STUD...
 
A STUDY OF VARIOUS TYPES OF LOANS OF SELECTED PUBLIC AND PRIVATE SECTOR BANKS...
A STUDY OF VARIOUS TYPES OF LOANS OF SELECTED PUBLIC AND PRIVATE SECTOR BANKS...A STUDY OF VARIOUS TYPES OF LOANS OF SELECTED PUBLIC AND PRIVATE SECTOR BANKS...
A STUDY OF VARIOUS TYPES OF LOANS OF SELECTED PUBLIC AND PRIVATE SECTOR BANKS...
 
EXPERIMENTAL STUDY OF MECHANICAL AND TRIBOLOGICAL RELATION OF NYLON/BaSO4 POL...
EXPERIMENTAL STUDY OF MECHANICAL AND TRIBOLOGICAL RELATION OF NYLON/BaSO4 POL...EXPERIMENTAL STUDY OF MECHANICAL AND TRIBOLOGICAL RELATION OF NYLON/BaSO4 POL...
EXPERIMENTAL STUDY OF MECHANICAL AND TRIBOLOGICAL RELATION OF NYLON/BaSO4 POL...
 
ROLE OF SOCIAL ENTREPRENEURSHIP IN RURAL DEVELOPMENT OF INDIA - PROBLEMS AND ...
ROLE OF SOCIAL ENTREPRENEURSHIP IN RURAL DEVELOPMENT OF INDIA - PROBLEMS AND ...ROLE OF SOCIAL ENTREPRENEURSHIP IN RURAL DEVELOPMENT OF INDIA - PROBLEMS AND ...
ROLE OF SOCIAL ENTREPRENEURSHIP IN RURAL DEVELOPMENT OF INDIA - PROBLEMS AND ...
 
OPTIMAL RECONFIGURATION OF POWER DISTRIBUTION RADIAL NETWORK USING HYBRID MET...
OPTIMAL RECONFIGURATION OF POWER DISTRIBUTION RADIAL NETWORK USING HYBRID MET...OPTIMAL RECONFIGURATION OF POWER DISTRIBUTION RADIAL NETWORK USING HYBRID MET...
OPTIMAL RECONFIGURATION OF POWER DISTRIBUTION RADIAL NETWORK USING HYBRID MET...
 
APPLICATION OF FRUGAL APPROACH FOR PRODUCTIVITY IMPROVEMENT - A CASE STUDY OF...
APPLICATION OF FRUGAL APPROACH FOR PRODUCTIVITY IMPROVEMENT - A CASE STUDY OF...APPLICATION OF FRUGAL APPROACH FOR PRODUCTIVITY IMPROVEMENT - A CASE STUDY OF...
APPLICATION OF FRUGAL APPROACH FOR PRODUCTIVITY IMPROVEMENT - A CASE STUDY OF...
 
A MULTIPLE – CHANNEL QUEUING MODELS ON FUZZY ENVIRONMENT
A MULTIPLE – CHANNEL QUEUING MODELS ON FUZZY ENVIRONMENTA MULTIPLE – CHANNEL QUEUING MODELS ON FUZZY ENVIRONMENT
A MULTIPLE – CHANNEL QUEUING MODELS ON FUZZY ENVIRONMENT
 

Último

Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 

Último (20)

Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 

A novel approach towards developing a statistical dependent and rank

International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367 (Print), ISSN 0976 – 6375 (Online), Volume 4, Issue 3, May – June (2013), pp. 229-247, © IAEME: www.iaeme.com/ijcet.asp. Journal Impact Factor (2013): 6.1302 (Calculated by GISI), www.jifactor.com

A NOVEL APPROACH TOWARDS DEVELOPING A STATISTICAL DEPENDENT AND RANKING MEASURE FOR KEYWORD SEARCH OVER XML DATA

Dayananda P¹, Dr. Rajashree Shettar²
¹ Assistant Professor, Department of Information Science and Engg, MSRIT, Bangalore-54
² Professor, Department of Computer Science and Engg, RVCE, Bangalore-59

ABSTRACT

Extensible Markup Language (XML) defines a set of conventions for encoding documents in a format that is both human-readable and machine-readable. XML is widely used to represent arbitrary data structures. Since XML is widely accepted as a standard for data representation, it is a preferred markup language over which to support keyword search. In this paper, a statistical dependent and ranking measure for keyword search over XML data is proposed. The proposed method consists of the following steps: 1) Indexing, 2) Selecting the exact T-type node, 3) Data search and ranking of search results. A T-type node is considered a desired node to search for if it is informative enough, containing relevant information, and if the node type T relates to every keyword in the query. First, the input XML data is passed to an indexing process that converts it into an indexed format to make search easier. Then, the corresponding T-type node is selected through our proposed statistical dependent formulae. Once the T-type node is selected, the relevant data is obtained by sorting the node type paths. Finally, the search results obtained from the previous steps are ranked with our ranking measure. This work addresses the two challenges tackled by the TF*IDF strategy and improves the effectiveness of both the search for the node type and the ranking of search results.

Keywords: XML Keyword search, Indexing, search for node type, Data search and Ranking Measure.
1. INTRODUCTION

The Internet is a repository for large amounts of information, and the quantity of XML data shared over the World Wide Web is growing rapidly. Although the large majority of this XML data is data-centric, text-centric XML document collections are becoming increasingly common. As a result, it has become useful to provide means of managing these collections. This can be done with document-clustering methods that automatically arrange very large collections into smaller sub-collections. Unfortunately, most research on structured document processing [1], [3] is still focused on data-centric XML. The processing and management of XML documents [4] have already become popular research issues, with the major difficulty in this area being the need to index the documents optimally for storage and retrieval.

Several search methods have been developed in the IR research community that rely on a set of weighted keywords in a search query to determine the proximity of the query and a document in the feature space. However, searching XML documents departs from the conventional retrieval strategy, because XML documents have nested elements and the semantics of data values are indicated by tags. As a result, the notion of keyword proximity used in IR [13] is too simple to be effective in XML search. Keyword search is a convenient way to query XML documents, since it allows users to issue keyword queries without knowing complex query languages or the structure of the underlying data. Most research efforts in XML keyword search focus on keyword proximity search in either the tree model or the general digraph model.
Both approaches generally assume that a smaller sub-structure of the XML document containing all query keywords indicates a better result.

Smallest Lowest Common Ancestor (SLCA) is a simple and effective semantics for XML keyword proximity search in the tree model [15, 8]. Every SLCA result of a keyword query is a smallest XML node that 1) covers all query keywords in its descendants and 2) has no single proper descendant that covers all query keywords. Because it is based on the tree model, however, SLCA semantics does not capture ID reference information, which is commonly available and significant in XML databases. As a result, it may return a large tree containing irrelevant data. Alternatively, XML documents may be modeled as digraphs to take ID reference edges into account. The main concept in the digraph model, which searches for minimal connected subtrees in the graph, is called reduced subtrees [14]. However, the problem of finding all reduced subtrees and enumerating results by increasing size of the reduced subtrees is NP-hard [17, 10].

Current XML keyword and natural language query answering approaches depend on heuristics that assume certain properties of the DB schema. Though these heuristics are intuitively reasonable, they are sufficiently ad hoc that they are often violated in practice, even in the highest-quality XML schemas. Thus present approaches suffer from low precision, low recall, or both [19]. Attention is now turning to the end-user effectiveness of such search systems. Traditional IR similarity metrics have been ported to the new domain and combined with domain-specific structural features. There is also evidence of significant improvements in effectiveness, both through developing new methods and tuning existing ones [20].

The motivation of our research is to design and develop a technique for keyword search over XML data.
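To make the two SLCA conditions stated in the introduction concrete, the following is a minimal Python sketch. It is not the algorithm of any cited work: the function names `slca` and `tokens`, the sample document, and the choice to match keywords against both tag names and text are assumptions made purely for illustration. A single post-order traversal reports a node when its subtree covers every keyword and no proper descendant already does.

```python
import re
import xml.etree.ElementTree as ET


def tokens(text):
    """Lower-cased word tokens of a string (empty set for None)."""
    return set(re.findall(r"\w+", text.lower())) if text else set()


def slca(root, keywords):
    """Return the elements that are SLCA results for the keywords.

    Assumption for illustration: a keyword matches a node if it occurs
    in the node's tag name or in its text content.
    """
    wanted = {k.lower() for k in keywords}
    if not wanted:
        return []
    results = []

    def visit(node):
        # Words covered by this node itself.
        covered = tokens(node.tag) | tokens(node.text)
        result_below = False
        for child in node:
            child_covered, child_result = visit(child)
            # Text following a child (its tail) belongs to this node.
            covered |= child_covered | tokens(child.tail)
            result_below = result_below or child_result
        if result_below:
            # Some descendant already covers every keyword,
            # so this node violates condition 2.
            return covered, True
        if wanted <= covered:
            # Condition 1 holds and no descendant covers all keywords.
            results.append(node)
            return covered, True
        return covered, False

    visit(root)
    return results


# Hypothetical sample document, used only to exercise the sketch.
doc = ET.fromstring(
    "<bib>"
    "<book><title>XML search</title><author>Liu</author></book>"
    "<book><title>Graph search</title><author>Chen</author></book>"
    "<paper><title>XML</title><author>Chen</author></paper>"
    "</bib>"
)

print([node.tag for node in slca(doc, ["xml", "chen"])])  # prints ['paper']
```

In this sample, the query "xml chen" is answered by the single `paper` element, whereas the query "xml graph" is covered only by the root, which illustrates how SLCA can return a large tree containing mostly irrelevant data.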
Our main motivation is the XML search technique presented in [10], which uses a TF*IDF strategy to address two challenges. On analyzing the existing work [10], we found that the term frequency-based score computation was not very effective in selecting the exact T-type node. Incorporating other features along with frequency can lead to a more effective T-type search in XML data. Moreover, because the number of search results returned to a user can be very large, the ranking of search results becomes more important; this problem can be addressed with an effective ranking mechanism. These two challenges are addressed by the proposed methodology. In addition, this work targets effectiveness and efficiency in terms of result relevance by tackling the challenges addressed in [10], namely identifying the user's search intention, resolving keyword ambiguity, and ranking the search results effectively.

The proposed method consists of the following steps:
1) Indexing: The input XML data is passed to an indexing process that converts it into two indices (a data index and a node index), which make search easier.
2) Selecting the exact T-type node: The corresponding T-type nodes are selected through our statistical dependent formulae, Dscore and Tscore.
3) Data search and ranking of search results: Once the T-type nodes are selected, the relevant data are obtained by sorting the node type paths. Finally, the search results obtained from the previous steps are ranked with our ranking measure, which is based on a correlation measure.

The rest of the paper is organized as follows. The literature on keyword search over XML data is reviewed in Section 2, and the proposed research methodology is presented in Section 3. The proposed method is discussed in Section 4, and the results and experiments in Section 5. Section 6 concludes the paper.

2. RELATED WORK

Jianhua Feng and Guoliang Li et al. [5] presented fuzzy type-ahead search in XML data, an information-access paradigm in which the system searches XML data on the fly as the user types query keywords. It allows users to explore data as they type, even in the presence of minor errors in their keywords. Their approach has the following features: 1) Search as you type: it extends autocomplete by supporting queries with multiple keywords in XML data. 2) Fuzzy: it can find high-quality answers whose keywords approximately match the query keywords. 3) Efficient: effective index structures and top-k algorithms achieve a very high interactive speed. They also examined effective ranking functions and early termination techniques to progressively identify the top-k relevant answers. Their implementation achieved high search efficiency and result quality.

Wei Wa et al. [6] presented a multidimensional search approach that allows users to perform fuzzy searches on structure and metadata conditions in addition to keyword conditions. Their techniques score each dimension individually and integrate the three dimension scores into a meaningful unified score. They also designed indexes and algorithms to efficiently identify the most relevant files that match multidimensional queries. Experimental evaluation showed that their relaxation and scoring framework for fuzzy query conditions in non-content dimensions can significantly improve ranking accuracy.

Ziyang Liu et al. [7] presented an XML search engine, Target Search, that addresses an open problem in XML keyword search: given relevant matches to keywords, how to compose query results properly so that they can be effectively ranked and easily digested by users. Intuitively, each query has a search target, and each result should contain exactly one instance of the search target along with its evidence. Target Search composes atomic and intact query results driven by the user's search targets.

Chunxiao Liu et al. [8] presented a user-friendly top-k keyword search approach based on the relationships among keywords. The SLCA of a keyword search is first obtained by the LISA II algorithm. The structure of the SLCA is then used to infer the relationships among keywords, i.e., the keyword search is translated into twig queries. Next, the relationships among keywords are estimated from the structure of the twig queries, and the twig queries are ranked accordingly. Finally, all results of the ordered twig queries are obtained by the TJFast algorithm.

Yiqun Chen and Jinyin Cao [9] presented an approach to type-ahead keyword search in XML data, called Take XIR. This IR-style approach uses the statistics of the underlying XML data to address the following challenges in an XML IR system: (1) Identify the user's search intention, i.e., identify the keywords that express the user's interests and the nodes the user wants to search for and search via.
(2) Resolve keyword ambiguity problems: synonymy and polysemy exist in natural language, and a keyword can appear as the text value or tag name of different XML nodes and carry different meanings. They modeled XML data as a graph, analyzed the identification of user search intention and result ranking in the presence of keyword ambiguities, and used the related definitions and formulae to build a query prediction technique that improves search efficiency.
Jiang Li and Junhu Wang [11] noted that XML keyword search provides a simple and user-friendly way of retrieving data from XML databases, but that the ambiguities of keywords make it difficult to answer keyword queries effectively. XReal utilizes the statistics of the underlying data to resolve keyword ambiguity problems. However, they found that its formula for inferring the search-for node type suffers from inconsistency and abnormality problems. They proposed a dynamic reduction factor scheme and an algorithm, Dynamic Infer, to resolve these two problems, and provided experimental results to verify its effectiveness.
Liang Jeff Chen and Yannis Papakonstantinou [12] presented a series of algorithms that incorporate both efficient semantic pruning and top-K processing to support top-K keyword search [23]. They presented a join-based algorithm that processes nodes bottom-up and reduces keyword query evaluation to relational joins. Several optimizations were presented to further improve its efficiency. They then incorporated the idea of the top-K join from relational databases and presented a join-based top-K algorithm to compute the top-K results. Extensive experimental results confirmed the advantages of their algorithms over previous algorithms in both efficiency and top-K processing.
Zhifeng Bao et al. [10] studied the problem of effective XML keyword search, which includes the identification of user search intention and result ranking in the presence of keyword ambiguities.
They utilized statistics to infer user search intention and rank the query results. In particular, they defined XML TF and XML DF, based on which they designed formulae to compute the confidence level of each candidate node type to be a search-for/search-via node, and further proposed an XML TF*IDF similarity ranking scheme to
capture the hierarchical structure of XML data. Finally, the popularity of a query result (captured by ID reference relationships) was considered to handle the case in which multiple results have comparable relevance scores.
This work extends [10] with several major updates: 1) our ranking framework uses the correlation concept presented in Section 4, which outperforms the ranking concept in [10]; 2) selection of the exact T-type node is taken into consideration in Section 4; 3) a new index and algorithm are designed in Section 4.

3. RESEARCH METHODOLOGY
Definition 3.1 (Structural Node): An XML node labeled with a tag name is called a structural node. A structural node that has child nodes is called an internal node; otherwise, it is called a leaf node.
Definition 3.2 (T-type node): A node type T is considered a desired search-for node type if (i) T is intuitively related to every query keyword, (ii) XML nodes of type T are informative enough to contain sufficient relevant information, and (iii) XML nodes of type T are not so overwhelming that they contain too much irrelevant information.
Definition 3.3 (Data Node): A leaf node of the XML data that contains a text value and has no tag name is called a data node.
The primary intention of our research is to design and develop a technique for keyword search over XML data. The work is motivated by the XML search technique given in [10], which uses a TF*IDF strategy to address two challenges. Our analysis of [10] found that term-frequency-based score computation is not very effective in selecting the exact T-type node. Incorporating other features along with frequency can lead to a more effective T-type search in XML data.
Also, the ranking of the search results is important for users when the number of search results is large. This problem can be addressed by an effective ranking mechanism. The proposed methodology targets both of these challenges. It consists of three major steps: 1) Indexing, 2) Selecting the exact T-type node, 3) Data search and ranking of search results. First, the input XML data is given to the indexing process, which converts it into an indexed format to make search easier. Then, the corresponding T-type nodes are selected through our statistical dependent formulae. Once the T-type nodes are selected, the relevant data are obtained by similarity matching with the input query. Finally, the search results obtained from the previous steps are ranked with our ranking measure. The proposed algorithm is implemented in Java, and its performance is compared with the existing algorithm in terms of precision, recall and ranking measure on different datasets.
4. PROPOSED METHOD
1. Indexing
The approach presented in [10] builds two indices for data processing: a keyword inverted list and a frequency table. The keyword inverted list retrieves, in document order, the list of data nodes whose values contain the input keyword. A B+-tree is built on top of each inverted list. The second index, the frequency table, stores only the frequency (the number of T-typed nodes that contain keyword k in their subtrees in the XML data) for each combination of keyword k and node type T in the XML document. When a query keyword is searched, the approach in [10] does not identify whether the keyword is a node name or a data value, which leads to more complex query processing.
To overcome these drawbacks, we propose an indexing method that builds two indices, a Node index and a Data index, for structural nodes and data nodes respectively. These two indices are shown in Table 1 and Table 2 for the DBLP XML document. In contrast to the indices presented in [10], the Node index stores the node name of each structural node, the frequency of occurrence of that structural node in T-typed nodes or their subtrees, and the prefix path of the corresponding T-typed nodes. The Data index stores the value of each data node, the name of the corresponding node, and the frequency of occurrence of that data node in the XML document. The Data index is related to the Node index through the node name. Scores derived from the two indices are used to determine the exact T-typed node for a given keyword query. Thus, the proposed indexing approach addresses node names and data values separately in the XML database, which results in effective query processing.
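The two indices described above can be illustrated with a minimal sketch. It is not the authors' Java implementation: the function name `build_indices`, the miniature document `SAMPLE_XML` and the choice of Python counters are assumptions made for illustration, and the prefix path is assumed to be the comma-separated path from the root to the parent of a node, as in the Path column of Table 1.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical miniature document; the element names follow the DBLP
# examples in the text, the structure itself is invented for illustration.
SAMPLE_XML = """
<dblp>
  <article>
    <author>Frank Manola</author>
    <year>1993</year>
  </article>
  <article>
    <author>Sandra Heiler</author>
    <author>Frank Manola</author>
    <year>1993</year>
  </article>
  <phdthesis>
    <author>Tolga Yurek</author>
    <year>1997</year>
  </phdthesis>
</dblp>
"""


def build_indices(xml_text):
    """Return (node_index, data_index) as frequency counters."""
    node_index = Counter()  # (node name, prefix path) -> frequency
    data_index = Counter()  # (data value, node name)  -> frequency

    def visit(element, prefix):
        if prefix:  # the root has no prefix path, so it is not indexed
            node_index[(element.tag, ",".join(prefix))] += 1
        text = (element.text or "").strip()
        if text and len(element) == 0:  # leaf element carrying a text value
            data_index[(text, element.tag)] += 1
        for child in element:
            visit(child, prefix + [element.tag])

    visit(ET.fromstring(xml_text), [])
    return node_index, data_index


node_index, data_index = build_indices(SAMPLE_XML)
for (name, path), freq in sorted(node_index.items()):
    print(name, freq, path)
```

Keeping node names and data values in separate counters mirrors the idea that a query keyword is first classified as a node name or a data value before any frequency is looked up.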
Fig. 1 shows the partial structure of the DBLP XML database and Fig. 2 shows a partial data subtree of the DBLP XML database.
Fig. 1. Partial data tree structure for the 'DBLP' XML database (root node dblp with child node types inproceedings, phdthesis, article and mastersthesis, each with child nodes such as author, title, year, school, booktitle, pages, url, ee, cdrom and month)
Table 1: Node index
Sr. no.   Node        Frequency   Path
300       author      212898      dblp,article
302       url         106805      dblp,article
303       publisher   4           dblp,article
307       year        72          dblp,phdthesis
311       publisher   3           dblp,phdthesis
319       author      14          dblp,www
320       editor      21          dblp,www
321       booktitle   1           dblp,www
324       title       2609        dblp,proceedings
326       series      1955        dblp,proceedings

Table 2: Data index
Sr. no.   Data                                        Node     Frequency
30        db/labs/gte/index.html#TR-0169-12-91-165    url      1
32        db/labs/gte/TR-0231-08-93-165.html          ee       1
33        Sandra Heiler                               author   7
35        TR-0231-08-93-165                           volume   8
36        1993                                        year     4144
38        GTE/MANO93c.pdf                             cdrom    1
42        June                                        month    5
44        db/labs/gte/index.html#TM-0014-06-88-165    url      1
45        GTE/MANO88.pdf                              cdrom    1
46        db/labs/gte/TM-0332-11-90-165.html          ee       1

3. SEARCH FOR NODE TYPE-T
When selecting the exact T-type node for a given keyword query, the tag matching a keyword may occur many times in different T-type nodes and their subtrees, which makes the search-for node type process more complex. To overcome this drawback, we propose two mathematical scores from which the optimal T-type nodes are selected: 1) Dscore and 2) Tscore. Dscore is the ratio of the depths of the ancestor nodes of the keywords in a given query, and Tscore gives the percentage score of each node type having the best depth score (Dscore).
a) Dscore
For a given input query 'q', the depth of the lowest common ancestor (LCA) node of all the keywords in the query and the depth of the highest common ancestor (HCA) node of the same keywords are computed first. The ratio of these two depths is the Dscore:
$$D_{score} = \frac{\text{depth of LCA node}}{\text{depth of HCA node}} \qquad (1)$$
The LCA nodes with the lowest Dscore values are selected as the probable node types for the given query 'q'. From this set of candidates, the best node is selected as the T-type node for the given query keywords. To do so, a Tscore percentage is estimated.
Fig. 2. Partial data subtree structure for the 'DBLP' XML database (an article node under dblp with child nodes month "November", ee "db/labs/gte/TR-0310-11-95-165.html", author "Frank Manola" and cdrom "GTE/MAN095 pdf")
b) Tscore
The Tscore percentage answers the following question: for a keyword query, what is the chance of occurrence of keyword 'k' at node type T? This can be expressed with conditional probability. If 'q' and 'T' are events, the probability of 'q' given 'T' is denoted by P(q/T) and is expressed as
$$P(q/T) = \frac{P(q \cap T)}{P(T)} \qquad (2)$$
where P(q/T) is the chance of event 'q' when event 'T' has occurred, P(q ∩ T) is the probability of occurrence of event 'q' in event 'T', and P(T) is the probability of occurrence of event 'T'.
With reference to the conditional probability P(q/T), i.e. the probability of 'q' given 'T', equation (2) can be written as the sum of the probabilities of occurrence of the keywords at node type T:
$$P(q/T) = \sum_{k \in (q \cap T)} \frac{P(k)}{P(T)} \qquad (3)$$
Since P(T) is constant for all keywords (k = 1 to n) in the query,
$$P(q/T) = \frac{1}{P(T)} \times \sum_{k \in (q \cap T)} P(k) \qquad (4)$$
$$P(q/T) = \alpha \times \sum_{k=1}^{n} P(k), \qquad \alpha = \frac{1}{P(T)} \qquad (5)$$
Thus, to estimate the best T node type, the percentage frequency of occurrence of 'k' at that node type is very important. It is taken as the Tscore% of a particular node, and the node having the highest Tscore% is the relevant node type. Therefore,
$$T_{score} = \alpha \times \sum_{k=1}^{n} P(k) \qquad (6)$$
P(k) can also be defined as the frequency of occurrence of 'k' at node type 'T', and P(T) as the frequency of node type T. Equation (6) then becomes
$$T_{score} = \alpha \times \sum_{k=1}^{n} f(k), \qquad \text{for } \alpha = \frac{1}{f(T)} \qquad (7)$$
and the Tscore percentage is defined as
$$T_{score\%} = \alpha \times \sum_{k=1}^{n} f(k) \times 100 \qquad (8)$$
The percentage score Tscore% of a node type is thus the frequency of occurrence of the query keywords at that node type, expressed as a percentage of the frequency of occurrence of the node type itself, as defined in equation (8).

4. DATA SEARCH AND RANKING
Consider an input keyword query containing 'n' keywords. After the XML document has been pre-processed with the proposed indexing technique, two different indices are consulted for each keyword in the query: the Data index and the Node index. The Data index is the one having its frequency and node type information, whereas the Node index is the one having
its frequency and path information. The proposed XML keyword search is carried out in the following steps:
1. Identify the search intent of the user. To identify the desired search-for node type, we first estimate the Dscore of the LCA nodes in the XML document using equation (1) and choose the nodes having the least Dscore.
2. For each node type having a valid Dscore, evaluate its Tscore% using equation (8) and choose the node type with the maximum Tscore% as the best search-for node type.
3. For the relevant search-for node type T computed from the valid Tscore%, sort the prefix paths of the node type. The sorted prefix paths of the search-for node type are then ranked by the correlation between the sorted paths.

Algorithm 1:
Input: Query; Node_index; Data_index
Keyword Matching = index( )
{
    Query = "q";
    if (q = node & Node_index != null)
        for (Node_index length)
        {
            q = keyword[Node_index];
            f = get_nodefrequency(q);
        }
    else if (q = data & Data_index != null)
        for (Data_index length)
        {
            q = keyword[Data_index];
            f = get_datafrequency(q);
        }
}
// search for node type //
Score = get_Dscore( )
{
    if (Dscore( ) = min) then
        get_Tscore( )
    node_type = max[Tscore( )]
}
// Ranking //
Rank = get_corr( )
{
    if (sum_corr( ) = max) then
        Ry = max[sum_corr( )]
    Check threshold( )
    {
        if difference (Rank1 - Rank2) < Threshold then
            select lowest Tscore
        else
            Rank1
    }
}

In Algorithm 1, the function get_nodefrequency calculates the frequency of T-type nodes containing all the query keywords, and the function get_datafrequency retrieves the number of data nodes present under each T-type node. get_Dscore retrieves the list of paths with the lowest Dscore value; based on the output of get_Dscore, the path with the highest Tscore is selected.
Finally, ranking is done through the get_corr function, which finds the correlation between all paths.
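The search-for node type step of Algorithm 1 (lowest Dscore first, then highest Tscore%) can be sketched as follows. This is a hedged illustration rather than the authors' code: the function names, the dictionary layout of a candidate and all numeric values in `CANDIDATES` are hypothetical, and depths are assumed to be positive integers supplied by the caller.

```python
import math


def dscore(depth_lca, depth_hca):
    """Equation (1): depth of the LCA node over depth of the HCA node."""
    return depth_lca / depth_hca


def tscore_percent(keyword_freqs, node_type_freq):
    """Equation (8): alpha * sum of f(k) * 100, with alpha = 1 / f(T)."""
    return 100.0 * sum(keyword_freqs) / node_type_freq


def select_node_type(candidates):
    """Keep the candidates with the lowest Dscore, then take the highest Tscore%."""
    lowest = min(dscore(c["depth_lca"], c["depth_hca"]) for c in candidates)
    shortlist = [
        c for c in candidates
        if math.isclose(dscore(c["depth_lca"], c["depth_hca"]), lowest)
    ]
    return max(
        shortlist,
        key=lambda c: tscore_percent(c["keyword_freqs"], c["node_type_freq"]),
    )


# Hypothetical candidates: the frequencies and depths are invented.
CANDIDATES = [
    {"path": "dblp,book", "depth_lca": 2, "depth_hca": 2,
     "keyword_freqs": [30, 50], "node_type_freq": 100},
    {"path": "dblp,article", "depth_lca": 2, "depth_hca": 2,
     "keyword_freqs": [10, 5], "node_type_freq": 300},
    {"path": "dblp,www", "depth_lca": 3, "depth_hca": 2,
     "keyword_freqs": [9, 9], "node_type_freq": 10},
]

best = select_node_type(CANDIDATES)
print(best["path"])
```

In this invented data the third candidate has the largest Tscore% but is discarded first because its Dscore is not the lowest, which is the ordering of the two filters that steps 1 and 2 describe.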
In general, any statistical relationship between two random variables or two sets of data is referred to as dependence, and any of a broad class of statistical relationships involving dependence is referred to as correlation. Several correlation coefficients measure the degree of correlation; the most commonly used is Pearson's correlation coefficient, which is obtained by dividing the covariance of the two variables by the product of their standard deviations. Since we have series of n sorted paths, say X and Y, written as x_i and y_i where i = 1, 2, ..., n, the sample correlation coefficient is used to estimate the population Pearson correlation 'r' between X and Y. The sample correlation coefficient used for ranking is written as
$$r_{xy} = \frac{\sum_{i=1}^{n} (x_i - x')(y_i - y')}{\sqrt{\sum_{i=1}^{n} (x_i - x')^2 \times \sum_{i=1}^{n} (y_i - y')^2}} \qquad (9)$$
$$r_{xy} = \frac{1}{(n-1)} \times \sum_{i=1}^{n} \left( \frac{(x_i - x')}{S_x} \right) \times \left( \frac{(y_i - y')}{S_y} \right) \qquad (10)$$
Here (x_i - x')/S_x is the standard score, x' is the sample mean and S_x is the sample standard deviation, as used in equations (9) and (10). After the correlation has been determined for each combination of paths of the search-for node type, the sum of the correlations of a path with itself and with the other paths related to the node type is used to rank the node type path.
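As a check on the two forms of the sample correlation coefficient, the following sketch computes equation (9) and equation (10) side by side. The function names are illustrative, and the input vectors are arbitrary numbers: the text does not specify which numeric series represents a sorted path, so that mapping is left open here. Both functions assume at least two values and non-zero variance in each series.

```python
import math


def pearson_eq9(xs, ys):
    """Equation (9): summed co-deviations over the root of the summed squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = math.sqrt(
        sum((x - mean_x) ** 2 for x in xs) * sum((y - mean_y) ** 2 for y in ys)
    )
    return numerator / denominator


def pearson_eq10(xs, ys):
    """Equation (10): mean product of standard scores with the 1/(n-1) factor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return sum(
        ((x - mean_x) / s_x) * ((y - mean_y) / s_y) for x, y in zip(xs, ys)
    ) / (n - 1)


# Arbitrary illustrative series.
xs = [1.0, 2.0, 4.0, 7.0, 11.0]
ys = [2.0, 3.0, 3.5, 8.0, 9.0]
print(pearson_eq9(xs, ys), pearson_eq10(xs, ys))
```

The two functions agree up to floating-point rounding, since the (n-1) factors in the sample standard deviations of equation (10) cancel against its leading 1/(n-1).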
Correlation map
X \ Y   P1                     P2                     P3                     P4                     P5
P1      Corr(P1,P1)            Corr(P1,P2)            Corr(P1,P3)            Corr(P1,P4)            Corr(P1,P5)
P2      Corr(P2,P1)            Corr(P2,P2)            Corr(P2,P3)            Corr(P2,P4)            Corr(P2,P5)
P3      Corr(P3,P1)            Corr(P3,P2)            Corr(P3,P3)            Corr(P3,P4)            Corr(P3,P5)
P4      Corr(P4,P1)            Corr(P4,P2)            Corr(P4,P3)            Corr(P4,P4)            Corr(P4,P5)
P5      Corr(P5,P1)            Corr(P5,P2)            Corr(P5,P3)            Corr(P5,P4)            Corr(P5,P5)
Rank    Σ(x=1 to 5) corr(Px,P1)  Σ(x=1 to 5) corr(Px,P2)  Σ(x=1 to 5) corr(Px,P3)  Σ(x=1 to 5) corr(Px,P4)  Σ(x=1 to 5) corr(Px,P5)

The correlation map shows that the correlation of each pair of paths contributes to the ranking. The ranking is defined as
$$R_y = \sum_{x=1}^{5} corr(P_x, P_y) \qquad (11)$$
The path of the search-for node type having the highest 'Ry' value in equation (11) is ranked as the best search intention if the difference between the first two ranked correlation sums is greater than or equal to the threshold value. If the difference is less than the threshold, the node type with the lowest Tscore% is selected as the desired search-for node type, as given in equation (12).
$$R = \begin{cases} \text{node type with the lowest } T_{score\%}, & \text{if } \mathrm{diff}(Rank_1 - Rank_2) < \text{threshold} \\ \max(Rank) = Rank_1, & \text{otherwise} \end{cases} \qquad (12)$$

5. RESULTS AND COMPARISON
The proposed statistical dependent and ranking measure for keyword search over XML data was implemented in Java (JDK 1.6) and evaluated on a 3.20 GHz Intel(R) Pentium(R) D machine with 1.00 GB RAM running 32-bit Windows 7 Professional. The experimental results are tabulated and compared with the existing method XReal. The results are generated and compared on the real datasets DBLP, WSU and eBay [10, 2], and are discussed in terms of effectiveness and efficiency.
Effectiveness test: This consists of two tests: 1.1) inferring the desired search-for node type, and 1.2) a quality measure using the metrics precision, recall and F-measure.
Efficiency test: This is evaluated by measuring the query response time of the proposed method against XReal for all three real datasets.

Note: Queries under test
Notation   Query
DBLP dataset
QD1        "Java book"
QD2        "author Chen Lei"
QD3        "Jim Gray article"
QD4        "XML twig"
QD5        "Ling tok wang twig"
QD6        "vldb 2000"
QD7        "Philip Bernstein"
QD8        "WISE"
QD9        "ER 2005"
QD10       "LATIN 2006"
WSU dataset
QW1        "230"
QW2        "CAC 101"
QW3        "ECON"
QW4        "Biology"
QW5        "place TODD"
QW6        "days TU TH"
eBay dataset
QE1        "2 days"
QE2        "cpu 933"
QE3        "Hard drive CA"

5.1 Effectiveness test
The effectiveness of our statistical dependent and ranking measure for keyword search over XML data is assessed by how well it identifies the user search intention and resolves the ambiguity issues.
The accuracy of our approach is tested by evaluating the inferred search-for node type for the queries tabulated in Table 3, of which a couple exhibit both Ambiguity 1 and Ambiguity 2 and a few exhibit only Ambiguity 2.
5.1.1 Inferring the desired search for node type
Among the queries in Table 3, QD1 and QD3 have both Ambiguity 1 (a keyword appears as an XML tag name and as a text value) and Ambiguity 2 (a keyword appears as the text value of different types of XML nodes), whereas QD2, QD6 and QW1 have Ambiguity 2. As Table 3 shows, for the DBLP dataset both our method and XReal infer the intended search-for node type, unlike SLCA/XSeek. For the WSU and eBay datasets, our method infers the desired search-for node type in most cases; because these datasets are small, the root node occurs alongside the search intention. For example, for query QE1 the search intention is auction_info and our approach outputs auction_info; listing.
An example of inferring the desired search-for node type with our proposed method follows. We consider one query and present the complete search-for node type computation.
Input Query: "java book"
==========================================
1) Dscore
Tag         Frequency   Path                  Dscore
author      413010      dblp,inproceedings    1.0
author      212898      dblp,article          1.0
title       179060      dblp,inproceedings    1.0
url         179058      dblp,inproceedings    1.0
booktitle   179058      dblp,inproceedings    1.0
title       106834      dblp,article          1.0
url         106805      dblp,article          1.0
ee          73560       dblp,inproceedings    1.0
ee          23442       dblp,article          1.0
title       2609        dblp,proceedings      1.0
url         2491        dblp,proceedings      1.0
booktitle   2293        dblp,proceedings      1.0
author      1996        dblp,incollection     1.0
author      1153        dblp,book             1.0
title       1009        dblp,incollection     1.0
booktitle   1009        dblp,incollection     1.0
url         1006        dblp,incollection     1.0
title       845         dblp,book             1.0
book        845         dblp,book             1.0
url         128         dblp,book             1.0
ee          107         dblp,incollection     1.0
title       72          dblp,phdthesis        1.0
author      72          dblp,phdthesis        1.0
url         38          dblp,www              1.0
title       38          dblp,www              1.0
author      14          dblp,www              1.0
ee          6           dblp,proceedings      1.0
title       5           dblp,mastersthesis    1.0
ee          5           dblp,book             1.0
author      5           dblp,mastersthesis    1.0
ee          1           dblp,phdthesis        1.0
booktitle   1           dblp,www              1.0
2) Tscore
Tag Name    Tscore                 Path
booktitle   182361.0               dblp,www
author      125829.6               dblp,mastersthesis
ee          97121.0                dblp,phdthesis
title       58094.4                dblp,mastersthesis
author      44939.142857142855     dblp,www
ee          19424.2                dblp,book
ee          16186.833333333332     dblp,proceedings
author      8738.166666666666      dblp,phdthesis
title       7644.0                 dblp,www
url         7619.105263157894      dblp,www
title       4034.333333333333      dblp,phdthesis
url         2261.921875            dblp,book
ee          907.6728971962616      dblp,incollection
author      545.661751951431       dblp,book
title       343.75384615384615     dblp,book
author      315.2044088176352      dblp,incollection
title       287.8810703666997      dblp,incollection
url         287.79920477137176     dblp,incollection
booktitle   180.73439048562932     dblp,incollection
url         116.228823765556       dblp,proceedings
title       111.33461096205444     dblp,proceedings
booktitle   79.52943741822939      dblp,proceedings
ee          4.143033870830134      dblp,article
author      2.9551616266944736     dblp,article
title       2.7189097103918227     dblp,article
url         2.710790693319601      dblp,article
title       1.6222048475371385     dblp,inproceedings
url         1.6169397625350446     dblp,inproceedings
author      1.5233238904627007     dblp,inproceedings
ee          1.3202963567156063     dblp,inproceedings
booktitle   1.0184465368763194     dblp,inproceedings
book        0.0                    dblp,book
=================================================
3) Ranking
Example: correlation of dblp,proceedings and dblp,incollection
corr(dblp,proceedings; dblp,incollection) = 0.1221784083384564
Ranked sum of correlation:
Path                     Rank
P1 = dblp,book           3.2727014742218543
P2 = dblp,phdthesis      3.1869696826431175
P3 = dblp,incollection   3.0431260287060002
P4 = dblp,www            2.0916351992181195
P5 = dblp,article        1.8924147256281627
P6 = dblp,inproceedings  1.8924147256281627
P7 = dblp,proceedings    0.13822919060961375
P8 = dblp,mastersthesis  0.0
Applying the rule of equation (12):
$$R = \begin{cases} \text{node type with the lowest } T_{score\%}, & \text{if } \mathrm{diff}(Rank_1 - Rank_2) < \text{threshold} \\ \max(Rank) = Rank_1, & \text{otherwise} \end{cases}$$
Selected path is dblp,book
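The ranking step of the worked example can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: the function names are invented, the threshold values are hypothetical because the text does not report the threshold used, the Tscore% values passed to the fallback are invented, and the fallback is assumed to choose between the two top-ranked paths only. The three rank sums are the ones listed in the example above.

```python
def rank_sums(corr):
    """Equation (11): R_y is the sum over x of corr(P_x, P_y)."""
    paths = list(corr)
    return {y: sum(corr[x][y] for x in paths) for y in paths}


def select_path(sums, tscore_pct, threshold):
    """Equation (12): keep the top-ranked path unless the top two sums are too close."""
    ordered = sorted(sums, key=sums.get, reverse=True)
    first, second = ordered[0], ordered[1]
    if sums[first] - sums[second] < threshold:
        # Assumed reading of the fallback: lowest Tscore% among the top two.
        return min((first, second), key=lambda p: tscore_pct[p])
    return first


# Rank sums reported in the worked example (top three paths only).
SUMS = {
    "dblp,book": 3.2727014742218543,
    "dblp,phdthesis": 3.1869696826431175,
    "dblp,incollection": 3.0431260287060002,
}
# Hypothetical per-path Tscore% values, used only by the fallback branch.
TSCORE_PCT = {"dblp,book": 40.0, "dblp,phdthesis": 25.0, "dblp,incollection": 10.0}

print(select_path(SUMS, TSCORE_PCT, threshold=0.05))
```

With the hypothetical threshold of 0.05, the gap between the first two sums (about 0.086) is large enough to keep dblp,book, which matches the selected path in the example; a larger threshold such as 0.1 would trigger the Tscore% fallback instead.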
Table 3. Effectiveness test on inferring the desired search for node type
Query                        Intention       XReal           SLCA/XSeek                  Our
DBLP (370MB)
QD1  Java, book              book            book            book; title/book; article   book
QD2  author, Chen, Lei       inproceedings   inproceedings   author                      inproceedings
QD3  Jim, Gray, article      article         article         article                     article
QD4  XML, twig               inproceedings   inproceedings   title/inproceedings         inproceedings
QD5  Ling, tok, wang, twig   inproceedings   inproceedings   inproceedings               inproceedings
QD6  vldb, 2000              inproceedings   inproceedings   inproceedings               inproceedings
WSU (16.5MB)
QW1  230                     place           course;place    room; crs/course            place;course
QW2  CAC, 101                course          course          course                      course
QW3  ECON                    course          course          prefix/course               course
QW4  Biology                 course          course          title/course                course
QW5  place, TODD             course          course          place/course                place;course
QW6  days, TU, TH            course          course          days/course                 place
eBay (0.36MB)
QE1  2, days                 auction_info    listing         time_left/listing           auction_info;listing
QE2  cpu, 933                listing         listing         cpu/listing                 item_info;listing
QE3  Hard, drive, CA         listing         listing         description/listing         listing

5.1.2 Quality measure (Precision, Recall & F-measure)
The quality measure also addresses the effectiveness of our approach by evaluating all the queries under test with three metrics: precision, recall and F-measure. Precision is the percentage of the output subtrees that are desired; recall is the percentage of the desired subtrees that are output; F-measure is a weighted mean of precision and recall. Because most of the queries on DBLP have more than 100 results, the precision, recall and F-measure values for XReal are taken from [10]. The same applies to each query issued on WSU and eBay. The results are shown in Figures 3 and 4.
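The three metrics can be illustrated with a short sketch. The function names and the result identifiers are hypothetical. The F-measure follows the formula stated later in this section, F = (precision × recall) / (precision + recall); note that this is half of the standard balanced F1 score, which carries an extra factor of 2.

```python
def precision_recall(returned, desired):
    """Return (precision, recall) in percent for two collections of subtree ids."""
    returned, desired = set(returned), set(desired)
    hits = len(returned & desired)
    precision = 100.0 * hits / len(returned) if returned else 0.0
    recall = 100.0 * hits / len(desired) if desired else 0.0
    return precision, recall


def f_measure_as_stated(precision, recall):
    """F = (precision * recall) / (precision + recall), as written in the text."""
    if precision + recall == 0:
        return 0.0
    return precision * recall / (precision + recall)


# Hypothetical result identifiers for one query.
returned_subtrees = ["r1", "r2", "r3", "r4"]
desired_subtrees = ["r2", "r3", "r5"]

p, r = precision_recall(returned_subtrees, desired_subtrees)
print(p, r, f_measure_as_stated(p, r))
```

Under this formula the F-measure cannot exceed 50 when precision and recall are percentages, which is worth keeping in mind when reading F-measure values reported on that scale.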
Fig. 3. Precision comparison (percent) of XReal and the proposed method, per query: (a) DBLP, (b) WSU and (c) eBay
Fig. 4. Recall comparison (percent) of XReal and the proposed method, per query: (a) DBLP, (b) WSU and (c) eBay
Table 4: F-Measure (%)
Dataset   XReal   Proposed
DBLP      47.48   48.48
WSU       49.67   37.5
eBay      40.02   44.44

Figure 3 shows that the average precision of our proposed approach is higher than that of XReal for the queries on the DBLP dataset. Figure 4 shows the recall measure for all three real datasets; the recall of our approach outperforms XReal. Further, the F-measure in Table 4 is computed with the formula F = [(precision * recall) / (precision + recall)], using the average precision and recall scores of all the queries under test. The F-measure of our method is 48.48% on the DBLP dataset and 44.44% on eBay, whereas for XReal it is 47.48% on DBLP and 40.02% on eBay.

5.2 Efficiency test
The efficiency test evaluates the query response time of our proposed method, which uses the indices for keyword information discussed in Section 4. It measures the time taken to search for the node type of a given query. The proposed method is compared with XReal's DupTypeNorm. For the DBLP, WSU and eBay real datasets, our approach is observed to be faster than even DupTypeNorm (three-level information indexing). Fig. 5 shows the response time in seconds for individual queries on the DBLP, WSU and eBay databases.
Fig. 5. Response time (in seconds) on individual queries for DupTypeNorm and the proposed method: (a) DBLP, (b) WSU and (c) eBay
6. CONCLUSION
In this paper, a statistical dependent and ranking measure for keyword search over XML data is designed and analyzed over several real XML datasets. We have also surveyed the different approaches to keyword search on XML data available in the literature. We developed representations for identifying the user's search intention, resolving keyword ambiguity issues and ranking the desired search intention. This was done by introducing a Node index and a Data index, on whose information the Dscore and Tscore measures were built to infer the search-for node type, together with a correlation ranking mechanism to rank the search intention. The results obtained for the queries under test on different datasets, in terms of effectiveness and efficiency, indicate that the proposed approach outperforms the existing XML keyword search techniques.

7. REFERENCES
[1] D. Guillaume and F. Murtagh, "Clustering of XML Documents", Computer Physics Communications, vol. 127, pp. 215-227, 2000.
[2] N. Sundaresan, "A classifier for semi-structured documents", in Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 340-344, 2000.
[3] Antoine Doucet and Helena Ahonen-Myka, "Naive clustering of a large XML document collection", in Proceedings of the 1st INEX, Germany, 2002.
[4] S. Abiteboul, P. Buneman and D. Suciu, "Data on the Web", Morgan Kaufmann, 2000.
[5] Jianhua Feng and Guoliang Li, "Efficient Fuzzy Type-Ahead Search in XML Data", IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 5, May 2012.
[6] Wei Wang, Christopher Peery, Amélie Marian, and Thu D.
Nguyen, "Efficient Multidimensional Fuzzy Search for Personal Information Management Systems", IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 9, September 2012.
[7] Ziyang Liu, Yichuan Cai, and Yi Chen, "TargetSearch: A Ranking Friendly XML Keyword Search Engine", International Conference on Data Engineering, pp. 1101-1104, 2010.
[8] Chunxiao Liu, Xiangfu Meng and Ke Wei, "A Top-k Keywords Searching Approach based on the Relationship of Keywords", IEEE International Conference on Systems, Man, and Cybernetics, October 2012.
[9] Yiqun Chen and Jinyin Cao, "TakeXIR: a Type-Ahead Keyword Search XML Information Retrieval System", I.J. Education and Management Engineering, vol. 8, pp. 1-5, 2012.
[10] Zhifeng Bao, Jiaheng Lu, Tok Wang Ling and Bo Chen, "Towards an Effective XML Keyword Search", Knowledge and Data Engineering, vol. 22, no. 8, pp. 1077-1092, 2010.
[11] Jiang Li and Junhu Wang, "Effectively Inferring the Search-for Node Type in XML Keyword Search", Database Systems for Advanced Applications, pp. 110-124, 2010.
[12] Liang Jeff Chen and Yannis Papakonstantinou, "Supporting Top-K Keyword Search in XML Databases", Data Mining Workshops (ICDMW), pp. 805-812, 2012.
[13] Wilfred Ng and Lau Ho Lam, "A Co-Training Framework for Searching XML Documents", Journal Information Systems, vol. 32, no. 3, 2007.
[14] B. Kimelfeld and Y. Sagiv, "Efficiently enumerating results of keyword search", in Proceedings of DBPL Conference, pp. 58-73, 2005.
[15] Y. Li, C. Yu, and H. V. Jagadish, "Schema-free XQuery", in VLDB, pp. 72-83, 2004.
[16] A. Schmidt, M. L. Kersten, and M. Windhouwer, "Querying XML documents made easy: Nearest concept queries", in ICDE, pp. 321-329, 2001.
[17] Ralf Schenkel and Martin Theobald, "Structural Feedback for Keyword-Based XML Retrieval", ECIR, pp. 326-337, 2006.
[18] Bo Chen, Jiaheng Lu, and Tok Wang Ling, "Exploiting ID References for Effective Keyword Search in XML Documents", in Proceedings of DASFAA, pp. 529-537, 2008.
[19] Arash Termehchy and Marianne Winslett, "Using Structural Information in XML Keyword Search Effectively", ACM Transactions on Database Systems, vol. 36, no. 1, 2011.
[20] William Webber, "Evaluating the Effectiveness of Keyword Search", IEEE Data Eng. Bull., vol. 33, no. 1, pp. 54-59, 2010.
[21] Junfeng Zhou, Zhifeng Bao, Wei Wang, Tok Wang Ling, Ziyang Chen, Xudong Lin and Jingfeng Guo, "Fast SLCA and ELCA Computation for XML Keyword Queries based on Set Intersection", Data Engineering (ICDE), pp. 905-916, April 2012.
[22] Jia-Jian Jiang, Zhi-Hong Deng, Ning Gao, and Sheng-Long Lv, "Guess What I Want: Inferring the Semantics of Keyword Queries Using Evidence Theory", Springer-Verlag Berlin Heidelberg, pp. 388-398, 2012.
[23] Dayananda P and Dr. Rajashree Shettar, "Survey on Information Retrieval in Semi Structured Data", International Journal of Computer Applications 32(8):1-5, October 2011.
[24] Y. Swapna, S.
Ravi Sankar, "A Frame Work For Clustering Time Evolving Data Using Sliding Window Technique", International Journal of Computer Engineering & Technology (IJCET), Volume 3, Issue 3, 2012, pp. 377-383, ISSN Print: 0976 – 6367, ISSN Online: 0976 – 6375.