SlideShare uma empresa Scribd logo
1 de 6
Baixar para ler offline
J.Ramachandra, J.Ravi kumar,Y.V.Sricharan,Y.Venkata Sreekar / International Journal of
      Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com
                       Vol. 2, Issue4, July-August 2012, pp.2272-2277
                      Uncertainty Based Text Summarization
                           J.Ramachandra1                    J.Ravi kumar2
                              Y.V.Sricharan3                 Y.Venkata Sreekar4
                            1
                                2nd Year M.Tech, CREC, Tirupati
                                 2
                                   Professor of Department of CSE, CREC, Tirupati


Abstract
         Effective extraction of query relevant           Moreover, information pertaining to a query might be
information present within the webpages is a              distributed across several sources. So, it is a difficult
nontrivial task. QTS, task by filtering and               thing for a user to sift through all these documents
aggregating important query relevant sentences            and find the information that it needs. Neither web
distributed across the webpages. This system              directories nor search engines are helpful in this
captures the contextual relationships among               matter. Hence it would be very useful to have a
sentences of these webpages and represents them           system which could filter and aggregate information
as an “integrated graph”. These relationships are         relevant to user‟s query from various sources and
exploited and several subgraphs of integrated             present it as a digest or a summary. This summary
graph which consist of sentences that are highly          would help in getting an overall understanding of the
relevant to the query and that are highly related to      query.
each other are constructed. These subgraphs are                     Automatic Query Specific Multi-document
ranked by the scoring model. The subgraph with            Summarization is the process of filtering important
highest rank which is rich in query relevant              relevant information from the input set of documents
information is returned as a query specific               and presenting the concise version of that documents
summary.      QTS is domain independent and               to the user. QTS fulfills this objective by generating a
doesn’t use any linguistic processing, making it a        summary that is specific to the given query on a set of
flexible and general approach that can be applied         documents. Here we focus on building a coherent and
to unrestricted set of articles found on WWW.             highly responsive multi-document summary which is
Experimental results prove the strength of                complete. This process poses significant challenges
QTS.Very little work is reported on query specific        like maintaining intelligibility, coherence and non-
multiple document summarization.The quality of            redundancy.
summaries generated by QTS is better than                           Intelligibility/responsiveness it is the
MEAD(one of the popular summarizers).                     property that determines if the summary satisfies
                                                          user‟s needs or not while Coherence determines the
Introduction                                              readability and information flow. As user‟s query in
         with the increased usage of computers huge       the context of a search engine is small (typically 2 or
amounts of data is being added to web constantly.         3 words), we need to appropriately select important
This data is stored either on personal computers or       relevant sentences. Some sentences may not contain
data servers accessible through Internet. The             query terms but can still be relevant. Not selecting
increased disk space and popularity of Internet has       these sentences will affect the intelligibility of
resulted in vast repositories of data available at        summary. Moreover, selecting sentences while
fingertips of a computer user resulting in                summarizing long documents without considering the
“Information Overload” problem. The World Wide            context in which it appears will result in an
Web has become the largest source of information          incoherent text. Specific to multi-document
with heterogeneous collections of data. A user in         summarization, there are problems of redundancy in
need of some information is lost in the overwhelming      input text and the ordering of selected sentences in
amounts of data.                                          the summary.
         Search engines try to organize web                         Summarization process proceeds in three
dynamically by identifying, retrieving and presenting     stages - source representation, sentence filtering and
web pages relevant to users search query. Clustering      ordering. In QTS system, the focus is on generating
search engines like cluster go a step further to          summaries by a novel way of representing the input
organize the retrieved search results list into           documents, identifying important sentences which are
categories and allow user to do a focused search.         not redundant and presenting them as summary. The
         A web user seeking information on a broad        intention is to represent the contextual dependencies
topic will either visit a search engine and type in a     present between sentences as a graph by connecting a
query or browse through the web directory. In both        sentence to another sentence if they are contextually
these cases, the number of web pages that are             similar and highlight how these relationships can be
retrieved to satisfy user‟s need is very high.            exploited to score and select sentences.


                                                                                                2272 | P a g e
J.Ramachandra, J.Ravi kumar,Y.V.Sricharan,Y.Venkata Sreekar / International Journal of
      Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com
                       Vol. 2, Issue4, July-August 2012, pp.2272-2277
          IG is built to reflect inter and intra document   expansion or indexing and other natural language
contextual similarities present between sentences of        processing problems.
the input documents. It is highly probable that the         In this project we used Porter Stemmer Algorithm.
neighborhood of a node in this graph contains               The Porter stemming [7] algorithm (or „Porter
sentences that have similar content. So, from each          stemmer’) is a process for removing the commoner
node in this graph by exploiting its neighborhood, a        morphological endings from words in English. Its
summary graph called SGraph is built which will             main use is as part of a term normalization process
contain query relevant information.                         that is usually done when setting up Information
          This approach is domain independent and           Retrieval systems.
doesn‟t use any linguistic processing, making it a
flexible and general approach that can be applied to        THE PORTER STEMMER ALGORITHM
unrestricted set of articles. We have implemented                     A consonant in a word is a letter other than
QTS and it works in two phases. First phase deals           A, E, I, O or U and other than Y preceded by a
with creation of IG and necessary indexes that are          consonant. (The fact that the term `consonant' is
query independent and can be done offline.                  defined to some extent in terms of itself does not
          Second phase deals with generation of             make it ambiguous.) So in TOY the consonants are T
summary relevant to the user‟s query from the IG.           and Y and in SYZYGY they are S, Z and G. If a letter
First phase is a time consuming process especially for      is not a consonant it is a vowel.
sets of documents with large number of sentences.                     A consonant will be denoted by c and a
So, this system can be used on a desktop machine in a       vowel by v. A group of consonants of length greater
digital library or in an intranet kind of an                than zero will be denoted by C, and a group of vowels
environment where documents can be clustered and            of length greater than zero will be denoted by V. Any
graphs can be created .We present experimental              word, or part of a word, therefore has one of the four
results that prove the strength of QTS.                     forms: CVCV ... C, CVCV ... V, VCVC ... C, VCVC
Related work                                                ... V. These may all be represented by the single form
          Several clustering based approaches were          [C] VCVC ... [V]. Here the square brackets denote
tried where similar sentences are clustered and a           arbitrary presence of their contents. Using (VC) {m}
representative sentence of each cluster is chosen as a      to denote VC which is repeated m times, this may
summary. MEAD [2] is a centroid based multi-                again be written as [C](VC)
document generic summarizer. It follows cluster             {m}[V]. “m” will be called the measure of any word
based approach that uses features like cluster              or word part when represented in this form. The case
centroids, position etc., to summarize documents.           m = 0 covers the null word as follows:
          Recently, graph based models are being used                 m=0 TR, EE, TREE, Y, BY.
to represent text. They use measures like degree                      m=1 TROUBLE, OATS, TREES, IVY.
centrality [3] and eigenvector centrality [4] to rank                m=2        TROUBLES, PRIVATE, OATEN,
sentences. Inspired by PageRank [5], these methods          ORRERY.
build a network of sentences and then determine the                   The rules for removing a suffix will be given
importance of each sentence based on its connectivity       in the form (Condition) S1 -> S2.This means that if a
with other sentences. Highly ranked sentences form a        word ends with the suffix S and the stem before S1
summary.                                                    satisfies the given condition, S1 is replaced by S2.
          QTS proposes a novel algorithm that does          The condition is usually given in terms of m, e.g., (m
statistical processing to exploit the dependencies          > 1) EMENT -> .Here S1 is “EMENT” and S2 is
between sentences and generate a summary by                 null. This would map REPLACEMENT to REPLAC,
balancing query responsiveness in it.                       since REPLAC is a word part for which m = 2. The
                                                            “condition” part may also contain the following:
System models                                                        *S - the stem ends with S (and similarly for
URL Extraction And Sentence Extraction                      the other letters).
          Sentence extraction: A html file is given as              *v* - the stem contains a vowel.
input and paragraphs are extracted from this html file.             *d - the stem ends with a double consonant
These paragraphs are divided into sentences using           (e.g., -TT, -SS).
delimiters like ".", "!", "?" followed by a gap.                    *o - the stem ends with cvc, where the second
Stemming                                                    c is not W, X or Y
          Stemming is the process for reducing                        (e.g., – WIL,-HOP).
inflected (or sometimes derived) words to their stem,       Summarization Framework
base or root form – generally a written word form.          Contextual Graph
The stem need not be identical to the morphological                   All the non-textual elements like images,
root of the word; it is usually sufficient that related     tables etc are removed from the document .We use
words map to the same stem, even if this stem is not        “.” as a delimiter to segment the document into
in itself a valid root. The process of stemming, often      sentences and this document is represented as a
called conflation, is useful in search engines for query

                                                                                                 2273 | P a g e
J.Ramachandra, J.Ravi kumar,Y.V.Sricharan,Y.Venkata Sreekar / International Journal of
      Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com
                       Vol. 2, Issue4, July-August 2012, pp.2272-2277
contextual graph with sentences as nodes and                  with respect to a query term in two aspects: the
similarity between them as edges.                             relevance of sentence to the query term and the kind
                                                              of neighbors it is connected to. Initially each node is
Definition 1 Contextual Graph(CG): A contextual               initialized to query similarity weight and then these
graph for a document d is defined as a weighted               weights are spread to their neighbors via the weighted
graph, CG(d) = (V(d),E(d)) where V (d) is a finite set        graph IG. This process is repeated until the weights
of nodes where each node is a sentence in the                 come to a steady state. Following this idea, the node
document d and E(d) is a finite set of edges where an         weights for each node with respect to each query term
edge eij ∈ E(d) is incident on nodes i, j ∈ V (d) and is      qi ∈ Q where Q = {q1,q2,...,qt} are computed using
assigned a weight reflecting the degree of similarity         the following equation 1.
between nodes i and j. An edge exists only if its
weight exceeds a threshold μ. Edges connecting
adjacent sentences in a document are retained,
irrespective of the threshold.
          An edge weight w(eij) represents contextual
similarity between sentences si and sj . It is computed
using cosine similarity measure. The weight of each
term t is calculated using tft * isft where tft is the term   ……………………(1)
frequency and isft is inverse sentential frequency i.e.,      where wq (s) is node weight of node s with respect to
log( n/(nt+1)), where n is the total number of                query term q, d is bias factor, N is number of nodes
sentences in the document and nt is number of                 and sim(si,sj) is cosine similarity between sentences si
sentences containing the term t in the graph under            and sj. First part of equation computes relevancy of
consideration. Stop words are removed and remaining           nodes to the Query and second part considers
words are stemmed before computing these weights.             neighbors‟ node weights. The bias factor d gives
The edges that are having very low similarity value           trade-off between the set parts and is determined
i.e. with an edge weight below a threshold μ(=0.001)          empirically. For higher values of d, more importance
are discarded. These edges and their weights reflect          is given to the similarity of node to the query when
the degree of coherence in the summary.                       compared to the similarity between neighbors. The
                                                              denominators in both terms are for normalization.
Integrated Graph Generation                                   When a query Q is given to the system, each word is
          The input is the set of documents that are          assigned weight based on tf∗isf metric and node
related to the particular topic that we have to search        weights for each node with respect to each query term
in multi-document summarization. Let χ=(D,T) be a             are calculated. Intuitively a node will have a high
set of documents D on a topic T. For each document            score value if:
in this set χ are combined incrementally which forms          1) It has information relevant to the query and
a graph called as integrated graph. As these                  2) It is connected to similar context nodes which
documents are related to a topic T, they will be              share query relevant information.
similar and may contain some redundant sentences                        If a node doesn‟t have any query term but is
i.e., repeating sentences. These redundant sentences          linked to nodes having it, then the neighbor weights
are identified and only one of them is placed in the           are propagated in proportion to the edge weight such
integrated graph. Then the similarities between               that it gets a weight greater than zero. Thus high node
documents are identified by establishing edges across         weight indicates a highly relevant node present in a
nodes of different documents and the edge weights of          highly relevant context and is used to indicate the
IG are calculated. Thus the integrated graph reflects          richness of query specific information in the node.
inter as well as intra-document similarity
relationships present in document set χ. Algorithm            CTree AND SGraph
for integrated graph construction is given in later                     For each query word, the neighborhood of
sections.                                                     each node in IG is explored and a tree rooted at each
          Whenever a query related to the particular          node is constructed from the explored graph. These
topic T is given, relevancy scores of each node are           trees are called as contextual trees (CTrees).
computed with respect to each query term. During
this process, sentences that are not related to query         Definition 2
directly (by having terms), but are relevant are to be        Contextual Tree (CTree) : A CTreei =(Ni,Ei,r,qi ) is
considered which can be called as supporting                  defined as a quadruple where Ni and Ei are set of
sentences. To handle this, centrality based query             nodes and edges respectively. qi is ith term in the
biased sentence weights are computed that not only            query. It is rooted at r with at least one of the nodes
consider the local information i.e., whether the node         having the query term qi. Number of children for each
contains the query terms, but also global information         node is at most b (beam width). It has at most (1+ bd)
like the similarity with its adjacent nodes. A mixture        nodes where d is the maximum depth. CTree is
model is also used to define the importance of a node

                                                                                                    2274 | P a g e
J.Ramachandra, J.Ravi kumar,Y.V.Sricharan,Y.Venkata Sreekar / International Journal of
      Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com
                       Vol. 2, Issue4, July-August 2012, pp.2272-2277
empty if there is no node with query term qi with in       automatically generate query biased summaries from
depth d.                                                   text.
         CTrees corresponding to each query term
that are rooted at a particular node, are merged to        Integrated Graph Construction
form a summary graph (SGraph) which is defined as                    Integrated graph represents the relationships
follows:                                                   present among sentences of the input set. We assume
Definition 3                                                that the longest document contains more number of
Summary Graph (SGraph): For each node r in IG,             sub topics than any other document and its CG is
if there are t query terms, we construct a summary         chosen as a base graph and is added to IG which is
graph SGraph = (N‟, E‟, Q‟) where N‟ and E‟ are            empty initially. The documents in the input set are
union                                                      ordered in decreasing order of their size and CG‟s of
of the set of nodes and edges of CTreei rooted at r        each document in the ordered list is added
respectively and Q = {q1,q2,...,qt}.                       incrementally to IG . There are two important issues
                                                           that need to be addressed in multi-document
Scoring Model: CTrees formed from each node in             summarization.
IG are assigned a score that reflects the degree of                  Redundant sentences are identified as those
coherence and information richness in the tree.            sentences which have similarity that exceeds
Definition 4                                                threshold λ. This similarity is computed using cosine
                                                           similarity and λ =0.7 is sufficient in most of the cases.
CTreeScore: Given an integrated graph IG and a             During the construction of IG, if the sentence in
query term q, score of the CTree q rooted at node r is     consideration is found to be highly similar to any
calculated as                                              sentence of a document other than document being
                                                           considered in IG, then it is discarded. Otherwise it is
                                                           added as a new node and is connected to existing
                                                           nodes with which its similarity is above the threshold
                                                           µ.
                                                 .......            Sentence ordering in summary affects
(2)                                                        readability. For this purpose, an encoding strategy is
                                                           followed where an “id” is assigned to each node in IG
                                                           such that there is information flow in summary when
                …………… (3)                                  nodes are put in increasing order of their ids
         Here a is average of top three node weights
among the neighbors of u excluding parent of u and b       Encoding Strategy
is maximum edge weight among nodes incident on u.                    Initially all nodes in the base graph are
The SGraphs formed from each node by merging               assigned ids as follows. The ith sentence is assigned (i
CTrees for all query terms are ranked using following      − 1) ∗ η as id. This interval η is used to insert all the
equation and the highest ranked graph is retained as       nodes from other documents that are closer to i (i.e.,
summary.                                                   the inserted node has maximum edge weight with i
Definition 5                                                among all nodes adjacent to it). The sentences in an
SGraphScore: Given an integrated graph IG and a            interval are ordered in decreasing order of their edge
query Q = {q1,q2,...,qt}, score of the SGraph SG is        weights with i. When a new node is added to IG, an
calculated as                                              id is assigned to it. Pseudo code for IG construction is
                                                           given in Algorithm 1.

                                                           Algorithm 1 Integrated Graph Construction
                                                           1: Input: Contextual Graphs CGi in the decreasing
                                           ………             order of number of nodes
…………. (4)                                                  2: Output: Integrated graph IG
         The function size(SG) is defined as number         3: Integrated Graph IG = CG0 {//base graph}
of nodes in graph. Using both edge weights                 4: Set id of each node in IG as described in IG
representing contextual similarity and node weights        Construction
representing query relevance for selecting a node          5: i = 1
connected to root node, has never been tried before.       6: while i <= number of CG’s do
The summary graph construction is a novel approach         7: for each node n ∈ CGi considered in the document
which effectively achieves informativeness in a            order do
summary.                                                   8: if parent (n) precedes n in the ith document then
                                                           {//parent is the maximum      weighted adjacent node
Summarization Methodology                                  in CG}
       Based on the scoring model presented in the         9: Let p = node representing parent (n) in IG
above section, we design efficient algorithms to


                                                                                                 2275 | P a g e
J.Ramachandra, J.Ravi kumar,Y.V.Sricharan,Y.Venkata Sreekar / International Journal of
      Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com
                       Vol. 2, Issue4, July-August 2012, pp.2272-2277
10: if there is no neighbour x of p such that sim (n, x)    completeness is ensured as CTrees of all query terms
>  then                                                    are merged to form an SGraph and also as we are
11: for all y in IG, if sim (n, y) > μ then add an edge     merging CTrees rooted at a node, we will have
(n, y)                                                      interconnected set of sentences in the summary and
12: Set id of n as described in IG Construction             hence coherence is preserved. The SGraphs thus
13: end if                                                  formed are ranked based on the score computed as
14: else if there is no node x in IG such that sim (n, x)   given in Equation 4. Sentences from the highly
>  then                                                    ranked SGraph are returned as summary.
15: for all y in IG, if sim (n, y) > μ then add an
edge(n, y)                                                  Experimental Results
16: Set id of n as described in IG Construction                       In the experiments, QTS was compared with
17: end if                                                  two query specific systems - baseline and MEAD.
18: end for                                                 Our baseline system generates summaries by
19: i + +                                                   considering only centrality based node weights as per
20: end while                                               equation 1 using incremental graph, without
          If the input document set is singleton set,       generating CTrees and SGraphs. Nodes which have
then the integrated graph is equivalent to the              high weights are included in summary.          Second
contextual graph of that document. Addition of any          system, MEAD is a publicly available feature-based
new document to the set can be reflected in IG by            multidocument summarization toolkit. It computes a
adding its CG as described above and the edge               score for each sentence from a cluster of related
weights are updated accordingly. The integrated             documents, as a linear combination of several
graph for the set of documents can also be computed         features. For our experiments, we used centroid score,
offline and stored. When a query is posed on a               position and cosine similarity with query as features
document set, its IG can be loaded into memory and          with 1,1,10 as their weights respectively in MEAD
CTrees and SGraphs can be constructed as described          system. Maximal Marginal Relevance(MMR)
above.                                                      reranker provided in the MEAD toolkit
                                                                      Was used for redundancy removal with a
CTree Construction                                          similarity threshold as 0.6. Equal number of
         The neighborhood of a node is explored and         sentences as generated by QTS were extracted from
prominent nodes in it are included in CTree rooted at       the above two systems.
r. This exploration is done in breadth-first fashion.                  QTS used four criteria to evaluate the
                                                            generated summaries. They are non-redundacy,
Only b (beam width) prominent nodes are considered
                                                            responsiveness, coherence ad overall performance.
for further exploration at each level. The prominence
of a node j is determined by taking the weight of the       Summaries generated by three systems QTS, baseline
edge connecting j to its parent i and its node score        nd MEAD were evaluated by
with respect to the query term q into consideration. It               A group of 10 volunteers. They were given a
is computed as (αw(eij) + βwq (j)). These two factors       set of instructions defining the task and criteria and
specify the contribution of the node to the coherence       were asked to rate each summary on a scale of 1(bad)
of the summary and the amount of query related              to 10(good) for each criteria without actually seeing
information. α is the scaling factor defined in              the original documents. An average of these ratings
Equation 3. This scaling brings edge weights into the       for each query was computed and mean of them was
range of node weights and β determines trade-off            calculated. The graph shows that QTS performs better
                                                            when compared to other systems.On the whole QTS
between coherence and importance of query relevant
                                                            performed better than others in terms of user
information. The exploration from selected prominent
nodes (at most b) is continued until a level which has      satisfaction.
a node containing a query term (anchor node) or
maximum depth d is reached. All nodes along the             Conclusion
path from root to anchor node, along with their                       A novel framework for multi-document
siblings are added to the CTree. When query Term is         summarization system that generates a coherent and
not found till depth d then CTree for that query term       intelligible summary. We propose notion of an
remains empty. If a root node has the query term then       integrated graph that represents inherent structure
root and its adjacent “b” nodes are added to CTree          present in a set of related documents by removing
and no further exploration is done.                         redundant sentences. Our system generates query
                                                            term specific contextual trees (CTrees) which are
SGraph Construction                                         merged to form query specific summary graph
         The CTrees of the individual terms are             (SGraph). We have introduced an ordering strategy to
merged to form an SGraph. The SGraph at a node              order sentences in summary using integrated graph
contains all nodes and edges that appear in any one of      structure. This process of computation has indeed
the C Trees rooted at that node. With this,                 improved the quality of the summary. We


                                                                                                2276 | P a g e
J.Ramachandra, J.Ravi kumar,Y.V.Sricharan,Y.Venkata Sreekar / International Journal of
      Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com
                       Vol. 2, Issue4, July-August 2012, pp.2272-2277
experimentally show that our approach is feasible and
is generating user satisfiable summaries.

References
  1.    M. Sravanthi, C. R. Chowdary, and P. S.
        Kumar. QTS: A query              specific text
        summarization system. In Proceedings of the
        21st International FLAIRS Conference,
        pages 219–224, Florida, USA, may 2008.
        AAAI Press
  2.    Radev, D. R.; Jing, H.; and Budzikowska,
        M. 2000.Centroid-based summarization of
        multiple documents: sentence extraction,
        utility-based evaluation, and user studies. In
        NAACL-ANLP 2000 Workshop on
        Automatic              summarization,21–30.
        Morristown, NJ, USA: Association for
        Computational Linguistics.
  3.    Salton, G.; Singhal, A.; Mitra, M.; and
        Buckley, C. 1997.Automatic text structuring
        and summarization. Inf. Process. Manage.
        33(2):193–207
  4.    Erkan, G., and Radev, D. R. 2004.
        LexPageRank: Prestige in multi-document
        text summarization. In EMNLP.
  5.    Fisher, S.; Roark, B.; Yang, J.; and Hersh,
        B.2005. OGI/OHSU Baseline Query
        directed Multidocument Summarization
        System for DUC-2005. Proceedings of the
        Document        Understanding      Conference
        (DUC).John M. Conroy, J. D. S., and
        Stewart, J. G. 2005.CLASSY query-based
        multi-document summarization.Proceedings
        of the Document Understanding Conference
        (DUC).
  7.    http://tartarus.org/~martin/PorterStemmer/




                                                                            2277 | P a g e

Mais conteúdo relacionado

Mais procurados

Immune-Inspired Method for Selecting the Optimal Solution in Semantic Web Ser...
Immune-Inspired Method for Selecting the Optimal Solution in Semantic Web Ser...Immune-Inspired Method for Selecting the Optimal Solution in Semantic Web Ser...
Immune-Inspired Method for Selecting the Optimal Solution in Semantic Web Ser...IJwest
 
IRJET- A Survey Paper on Text Summarization Methods
IRJET- A Survey Paper on Text Summarization MethodsIRJET- A Survey Paper on Text Summarization Methods
IRJET- A Survey Paper on Text Summarization MethodsIRJET Journal
 
Semantic Annotation Framework For Intelligent Information Retrieval Using KIM...
Semantic Annotation Framework For Intelligent Information Retrieval Using KIM...Semantic Annotation Framework For Intelligent Information Retrieval Using KIM...
Semantic Annotation Framework For Intelligent Information Retrieval Using KIM...dannyijwest
 
Different Similarity Measures for Text Classification Using Knn
Different Similarity Measures for Text Classification Using KnnDifferent Similarity Measures for Text Classification Using Knn
Different Similarity Measures for Text Classification Using KnnIOSR Journals
 
Farthest Neighbor Approach for Finding Initial Centroids in K- Means
Farthest Neighbor Approach for Finding Initial Centroids in K- MeansFarthest Neighbor Approach for Finding Initial Centroids in K- Means
Farthest Neighbor Approach for Finding Initial Centroids in K- MeansWaqas Tariq
 
A Novel Text Classification Method Using Comprehensive Feature Weight
A Novel Text Classification Method Using Comprehensive Feature WeightA Novel Text Classification Method Using Comprehensive Feature Weight
A Novel Text Classification Method Using Comprehensive Feature WeightTELKOMNIKA JOURNAL
 
Mining Users Rare Sequential Topic Patterns from Tweets based on Topic Extrac...
Mining Users Rare Sequential Topic Patterns from Tweets based on Topic Extrac...Mining Users Rare Sequential Topic Patterns from Tweets based on Topic Extrac...
Mining Users Rare Sequential Topic Patterns from Tweets based on Topic Extrac...IRJET Journal
 
Improve information retrieval and e learning using
Improve information retrieval and e learning usingImprove information retrieval and e learning using
Improve information retrieval and e learning usingIJwest
 
IRJET-Semantic Based Document Clustering Using Lexical Chains
IRJET-Semantic Based Document Clustering Using Lexical ChainsIRJET-Semantic Based Document Clustering Using Lexical Chains
IRJET-Semantic Based Document Clustering Using Lexical ChainsIRJET Journal
 
A category theoretic model of rdf ontology
A category theoretic model of rdf ontologyA category theoretic model of rdf ontology
A category theoretic model of rdf ontologyIJwest
 
Data Mining in Multi-Instance and Multi-Represented Objects
Data Mining in Multi-Instance and Multi-Represented ObjectsData Mining in Multi-Instance and Multi-Represented Objects
Data Mining in Multi-Instance and Multi-Represented Objectsijsrd.com
 
Content Sharing over Smartphone-Based Delay-Tolerant Networks
Content Sharing over Smartphone-Based Delay-Tolerant NetworksContent Sharing over Smartphone-Based Delay-Tolerant Networks
Content Sharing over Smartphone-Based Delay-Tolerant NetworksIJERA Editor
 
Automatically converting tabular data to
Automatically converting tabular data toAutomatically converting tabular data to
Automatically converting tabular data toIJwest
 
Paper id 25201435
Paper id 25201435Paper id 25201435
Paper id 25201435IJRAT
 
A Neighbourhood-Based Trust Protocol for Secure Collaborative Routing in Wire...
A Neighbourhood-Based Trust Protocol for Secure Collaborative Routing in Wire...A Neighbourhood-Based Trust Protocol for Secure Collaborative Routing in Wire...
A Neighbourhood-Based Trust Protocol for Secure Collaborative Routing in Wire...IJCSIS Research Publications
 
A Competent and Empirical Model of Distributed Clustering
A Competent and Empirical Model of Distributed ClusteringA Competent and Empirical Model of Distributed Clustering
A Competent and Empirical Model of Distributed ClusteringIRJET Journal
 
Dissertation defense slides on "Semantic Analysis for Improved Multi-document...
Dissertation defense slides on "Semantic Analysis for Improved Multi-document...Dissertation defense slides on "Semantic Analysis for Improved Multi-document...
Dissertation defense slides on "Semantic Analysis for Improved Multi-document...Quinsulon Israel
 
Query Answering Approach Based on Document Summarization
Query Answering Approach Based on Document SummarizationQuery Answering Approach Based on Document Summarization
Query Answering Approach Based on Document SummarizationIJMER
 

Mais procurados (20)

Immune-Inspired Method for Selecting the Optimal Solution in Semantic Web Ser...
Immune-Inspired Method for Selecting the Optimal Solution in Semantic Web Ser...Immune-Inspired Method for Selecting the Optimal Solution in Semantic Web Ser...
Immune-Inspired Method for Selecting the Optimal Solution in Semantic Web Ser...
 
IRJET- A Survey Paper on Text Summarization Methods
IRJET- A Survey Paper on Text Summarization MethodsIRJET- A Survey Paper on Text Summarization Methods
IRJET- A Survey Paper on Text Summarization Methods
 
Semantic Annotation Framework For Intelligent Information Retrieval Using KIM...
Semantic Annotation Framework For Intelligent Information Retrieval Using KIM...Semantic Annotation Framework For Intelligent Information Retrieval Using KIM...
Semantic Annotation Framework For Intelligent Information Retrieval Using KIM...
 
Different Similarity Measures for Text Classification Using Knn
Different Similarity Measures for Text Classification Using KnnDifferent Similarity Measures for Text Classification Using Knn
Different Similarity Measures for Text Classification Using Knn
 
Farthest Neighbor Approach for Finding Initial Centroids in K- Means
Farthest Neighbor Approach for Finding Initial Centroids in K- MeansFarthest Neighbor Approach for Finding Initial Centroids in K- Means
Farthest Neighbor Approach for Finding Initial Centroids in K- Means
 
Hc3612711275
Hc3612711275Hc3612711275
Hc3612711275
 
A Novel Text Classification Method Using Comprehensive Feature Weight
A Novel Text Classification Method Using Comprehensive Feature WeightA Novel Text Classification Method Using Comprehensive Feature Weight
A Novel Text Classification Method Using Comprehensive Feature Weight
 
Mining Users Rare Sequential Topic Patterns from Tweets based on Topic Extrac...
Mining Users Rare Sequential Topic Patterns from Tweets based on Topic Extrac...Mining Users Rare Sequential Topic Patterns from Tweets based on Topic Extrac...
Mining Users Rare Sequential Topic Patterns from Tweets based on Topic Extrac...
 
Improve information retrieval and e learning using
Improve information retrieval and e learning usingImprove information retrieval and e learning using
Improve information retrieval and e learning using
 
IRJET-Semantic Based Document Clustering Using Lexical Chains
IRJET-Semantic Based Document Clustering Using Lexical ChainsIRJET-Semantic Based Document Clustering Using Lexical Chains
IRJET-Semantic Based Document Clustering Using Lexical Chains
 
A category theoretic model of rdf ontology
A category theoretic model of rdf ontologyA category theoretic model of rdf ontology
A category theoretic model of rdf ontology
 
Data Mining in Multi-Instance and Multi-Represented Objects
Data Mining in Multi-Instance and Multi-Represented ObjectsData Mining in Multi-Instance and Multi-Represented Objects
Data Mining in Multi-Instance and Multi-Represented Objects
 
Content Sharing over Smartphone-Based Delay-Tolerant Networks
Content Sharing over Smartphone-Based Delay-Tolerant NetworksContent Sharing over Smartphone-Based Delay-Tolerant Networks
Content Sharing over Smartphone-Based Delay-Tolerant Networks
 
Using lexical chains for text summarization
Using lexical chains for text summarizationUsing lexical chains for text summarization
Using lexical chains for text summarization
 
Automatically converting tabular data to
Automatically converting tabular data toAutomatically converting tabular data to
Automatically converting tabular data to
 
Paper id 25201435
Paper id 25201435Paper id 25201435
Paper id 25201435
 
A Neighbourhood-Based Trust Protocol for Secure Collaborative Routing in Wire...
A Neighbourhood-Based Trust Protocol for Secure Collaborative Routing in Wire...A Neighbourhood-Based Trust Protocol for Secure Collaborative Routing in Wire...
A Neighbourhood-Based Trust Protocol for Secure Collaborative Routing in Wire...
 
A Competent and Empirical Model of Distributed Clustering
A Competent and Empirical Model of Distributed ClusteringA Competent and Empirical Model of Distributed Clustering
A Competent and Empirical Model of Distributed Clustering
 
Dissertation defense slides on "Semantic Analysis for Improved Multi-document...
Dissertation defense slides on "Semantic Analysis for Improved Multi-document...Dissertation defense slides on "Semantic Analysis for Improved Multi-document...
Dissertation defense slides on "Semantic Analysis for Improved Multi-document...
 
Query Answering Approach Based on Document Summarization
Query Answering Approach Based on Document SummarizationQuery Answering Approach Based on Document Summarization
Query Answering Approach Based on Document Summarization
 

Destaque

tutorial de sociales -revolución francesa
tutorial de sociales -revolución francesatutorial de sociales -revolución francesa
tutorial de sociales -revolución francesaaandicult
 
Historia de la Creacion
Historia de la CreacionHistoria de la Creacion
Historia de la CreacionDrainsNoFuel
 
Diseño grafico
Diseño graficoDiseño grafico
Diseño graficoney
 
Presentación practica
Presentación practicaPresentación practica
Presentación practicaALEJA1792
 
Fernández y Ananías (2006). Incentivos a largo plazo para la retención de ta...
Fernández y Ananías (2006). Incentivos a largo plazo para la retención de ta...Fernández y Ananías (2006). Incentivos a largo plazo para la retención de ta...
Fernández y Ananías (2006). Incentivos a largo plazo para la retención de ta...Ignacio Fernández
 
Conhecimento dos usuário de uma ubs sobre o papel e a importância do nutricio...
Conhecimento dos usuário de uma ubs sobre o papel e a importância do nutricio...Conhecimento dos usuário de uma ubs sobre o papel e a importância do nutricio...
Conhecimento dos usuário de uma ubs sobre o papel e a importância do nutricio...Mariane Küter
 
Anuncio renault
Anuncio renaultAnuncio renault
Anuncio renaultrpelayo
 
E-Book TI Como Enabler da Competitividade E-Consulting Corp. 2010
 E-Book TI Como Enabler da Competitividade E-Consulting Corp.  2010 E-Book TI Como Enabler da Competitividade E-Consulting Corp.  2010
E-Book TI Como Enabler da Competitividade E-Consulting Corp. 2010E-Consulting Corp.
 
Programas das materias coperve
Programas das materias coperveProgramas das materias coperve
Programas das materias coperveGivaldo de Lima
 
Cuadro 1 Generalidades(1)
Cuadro 1 Generalidades(1)Cuadro 1 Generalidades(1)
Cuadro 1 Generalidades(1)Alan López
 

Destaque (20)

Og2423252330
Og2423252330Og2423252330
Og2423252330
 
Ny2422782281
Ny2422782281Ny2422782281
Ny2422782281
 
La Ciega
La CiegaLa Ciega
La Ciega
 
tutorial de sociales -revolución francesa
tutorial de sociales -revolución francesatutorial de sociales -revolución francesa
tutorial de sociales -revolución francesa
 
Historia de la Creacion
Historia de la CreacionHistoria de la Creacion
Historia de la Creacion
 
Diseño grafico
Diseño graficoDiseño grafico
Diseño grafico
 
Presentación practica
Presentación practicaPresentación practica
Presentación practica
 
Fernández y Ananías (2006). Incentivos a largo plazo para la retención de ta...
Fernández y Ananías (2006). Incentivos a largo plazo para la retención de ta...Fernández y Ananías (2006). Incentivos a largo plazo para la retención de ta...
Fernández y Ananías (2006). Incentivos a largo plazo para la retención de ta...
 
Conhecimento dos usuário de uma ubs sobre o papel e a importância do nutricio...
Conhecimento dos usuário de uma ubs sobre o papel e a importância do nutricio...Conhecimento dos usuário de uma ubs sobre o papel e a importância do nutricio...
Conhecimento dos usuário de uma ubs sobre o papel e a importância do nutricio...
 
Anuncio renault
Anuncio renaultAnuncio renault
Anuncio renault
 
E-Book TI Como Enabler da Competitividade E-Consulting Corp. 2010
 E-Book TI Como Enabler da Competitividade E-Consulting Corp.  2010 E-Book TI Como Enabler da Competitividade E-Consulting Corp.  2010
E-Book TI Como Enabler da Competitividade E-Consulting Corp. 2010
 
5314
53145314
5314
 
5838
58385838
5838
 
6525
65256525
6525
 
El Proyecto De Investigacion
El Proyecto De InvestigacionEl Proyecto De Investigacion
El Proyecto De Investigacion
 
Conjuntos2
Conjuntos2Conjuntos2
Conjuntos2
 
Modi de tuo
Modi de tuoModi de tuo
Modi de tuo
 
Programas das materias coperve
Programas das materias coperveProgramas das materias coperve
Programas das materias coperve
 
Cuadro 1 Generalidades(1)
Cuadro 1 Generalidades(1)Cuadro 1 Generalidades(1)
Cuadro 1 Generalidades(1)
 
6519
65196519
6519
 

Semelhante a Nx2422722277

Ijarcet vol-2-issue-4-1357-1362
Ijarcet vol-2-issue-4-1357-1362Ijarcet vol-2-issue-4-1357-1362
Ijarcet vol-2-issue-4-1357-1362Editor IJARCET
 
8 efficient multi-document summary generation using neural network
8 efficient multi-document summary generation using neural network8 efficient multi-document summary generation using neural network
8 efficient multi-document summary generation using neural networkINFOGAIN PUBLICATION
 
An in-depth review on News Classification through NLP
An in-depth review on News Classification through NLPAn in-depth review on News Classification through NLP
An in-depth review on News Classification through NLPIRJET Journal
 
EXPLOITING RHETORICAL RELATIONS TO MULTIPLE DOCUMENTS TEXT SUMMARIZATION
EXPLOITING RHETORICAL RELATIONS TO MULTIPLE DOCUMENTS TEXT SUMMARIZATIONEXPLOITING RHETORICAL RELATIONS TO MULTIPLE DOCUMENTS TEXT SUMMARIZATION
EXPLOITING RHETORICAL RELATIONS TO MULTIPLE DOCUMENTS TEXT SUMMARIZATIONIJNSA Journal
 
Text classification supervised algorithms with term frequency inverse documen...
Text classification supervised algorithms with term frequency inverse documen...Text classification supervised algorithms with term frequency inverse documen...
Text classification supervised algorithms with term frequency inverse documen...IJECEIAES
 
Semantic Based Document Clustering Using Lexical Chains
Semantic Based Document Clustering Using Lexical ChainsSemantic Based Document Clustering Using Lexical Chains
Semantic Based Document Clustering Using Lexical ChainsIRJET Journal
 
SEMANTIC NETWORK BASED MECHANISMS FOR KNOWLEDGE ACQUISITION
SEMANTIC NETWORK BASED MECHANISMS FOR KNOWLEDGE ACQUISITIONSEMANTIC NETWORK BASED MECHANISMS FOR KNOWLEDGE ACQUISITION
SEMANTIC NETWORK BASED MECHANISMS FOR KNOWLEDGE ACQUISITIONcscpconf
 
WEB SERVICES COMPOSITION METHODS AND TECHNIQUES: A REVIEW
WEB SERVICES COMPOSITION METHODS AND TECHNIQUES: A REVIEWWEB SERVICES COMPOSITION METHODS AND TECHNIQUES: A REVIEW
WEB SERVICES COMPOSITION METHODS AND TECHNIQUES: A REVIEWijcseit
 
IRJET- Review on Information Retrieval for Desktop Search Engine
IRJET-  	  Review on Information Retrieval for Desktop Search EngineIRJET-  	  Review on Information Retrieval for Desktop Search Engine
IRJET- Review on Information Retrieval for Desktop Search EngineIRJET Journal
 
WEB SERVICES COMPOSITION METHODS AND TECHNIQUES: A REVIEW
WEB SERVICES COMPOSITION METHODS AND TECHNIQUES: A REVIEWWEB SERVICES COMPOSITION METHODS AND TECHNIQUES: A REVIEW
WEB SERVICES COMPOSITION METHODS AND TECHNIQUES: A REVIEWijcseit
 
International Journal of Computer Science, Engineering and Information Techno...
International Journal of Computer Science, Engineering and Information Techno...International Journal of Computer Science, Engineering and Information Techno...
International Journal of Computer Science, Engineering and Information Techno...ijcseit
 
Automatic Text Summarization using Natural Language Processing
Automatic Text Summarization using Natural Language ProcessingAutomatic Text Summarization using Natural Language Processing
Automatic Text Summarization using Natural Language ProcessingIRJET Journal
 
Semantic web service discovery approaches
Semantic web service discovery approachesSemantic web service discovery approaches
Semantic web service discovery approachesIJCSES Journal
 
MULTI-DOCUMENT SUMMARIZATION SYSTEM: USING FUZZY LOGIC AND GENETIC ALGORITHM
MULTI-DOCUMENT SUMMARIZATION SYSTEM: USING FUZZY LOGIC AND GENETIC ALGORITHM MULTI-DOCUMENT SUMMARIZATION SYSTEM: USING FUZZY LOGIC AND GENETIC ALGORITHM
MULTI-DOCUMENT SUMMARIZATION SYSTEM: USING FUZZY LOGIC AND GENETIC ALGORITHM IAEME Publication
 
Knowledge Graph and Similarity Based Retrieval Method for Query Answering System
Knowledge Graph and Similarity Based Retrieval Method for Query Answering SystemKnowledge Graph and Similarity Based Retrieval Method for Query Answering System
Knowledge Graph and Similarity Based Retrieval Method for Query Answering SystemIRJET Journal
 
A Newly Proposed Technique for Summarizing the Abstractive Newspapers’ Articl...
A Newly Proposed Technique for Summarizing the Abstractive Newspapers’ Articl...A Newly Proposed Technique for Summarizing the Abstractive Newspapers’ Articl...
A Newly Proposed Technique for Summarizing the Abstractive Newspapers’ Articl...mlaij
 
G04124041046
G04124041046G04124041046
G04124041046IOSR-JEN
 
IRJET- Multi-Document Summarization using Fuzzy and Hierarchical Approach
IRJET-  	  Multi-Document Summarization using Fuzzy and Hierarchical ApproachIRJET-  	  Multi-Document Summarization using Fuzzy and Hierarchical Approach
IRJET- Multi-Document Summarization using Fuzzy and Hierarchical ApproachIRJET Journal
 
The Revolution Of Cloud Computing
The Revolution Of Cloud ComputingThe Revolution Of Cloud Computing
The Revolution Of Cloud ComputingCarmen Sanborn
 
Query optimization to improve performance of the code execution
Query optimization to improve performance of the code executionQuery optimization to improve performance of the code execution
Query optimization to improve performance of the code executionAlexander Decker
 

Semelhante a Nx2422722277 (20)

Ijarcet vol-2-issue-4-1357-1362
Ijarcet vol-2-issue-4-1357-1362Ijarcet vol-2-issue-4-1357-1362
Ijarcet vol-2-issue-4-1357-1362
 
8 efficient multi-document summary generation using neural network
8 efficient multi-document summary generation using neural network8 efficient multi-document summary generation using neural network
8 efficient multi-document summary generation using neural network
 
An in-depth review on News Classification through NLP
An in-depth review on News Classification through NLPAn in-depth review on News Classification through NLP
An in-depth review on News Classification through NLP
 
EXPLOITING RHETORICAL RELATIONS TO MULTIPLE DOCUMENTS TEXT SUMMARIZATION
EXPLOITING RHETORICAL RELATIONS TO MULTIPLE DOCUMENTS TEXT SUMMARIZATIONEXPLOITING RHETORICAL RELATIONS TO MULTIPLE DOCUMENTS TEXT SUMMARIZATION
EXPLOITING RHETORICAL RELATIONS TO MULTIPLE DOCUMENTS TEXT SUMMARIZATION
 
Text classification supervised algorithms with term frequency inverse documen...
Text classification supervised algorithms with term frequency inverse documen...Text classification supervised algorithms with term frequency inverse documen...
Text classification supervised algorithms with term frequency inverse documen...
 
Semantic Based Document Clustering Using Lexical Chains
Semantic Based Document Clustering Using Lexical ChainsSemantic Based Document Clustering Using Lexical Chains
Semantic Based Document Clustering Using Lexical Chains
 
SEMANTIC NETWORK BASED MECHANISMS FOR KNOWLEDGE ACQUISITION
SEMANTIC NETWORK BASED MECHANISMS FOR KNOWLEDGE ACQUISITIONSEMANTIC NETWORK BASED MECHANISMS FOR KNOWLEDGE ACQUISITION
SEMANTIC NETWORK BASED MECHANISMS FOR KNOWLEDGE ACQUISITION
 
WEB SERVICES COMPOSITION METHODS AND TECHNIQUES: A REVIEW
WEB SERVICES COMPOSITION METHODS AND TECHNIQUES: A REVIEWWEB SERVICES COMPOSITION METHODS AND TECHNIQUES: A REVIEW
WEB SERVICES COMPOSITION METHODS AND TECHNIQUES: A REVIEW
 
IRJET- Review on Information Retrieval for Desktop Search Engine
IRJET-  	  Review on Information Retrieval for Desktop Search EngineIRJET-  	  Review on Information Retrieval for Desktop Search Engine
IRJET- Review on Information Retrieval for Desktop Search Engine
 
WEB SERVICES COMPOSITION METHODS AND TECHNIQUES: A REVIEW
WEB SERVICES COMPOSITION METHODS AND TECHNIQUES: A REVIEWWEB SERVICES COMPOSITION METHODS AND TECHNIQUES: A REVIEW
WEB SERVICES COMPOSITION METHODS AND TECHNIQUES: A REVIEW
 
International Journal of Computer Science, Engineering and Information Techno...
International Journal of Computer Science, Engineering and Information Techno...International Journal of Computer Science, Engineering and Information Techno...
International Journal of Computer Science, Engineering and Information Techno...
 
Automatic Text Summarization using Natural Language Processing
Automatic Text Summarization using Natural Language ProcessingAutomatic Text Summarization using Natural Language Processing
Automatic Text Summarization using Natural Language Processing
 
Semantic web service discovery approaches
Semantic web service discovery approachesSemantic web service discovery approaches
Semantic web service discovery approaches
 
MULTI-DOCUMENT SUMMARIZATION SYSTEM: USING FUZZY LOGIC AND GENETIC ALGORITHM
MULTI-DOCUMENT SUMMARIZATION SYSTEM: USING FUZZY LOGIC AND GENETIC ALGORITHM MULTI-DOCUMENT SUMMARIZATION SYSTEM: USING FUZZY LOGIC AND GENETIC ALGORITHM
MULTI-DOCUMENT SUMMARIZATION SYSTEM: USING FUZZY LOGIC AND GENETIC ALGORITHM
 
Knowledge Graph and Similarity Based Retrieval Method for Query Answering System
Knowledge Graph and Similarity Based Retrieval Method for Query Answering SystemKnowledge Graph and Similarity Based Retrieval Method for Query Answering System
Knowledge Graph and Similarity Based Retrieval Method for Query Answering System
 
A Newly Proposed Technique for Summarizing the Abstractive Newspapers’ Articl...
A Newly Proposed Technique for Summarizing the Abstractive Newspapers’ Articl...A Newly Proposed Technique for Summarizing the Abstractive Newspapers’ Articl...
A Newly Proposed Technique for Summarizing the Abstractive Newspapers’ Articl...
 
G04124041046
G04124041046G04124041046
G04124041046
 
IRJET- Multi-Document Summarization using Fuzzy and Hierarchical Approach
IRJET-  	  Multi-Document Summarization using Fuzzy and Hierarchical ApproachIRJET-  	  Multi-Document Summarization using Fuzzy and Hierarchical Approach
IRJET- Multi-Document Summarization using Fuzzy and Hierarchical Approach
 
The Revolution Of Cloud Computing
The Revolution Of Cloud ComputingThe Revolution Of Cloud Computing
The Revolution Of Cloud Computing
 
Query optimization to improve performance of the code execution
Query optimization to improve performance of the code executionQuery optimization to improve performance of the code execution
Query optimization to improve performance of the code execution
 

Nx2422722277

  • 1. J.Ramachandra, J.Ravi kumar,Y.V.Sricharan,Y.Venkata Sreekar / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 2, Issue4, July-August 2012, pp.2272-2277 Uncertainty Based Text Summarization J.Ramachandra1 J.Ravi kumar2 Y.V.Sricharan3 Y.Venkata Sreekar4 1 2nd Year M.Tech, CREC, Tirupati 2 Professor of Department of CSE, CREC, Tirupati Abstract Effective extraction of query relevant Moreover, information pertaining to a query might be information present within the webpages is a distributed across several sources. So, it is a difficult nontrivial task. QTS, task by filtering and thing for a user to sift through all these documents aggregating important query relevant sentences and find the information that it needs. Neither web distributed across the webpages. This system directories nor search engines are helpful in this captures the contextual relationships among matter. Hence it would be very useful to have a sentences of these webpages and represents them system which could filter and aggregate information as an “integrated graph”. These relationships are relevant to user‟s query from various sources and exploited and several subgraphs of integrated present it as a digest or a summary. This summary graph which consist of sentences that are highly would help in getting an overall understanding of the relevant to the query and that are highly related to query. each other are constructed. These subgraphs are Automatic Query Specific Multi-document ranked by the scoring model. The subgraph with Summarization is the process of filtering important highest rank which is rich in query relevant relevant information from the input set of documents information is returned as a query specific and presenting the concise version of that documents summary. QTS is domain independent and to the user. QTS fulfills this objective by generating a doesn’t use any linguistic processing, making it a summary that is specific to the given query on a set of flexible and general approach that can be applied documents. Here we focus on building a coherent and to unrestricted set of articles found on WWW. highly responsive multi-document summary which is Experimental results prove the strength of complete. This process poses significant challenges QTS.Very little work is reported on query specific like maintaining intelligibility, coherence and non- multiple document summarization.The quality of redundancy. summaries generated by QTS is better than Intelligibility/responsiveness it is the MEAD(one of the popular summarizers). property that determines if the summary satisfies user‟s needs or not while Coherence determines the Introduction readability and information flow. As user‟s query in with the increased usage of computers huge the context of a search engine is small (typically 2 or amounts of data is being added to web constantly. 3 words), we need to appropriately select important This data is stored either on personal computers or relevant sentences. Some sentences may not contain data servers accessible through Internet. The query terms but can still be relevant. Not selecting increased disk space and popularity of Internet has these sentences will affect the intelligibility of resulted in vast repositories of data available at summary. Moreover, selecting sentences while fingertips of a computer user resulting in summarizing long documents without considering the “Information Overload” problem. The World Wide context in which it appears will result in an Web has become the largest source of information incoherent text. Specific to multi-document with heterogeneous collections of data. A user in summarization, there are problems of redundancy in need of some information is lost in the overwhelming input text and the ordering of selected sentences in amounts of data. the summary. Search engines try to organize web Summarization process proceeds in three dynamically by identifying, retrieving and presenting stages - source representation, sentence filtering and web pages relevant to users search query. Clustering ordering. In QTS system, the focus is on generating search engines like cluster go a step further to summaries by a novel way of representing the input organize the retrieved search results list into documents, identifying important sentences which are categories and allow user to do a focused search. not redundant and presenting them as summary. The A web user seeking information on a broad intention is to represent the contextual dependencies topic will either visit a search engine and type in a present between sentences as a graph by connecting a query or browse through the web directory. In both sentence to another sentence if they are contextually these cases, the number of web pages that are similar and highlight how these relationships can be retrieved to satisfy user‟s need is very high. exploited to score and select sentences. 2272 | P a g e
  • 2. J.Ramachandra, J.Ravi kumar,Y.V.Sricharan,Y.Venkata Sreekar / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 2, Issue4, July-August 2012, pp.2272-2277 IG is built to reflect inter and intra document expansion or indexing and other natural language contextual similarities present between sentences of processing problems. the input documents. It is highly probable that the In this project we used Porter Stemmer Algorithm. neighborhood of a node in this graph contains The Porter stemming [7] algorithm (or „Porter sentences that have similar content. So, from each stemmer’) is a process for removing the commoner node in this graph by exploiting its neighborhood, a morphological endings from words in English. Its summary graph called SGraph is built which will main use is as part of a term normalization process contain query relevant information. that is usually done when setting up Information This approach is domain independent and Retrieval systems. doesn‟t use any linguistic processing, making it a flexible and general approach that can be applied to THE PORTER STEMMER ALGORITHM unrestricted set of articles. We have implemented A consonant in a word is a letter other than QTS and it works in two phases. First phase deals A, E, I, O or U and other than Y preceded by a with creation of IG and necessary indexes that are consonant. (The fact that the term `consonant' is query independent and can be done offline. defined to some extent in terms of itself does not Second phase deals with generation of make it ambiguous.) So in TOY the consonants are T summary relevant to the user‟s query from the IG. and Y and in SYZYGY they are S, Z and G. If a letter First phase is a time consuming process especially for is not a consonant it is a vowel. sets of documents with large number of sentences. A consonant will be denoted by c and a So, this system can be used on a desktop machine in a vowel by v. A group of consonants of length greater digital library or in an intranet kind of an than zero will be denoted by C, and a group of vowels environment where documents can be clustered and of length greater than zero will be denoted by V. Any graphs can be created .We present experimental word, or part of a word, therefore has one of the four results that prove the strength of QTS. forms: CVCV ... C, CVCV ... V, VCVC ... C, VCVC Related work ... V. These may all be represented by the single form Several clustering based approaches were [C] VCVC ... [V]. Here the square brackets denote tried where similar sentences are clustered and a arbitrary presence of their contents. Using (VC) {m} representative sentence of each cluster is chosen as a to denote VC which is repeated m times, this may summary. MEAD [2] is a centroid based multi- again be written as [C](VC) document generic summarizer. It follows cluster {m}[V]. “m” will be called the measure of any word based approach that uses features like cluster or word part when represented in this form. The case centroids, position etc., to summarize documents. m = 0 covers the null word as follows: Recently, graph based models are being used m=0 TR, EE, TREE, Y, BY. to represent text. They use measures like degree m=1 TROUBLE, OATS, TREES, IVY. centrality [3] and eigenvector centrality [4] to rank m=2 TROUBLES, PRIVATE, OATEN, sentences. Inspired by PageRank [5], these methods ORRERY. build a network of sentences and then determine the The rules for removing a suffix will be given importance of each sentence based on its connectivity in the form (Condition) S1 -> S2.This means that if a with other sentences. Highly ranked sentences form a word ends with the suffix S and the stem before S1 summary. satisfies the given condition, S1 is replaced by S2. QTS proposes a novel algorithm that does The condition is usually given in terms of m, e.g., (m statistical processing to exploit the dependencies > 1) EMENT -> .Here S1 is “EMENT” and S2 is between sentences and generate a summary by null. This would map REPLACEMENT to REPLAC, balancing query responsiveness in it. since REPLAC is a word part for which m = 2. The “condition” part may also contain the following: System models *S - the stem ends with S (and similarly for URL Extraction And Sentence Extraction the other letters). Sentence extraction: A html file is given as *v* - the stem contains a vowel. input and paragraphs are extracted from this html file. *d - the stem ends with a double consonant These paragraphs are divided into sentences using (e.g., -TT, -SS). delimiters like ".", "!", "?" followed by a gap. *o - the stem ends with cvc, where the second Stemming c is not W, X or Y Stemming is the process for reducing (e.g., – WIL,-HOP). inflected (or sometimes derived) words to their stem, Summarization Framework base or root form – generally a written word form. Contextual Graph The stem need not be identical to the morphological All the non-textual elements like images, root of the word; it is usually sufficient that related tables etc are removed from the document .We use words map to the same stem, even if this stem is not “.” as a delimiter to segment the document into in itself a valid root. The process of stemming, often sentences and this document is represented as a called conflation, is useful in search engines for query 2273 | P a g e
  • 3. J.Ramachandra, J.Ravi kumar,Y.V.Sricharan,Y.Venkata Sreekar / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 2, Issue4, July-August 2012, pp.2272-2277 contextual graph with sentences as nodes and with respect to a query term in two aspects: the similarity between them as edges. relevance of sentence to the query term and the kind of neighbors it is connected to. Initially each node is Definition 1 Contextual Graph(CG): A contextual initialized to query similarity weight and then these graph for a document d is defined as a weighted weights are spread to their neighbors via the weighted graph, CG(d) = (V(d),E(d)) where V (d) is a finite set graph IG. This process is repeated until the weights of nodes where each node is a sentence in the come to a steady state. Following this idea, the node document d and E(d) is a finite set of edges where an weights for each node with respect to each query term edge eij ∈ E(d) is incident on nodes i, j ∈ V (d) and is qi ∈ Q where Q = {q1,q2,...,qt} are computed using assigned a weight reflecting the degree of similarity the following equation 1. between nodes i and j. An edge exists only if its weight exceeds a threshold μ. Edges connecting adjacent sentences in a document are retained, irrespective of the threshold. An edge weight w(eij) represents contextual similarity between sentences si and sj . It is computed using cosine similarity measure. The weight of each term t is calculated using tft * isft where tft is the term ……………………(1) frequency and isft is inverse sentential frequency i.e., where wq (s) is node weight of node s with respect to log( n/(nt+1)), where n is the total number of query term q, d is bias factor, N is number of nodes sentences in the document and nt is number of and sim(si,sj) is cosine similarity between sentences si sentences containing the term t in the graph under and sj. First part of equation computes relevancy of consideration. Stop words are removed and remaining nodes to the Query and second part considers words are stemmed before computing these weights. neighbors‟ node weights. The bias factor d gives The edges that are having very low similarity value trade-off between the set parts and is determined i.e. with an edge weight below a threshold μ(=0.001) empirically. For higher values of d, more importance are discarded. These edges and their weights reflect is given to the similarity of node to the query when the degree of coherence in the summary. compared to the similarity between neighbors. The denominators in both terms are for normalization. Integrated Graph Generation When a query Q is given to the system, each word is The input is the set of documents that are assigned weight based on tf∗isf metric and node related to the particular topic that we have to search weights for each node with respect to each query term in multi-document summarization. Let χ=(D,T) be a are calculated. Intuitively a node will have a high set of documents D on a topic T. For each document score value if: in this set χ are combined incrementally which forms 1) It has information relevant to the query and a graph called as integrated graph. As these 2) It is connected to similar context nodes which documents are related to a topic T, they will be share query relevant information. similar and may contain some redundant sentences If a node doesn‟t have any query term but is i.e., repeating sentences. These redundant sentences linked to nodes having it, then the neighbor weights are identified and only one of them is placed in the are propagated in proportion to the edge weight such integrated graph. Then the similarities between that it gets a weight greater than zero. Thus high node documents are identified by establishing edges across weight indicates a highly relevant node present in a nodes of different documents and the edge weights of highly relevant context and is used to indicate the IG are calculated. Thus the integrated graph reflects richness of query specific information in the node. inter as well as intra-document similarity relationships present in document set χ. Algorithm CTree AND SGraph for integrated graph construction is given in later For each query word, the neighborhood of sections. each node in IG is explored and a tree rooted at each Whenever a query related to the particular node is constructed from the explored graph. These topic T is given, relevancy scores of each node are trees are called as contextual trees (CTrees). computed with respect to each query term. During this process, sentences that are not related to query Definition 2 directly (by having terms), but are relevant are to be Contextual Tree (CTree) : A CTreei =(Ni,Ei,r,qi ) is considered which can be called as supporting defined as a quadruple where Ni and Ei are set of sentences. To handle this, centrality based query nodes and edges respectively. qi is ith term in the biased sentence weights are computed that not only query. It is rooted at r with at least one of the nodes consider the local information i.e., whether the node having the query term qi. Number of children for each contains the query terms, but also global information node is at most b (beam width). It has at most (1+ bd) like the similarity with its adjacent nodes. A mixture nodes where d is the maximum depth. CTree is model is also used to define the importance of a node 2274 | P a g e
  • 4. J.Ramachandra, J.Ravi kumar,Y.V.Sricharan,Y.Venkata Sreekar / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 2, Issue4, July-August 2012, pp.2272-2277 empty if there is no node with query term qi with in automatically generate query biased summaries from depth d. text. CTrees corresponding to each query term that are rooted at a particular node, are merged to Integrated Graph Construction form a summary graph (SGraph) which is defined as Integrated graph represents the relationships follows: present among sentences of the input set. We assume Definition 3 that the longest document contains more number of Summary Graph (SGraph): For each node r in IG, sub topics than any other document and its CG is if there are t query terms, we construct a summary chosen as a base graph and is added to IG which is graph SGraph = (N‟, E‟, Q‟) where N‟ and E‟ are empty initially. The documents in the input set are union ordered in decreasing order of their size and CG‟s of of the set of nodes and edges of CTreei rooted at r each document in the ordered list is added respectively and Q = {q1,q2,...,qt}. incrementally to IG . There are two important issues that need to be addressed in multi-document Scoring Model: CTrees formed from each node in summarization. IG are assigned a score that reflects the degree of Redundant sentences are identified as those coherence and information richness in the tree. sentences which have similarity that exceeds Definition 4 threshold λ. This similarity is computed using cosine similarity and λ =0.7 is sufficient in most of the cases. CTreeScore: Given an integrated graph IG and a During the construction of IG, if the sentence in query term q, score of the CTree q rooted at node r is consideration is found to be highly similar to any calculated as sentence of a document other than document being considered in IG, then it is discarded. Otherwise it is added as a new node and is connected to existing nodes with which its similarity is above the threshold µ. ....... Sentence ordering in summary affects (2) readability. For this purpose, an encoding strategy is followed where an “id” is assigned to each node in IG such that there is information flow in summary when …………… (3) nodes are put in increasing order of their ids Here a is average of top three node weights among the neighbors of u excluding parent of u and b Encoding Strategy is maximum edge weight among nodes incident on u. Initially all nodes in the base graph are The SGraphs formed from each node by merging assigned ids as follows. The ith sentence is assigned (i CTrees for all query terms are ranked using following − 1) ∗ η as id. This interval η is used to insert all the equation and the highest ranked graph is retained as nodes from other documents that are closer to i (i.e., summary. the inserted node has maximum edge weight with i Definition 5 among all nodes adjacent to it). The sentences in an SGraphScore: Given an integrated graph IG and a interval are ordered in decreasing order of their edge query Q = {q1,q2,...,qt}, score of the SGraph SG is weights with i. When a new node is added to IG, an calculated as id is assigned to it. Pseudo code for IG construction is given in Algorithm 1. Algorithm 1 Integrated Graph Construction 1: Input: Contextual Graphs CGi in the decreasing ……… order of number of nodes …………. (4) 2: Output: Integrated graph IG The function size(SG) is defined as number 3: Integrated Graph IG = CG0 {//base graph} of nodes in graph. Using both edge weights 4: Set id of each node in IG as described in IG representing contextual similarity and node weights Construction representing query relevance for selecting a node 5: i = 1 connected to root node, has never been tried before. 6: while i <= number of CG’s do The summary graph construction is a novel approach 7: for each node n ∈ CGi considered in the document which effectively achieves informativeness in a order do summary. 8: if parent (n) precedes n in the ith document then {//parent is the maximum weighted adjacent node Summarization Methodology in CG} Based on the scoring model presented in the 9: Let p = node representing parent (n) in IG above section, we design efficient algorithms to 2275 | P a g e
  • 5. J.Ramachandra, J.Ravi kumar,Y.V.Sricharan,Y.Venkata Sreekar / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 2, Issue4, July-August 2012, pp.2272-2277 10: if there is no neighbour x of p such that sim (n, x) completeness is ensured as CTrees of all query terms >  then are merged to form an SGraph and also as we are 11: for all y in IG, if sim (n, y) > μ then add an edge merging CTrees rooted at a node, we will have (n, y) interconnected set of sentences in the summary and 12: Set id of n as described in IG Construction hence coherence is preserved. The SGraphs thus 13: end if formed are ranked based on the score computed as 14: else if there is no node x in IG such that sim (n, x) given in Equation 4. Sentences from the highly >  then ranked SGraph are returned as summary. 15: for all y in IG, if sim (n, y) > μ then add an edge(n, y) Experimental Results 16: Set id of n as described in IG Construction In the experiments, QTS was compared with 17: end if two query specific systems - baseline and MEAD. 18: end for Our baseline system generates summaries by 19: i + + considering only centrality based node weights as per 20: end while equation 1 using incremental graph, without If the input document set is singleton set, generating CTrees and SGraphs. Nodes which have then the integrated graph is equivalent to the high weights are included in summary. Second contextual graph of that document. Addition of any system, MEAD is a publicly available feature-based new document to the set can be reflected in IG by multidocument summarization toolkit. It computes a adding its CG as described above and the edge score for each sentence from a cluster of related weights are updated accordingly. The integrated documents, as a linear combination of several graph for the set of documents can also be computed features. For our experiments, we used centroid score, offline and stored. When a query is posed on a position and cosine similarity with query as features document set, its IG can be loaded into memory and with 1,1,10 as their weights respectively in MEAD CTrees and SGraphs can be constructed as described system. Maximal Marginal Relevance(MMR) above. reranker provided in the MEAD toolkit Was used for redundancy removal with a CTree Construction similarity threshold as 0.6. Equal number of The neighborhood of a node is explored and sentences as generated by QTS were extracted from prominent nodes in it are included in CTree rooted at the above two systems. r. This exploration is done in breadth-first fashion. QTS used four criteria to evaluate the generated summaries. They are non-redundacy, Only b (beam width) prominent nodes are considered responsiveness, coherence ad overall performance. for further exploration at each level. The prominence of a node j is determined by taking the weight of the Summaries generated by three systems QTS, baseline edge connecting j to its parent i and its node score nd MEAD were evaluated by with respect to the query term q into consideration. It A group of 10 volunteers. They were given a is computed as (αw(eij) + βwq (j)). These two factors set of instructions defining the task and criteria and specify the contribution of the node to the coherence were asked to rate each summary on a scale of 1(bad) of the summary and the amount of query related to 10(good) for each criteria without actually seeing information. α is the scaling factor defined in the original documents. An average of these ratings Equation 3. This scaling brings edge weights into the for each query was computed and mean of them was range of node weights and β determines trade-off calculated. The graph shows that QTS performs better when compared to other systems.On the whole QTS between coherence and importance of query relevant performed better than others in terms of user information. The exploration from selected prominent nodes (at most b) is continued until a level which has satisfaction. a node containing a query term (anchor node) or maximum depth d is reached. All nodes along the Conclusion path from root to anchor node, along with their A novel framework for multi-document siblings are added to the CTree. When query Term is summarization system that generates a coherent and not found till depth d then CTree for that query term intelligible summary. We propose notion of an remains empty. If a root node has the query term then integrated graph that represents inherent structure root and its adjacent “b” nodes are added to CTree present in a set of related documents by removing and no further exploration is done. redundant sentences. Our system generates query term specific contextual trees (CTrees) which are SGraph Construction merged to form query specific summary graph The CTrees of the individual terms are (SGraph). We have introduced an ordering strategy to merged to form an SGraph. The SGraph at a node order sentences in summary using integrated graph contains all nodes and edges that appear in any one of structure. This process of computation has indeed the C Trees rooted at that node. With this, improved the quality of the summary. We 2276 | P a g e
  • 6. J.Ramachandra, J.Ravi kumar,Y.V.Sricharan,Y.Venkata Sreekar / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 2, Issue4, July-August 2012, pp.2272-2277 experimentally show that our approach is feasible and is generating user satisfiable summaries. References 1. M. Sravanthi, C. R. Chowdary, and P. S. Kumar. QTS: A query specific text summarization system. In Proceedings of the 21st International FLAIRS Conference, pages 219–224, Florida, USA, may 2008. AAAI Press 2. Radev, D. R.; Jing, H.; and Budzikowska, M. 2000.Centroid-based summarization of multiple documents: sentence extraction, utility-based evaluation, and user studies. In NAACL-ANLP 2000 Workshop on Automatic summarization,21–30. Morristown, NJ, USA: Association for Computational Linguistics. 3. Salton, G.; Singhal, A.; Mitra, M.; and Buckley, C. 1997.Automatic text structuring and summarization. Inf. Process. Manage. 33(2):193–207 4. Erkan, G., and Radev, D. R. 2004. LexPageRank: Prestige in multi-document text summarization. In EMNLP. 5. Fisher, S.; Roark, B.; Yang, J.; and Hersh, B.2005. OGI/OHSU Baseline Query directed Multidocument Summarization System for DUC-2005. Proceedings of the Document Understanding Conference (DUC).John M. Conroy, J. D. S., and Stewart, J. G. 2005.CLASSY query-based multi-document summarization.Proceedings of the Document Understanding Conference (DUC). 7. http://tartarus.org/~martin/PorterStemmer/ 2277 | P a g e