A User-friendly Patent Search Paradigm

INTRODUCTION
Patents play a very important role in intellectual
property protection. Patent search helps patent
examiners find previously published relevant patents
and validate or invalidate new patent applications, so it
has become increasingly popular and has recently attracted
much attention from both the industrial and academic
communities. For example, many online
systems support patent search, such as Google patent
search, Derwent Innovations Index (DII), and USPTO.
As most patent-search users have limited knowledge
about the underlying patents, they have to use a
try-and-see approach, repeatedly issuing queries and checking
answers, which is a very tedious process.

ABSTRACT
As most patent-search users have limited knowledge about the
underlying patents, they have to use a try-and-see approach,
repeatedly issuing queries and checking answers, which is a very
tedious process. To overcome this, our proposed system
introduces an efficient patent search paradigm that helps
users find relevant patents more easily and improves the
user search experience. We propose three effective techniques,
error correction, topic-based query suggestion, and query
expansion, to improve the usability of patent search. Error
correction addresses the typing-error problem of existing
systems. To improve efficiency, we partition the patents into
small partitions based on their topics and classes. Given a
query, we find the highly relevant partitions and answer the
query in each of them. Finally, we combine the answers from each
partition and generate the top answers of the patent-search query.

SCOPE OF THE PROJECT:
In this project we improve search efficiency,
provide more suggestions that help the user
find patents, and correct errors in the
search keywords using query correction
methods.

LITERATURE SURVEY:
Title: Improving Retrievability of Patents in Prior-Art Search
Authors: S. Bashir and A. Rauber
Year: 2010
Description
Prior-art search is an important task in patent retrieval. The success of this task relies
upon the selection of relevant search queries. Typically terms for prior-art queries are
extracted from the claim fields of query patents. However, due to the complex technical
structure of patents, and presence of terms mismatch and vague terms, selecting
relevant terms for queries is a difficult task. During evaluating the patents retrievability
coverage of prior-art queries generated from query patents, a large bias toward a subset
of the collection is experienced. A large number of patents either have a very low
retrievability score or cannot be discovered via any query. To increase the retrievability
of patents, in this paper we expand prior-art queries generated from query patents using
query expansion with pseudo relevance feedback. Missing terms from query patents are
discovered from feedback patents, and better patents for relevance feedback are
identified using a novel approach for checking their similarity with query patents. We
specifically focus on how to automatically select better terms from query patents based
on their proximity distribution with prior-art queries that are used as features for
computing similarity. Our results show that the coverage of prior-art queries can be
increased significantly by incorporating relevant query terms using query expansion.

Title: Latent dirichlet allocation
Authors: D. M. Blei, A. Y. Ng, and M. I. Jordan
Year: 2003
Description
We describe latent Dirichlet allocation (LDA), a
generative probabilistic model for collections of
discrete data such as text corpora. LDA is a three-level
hierarchical Bayesian model, in which each item of a
collection is modeled as a finite mixture over an
underlying set of topics. Each topic is, in turn, modeled
as an infinite mixture over an underlying set of topic
probabilities. In the context of text modeling, the topic
probabilities provide an explicit representation of a
document. We present efficient approximate inference
techniques based on variational methods and an EM
algorithm for empirical Bayes parameter estimation. We
report results in document modeling, text classification,
and collaborative filtering, comparing to a mixture of
unigrams model and the probabilistic LSI model.

Title: Suggesting Topic-Based Query Terms as You Type
Authors: J. Fan, H. Wu, G. Li, and L. Zhou
Year: 2010
Description
Query term suggestion that interactively expands the queries is an
indispensable technique to help users formulate high-quality queries and has
attracted much attention in the community of web search. Existing methods
usually suggest terms based on statistics in documents as well as query logs
and external dictionaries, and they neglect the fact that the topic information
is very crucial because it helps retrieve topically relevant documents. To give
users gratification, we propose a novel term suggestion method: as the user
types in queries letter by letter, we suggest the terms that are topically
coherent with the query and could retrieve relevant documents instantly. For
effectively suggesting highly relevant terms, we propose a generative model
by incorporating the topical coherence of terms. The model learns the topics
from the underlying documents based on Latent Dirichlet Allocation (LDA).
For achieving the goal of instant query suggestion, we use a trie structure to
index and access terms. We devise an efficient top-k algorithm to suggest
terms as users type in queries. Experimental results show that our approach
not only improves the effectiveness of term suggestion, but also achieves
better efficiency and scalability.

Title: Ranking structured documents: a large margin based
approach for patent prior art search
Authors: Y. Guo and C. P. Gomes
Year: 2009
Description
We propose an approach for automatically ranking structured
documents applied to patent prior art search. Our model, SVM
Patent Ranking (SVMPR) incorporates margin constraints that
directly capture the specificities of patent citation ranking. Our
approach combines patent domain knowledge features with
meta-score features from several different general Information
Retrieval methods. The training algorithm is an extension of
the Pegasos algorithm with performance guarantees,
effectively handling hundreds of thousands of patent-pair
judgments in a high dimensional feature space. Experiments on
a homogeneous essential wireless patent dataset show that
SVMPR performs on average 30%-40% better than many other
state-of-the-art general-purpose Information Retrieval methods
in terms of the NDCG measure at different cut-off positions.

Title: Efficient interactive fuzzy keyword search
Authors: S. Ji, G. Li, C. Li, and J. Feng
Year: 2009
Description
Traditional information systems return answers after a user submits a complete query.
Users often feel "left in the dark" when they have limited knowledge about the
underlying data, and have to use a try-and-see approach for finding information. A recent
trend of supporting auto complete in these systems is a first step towards solving this
problem. In this paper, we study a new information-access paradigm, called "interactive,
fuzzy search," in which the system searches the underlying data "on the fly" as the user
types in query keywords. It extends auto complete interfaces by (1) allowing keywords to
appear in multiple attributes (in an arbitrary order) of the underlying data; and (2) finding
relevant records that have keywords matching query keywords approximately. This
framework allows users to explore data as they type, even in the presence of minor
errors. We study research challenges in this framework for large amounts of data. Since
each keystroke of the user could invoke a query on the backend, we need efficient
algorithms to process each query within milliseconds. We develop various incremental-
search algorithms using previously computed and cached results in order to achieve an
interactive speed. We have deployed several real prototypes using these techniques. One
of them has been deployed to support interactive search on the UC Irvine people
directory, which has been used regularly and well received by users due to its friendly
interface and high efficiency.

Title: Efficient Merging and Filtering Algorithms for Approximate String
Searches
Authors: C. Li, J. Lu, and Y. Lu
Year: 2008
Description
We study the following problem: how to efficiently find in a
collection of strings those similar to a given query string? Various
similarity functions can be used, such as edit distance, Jaccard
similarity, and cosine similarity. This problem is of great interest to a
variety of applications that need a high real-time performance, such as
data cleaning, query relaxation, and spellchecking. Several algorithms
have been proposed based on the idea of merging inverted lists of
grams generated from the strings. In this paper we make two
contributions. First, we develop several algorithms that can greatly
improve the performance of existing algorithms. Second, we study
how to integrate existing filtering techniques with these algorithms,
and show that they should be used together judiciously, since the way
to do the integration can greatly affect the performance. We have
conducted experiments on several real data sets to evaluate the
proposed techniques.

Title: Supporting Search-As-You-Type Using SQL in Databases
Authors: G. Li, J. Feng, and C. Li
Year: 2011
Description
A search-as-you-type system computes answers on-the-fly as a user
types in a keyword query letter by letter. We study how to support
search-as-you-type on data residing in a relational DBMS. We focus
on how to support this type of search using the native database
language, SQL. A main challenge is how to leverage existing
database functionalities to meet the high-performance requirement to
achieve an interactive speed. We study how to use auxiliary indexes
stored as tables to increase search performance. We present solutions
for both single-keyword queries and multi-keyword queries, and
develop novel techniques for fuzzy search using SQL by allowing
mismatches between query keywords and answers. We present
techniques to answer first-N queries and discuss how to support
updates efficiently. Experiments on large, real data sets show that our
techniques enable DBMS systems on a commodity computer to
support search-as-you-type on tables with millions of records.

Title: Efficient fuzzy full-text type-ahead search
Authors: G. Li, S. Ji, C. Li, and J. Feng
Year: 2011
Description
Traditional information systems return answers after a user submits a complete query.
Users often feel "left in the dark" when they have limited knowledge about the
underlying data and have to use a try-and-see approach for finding information. A
recent trend of supporting auto complete in these systems is a first step toward solving
this problem. In this paper, we study a new information-access paradigm, called "type-
ahead search" in which the system searches the underlying data "on the fly" as the user
types in query keywords. It extends auto complete interfaces by allowing keywords to
appear at different places in the underlying data. This framework allows users to
explore data as they type, even in the presence of minor errors. We study research
challenges in this framework for large amounts of data. Since each keystroke of the
user could invoke a query on the backend, we need efficient algorithms to process each
query within milliseconds. We develop various incremental-search algorithms for both
single-keyword queries and multi-keyword queries, using previously computed and
cached results in order to achieve a high interactive speed. We develop novel
techniques to support fuzzy search by allowing mismatches between query keywords
and answers. We have deployed several real prototypes using these techniques. One of
them has been deployed to support type-ahead search on the UC Irvine people
directory, which has been used regularly and well received by users due to its friendly
interface and high efficiency.

Title: EASE: an effective 3-in-1 keyword search method for
unstructured, semi-structured and structured data
Authors: G. Li, B. C. Ooi, J. Feng, J. Wang, and L. Zhou
Year: 2008
Description
Conventional keyword search engines are restricted to a given data
model and cannot easily adapt to unstructured, semi-structured or
structured data. In this paper, we propose an efficient and adaptive
keyword search method, called EASE, for indexing and querying
large collections of heterogeneous data. To achieve high efficiency in
processing keyword queries, we first model unstructured, semi-
structured and structured data as graphs, and then summarize the
graphs and construct graph indices instead of using traditional
inverted indices. We propose an extended inverted index to facilitate
keyword-based search, and present a novel ranking mechanism for
enhancing search effectiveness. We have conducted an extensive
experimental study using real datasets, and the results show that
EASE achieves both high search efficiency and high accuracy, and
outperforms the existing approaches significantly.

Title: Simple vs. sophisticated approaches for patent prior-art search
Authors: W. Magdy, P. Lopez, and G. J. F. Jones
Year: 2011
Description
Patent prior-art search is concerned with finding all filed patents
relevant to a given patent application. We report a comparison
between two search approaches representing the state-of-the-art in
patent prior-art search. The first approach uses simple and
straightforward information retrieval (IR) techniques, while the
second uses much more sophisticated techniques which try to model
the steps taken by a patent examiner in patent search. Experiments
show that the retrieval effectiveness using both techniques is
statistically indistinguishable when patent applications contain some
initial citations. However, the advanced search technique is
statistically better when no initial citations are provided. Our findings
suggest that less time and effort can be exerted by applying simple IR
approaches when initial citations are provided.

Modules:
1. Login page
2. Client Search through query
2.1 Automatic Error correction
2.2 Topic based query suggestion
2.3 Query expansion
3. Ranking
4. Patent Partition selection
5. Query Processing

Module Description
1. Login page
Before a client session is created, the login page checks the
user's credentials. It receives the username and password from
the user and checks in the database whether that user is
allowed to send requests to the server. New users can also be
added through user registration, which collects the
important details: name, gender, username, password,
address, email ID, and phone number.
2. Client Search through query
In this module we first design the page for entering the
user's query. We then write the code in a Java file, and a
JSP file passes the user's query request to the semantic
storage.

2.1 Automatic Error correction
For automatic error correction we use a trie
structure to do efficient keyword correction and
completion. We consider the prefix of the query
word: if it is not similar enough to a trie node, we
do not need to consider the keywords under that node.
2.2 Topic based query suggestion
The topic-based model estimates the probability
of the next query keyword. If a keyword in the patents is
more topically coherent with the previously typed query
words, it gets a higher score.
2.3 Query expansion
For query expansion we use a search
engine to suggest relevant keywords. We also
use relevant keywords mined from the query log for
expansion.

3. Ranking
In this module we rank the answers
obtained for a query by their probability of
being relevant, so that the most relevant
patents for the search appear first.
4. Patent Partition selection
In this module we select the partitions relevant
to the patent search using two kinds of relevancy:
topic relevancy and keyword relevancy. Using these
two measures we find the top relevant
partitions.
5. Query Processing
The query processing module finds the top answers
for the search. It combines
the ranked answers from all selected partitions to
produce the top answers.
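
The partition selection and query processing steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the source does not define how topic relevancy and keyword relevancy are computed or combined, so the dot-product topic score, the keyword-coverage score, the weight `alpha`, the match-count patent score, and the toy `PARTITIONS` data (including the patent IDs) are all hypothetical.

```python
import heapq

# Hypothetical toy partitions: each has a topic vector and its patents' keywords.
PARTITIONS = [
    {"name": "wireless", "topics": [0.9, 0.1],
     "patents": {"P1": {"wireless", "antenna"},
                 "P2": {"wireless", "signal", "antenna"}}},
    {"name": "pharma", "topics": [0.1, 0.9],
     "patents": {"P3": {"drug", "dose"}}},
    {"name": "optics", "topics": [0.6, 0.4],
     "patents": {"P4": {"antenna", "lens"}}},
]


def topic_relevance(query_topics, partition):
    """Assumed measure: dot product of query and partition topic vectors."""
    return sum(q * p for q, p in zip(query_topics, partition["topics"]))


def keyword_relevance(query_terms, partition):
    """Assumed measure: fraction of query terms present in the partition."""
    vocab = set().union(*partition["patents"].values())
    return sum(t in vocab for t in query_terms) / len(query_terms)


def select_partitions(query_terms, query_topics, partitions, m=2, alpha=0.5):
    """Keep the m partitions with the highest combined relevance."""
    def relevance(p):
        return (alpha * topic_relevance(query_topics, p)
                + (1 - alpha) * keyword_relevance(query_terms, p))
    return heapq.nlargest(m, partitions, key=relevance)


def search_partition(query_terms, partition):
    """Score each patent by the number of query terms it contains."""
    hits = []
    for patent_id, terms in partition["patents"].items():
        score = sum(t in terms for t in query_terms)
        if score:
            hits.append((score, patent_id))
    return hits


def top_answers(query_terms, query_topics, partitions, k=3, m=2):
    """Answer the query in each selected partition, then merge the results."""
    merged = []
    for p in select_partitions(query_terms, query_topics, partitions, m):
        merged.extend(search_partition(query_terms, p))
    merged.sort(key=lambda hit: (-hit[0], hit[1]))  # best score first
    return [patent_id for _, patent_id in merged[:k]]
```

With the toy data, `top_answers(["wireless", "antenna"], [0.8, 0.2], PARTITIONS)` searches only the "wireless" and "optics" partitions and skips "pharma" entirely, which is where the efficiency gain of partitioning comes from.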

Module Diagram
1. Login page
[Figure: User Login Page, Database, Patent search page]
[Figure: User Typing Query, Error Corrected Query]

GIVEN INPUT EXPECTED OUTPUT
1. Login page
Input: User name and password
Output: User is transferred to the patent search engine
2. Client Search through query
Input: The patent keyword to search for
Output: Query shown in the search box
2.1 Automatic Error correction
Input: The patent keyword to search for
Output: Error-corrected patent keyword
2.2 Topic based query suggestion
Output: Suggestions related to the topic
2.3 Query expansion
Output: Query keyword expanded with relevant keywords

3. Ranking
Output: Patents selected using ranking
4. Patent Partition selection
Output: Partitions selected by topic relevancy and keyword
relevancy
5. Query Processing
Output: Aggregated and ranked top answers

SYSTEM REQUIREMENTS
HARDWARE
PROCESSOR : Pentium IV 2.6 GHz, Intel Core 2 Duo
RAM : 512 MB DDR RAM
MONITOR : 15" COLOR
HARD DISK : 40 GB
CD DRIVE : LG 52X
SOFTWARE
Front End : JSP
Back End : MS SQL Server 2000/2005
Operating System : Windows XP/7
IDE : NetBeans, Eclipse

TECHNIQUE USED
1. Automatic Error Correction
2. Topic-based Query Suggestion
3. Query Expansion

Automatic Error Correction
As query keywords that users have typed in may have typos, traditional
methods will return no answer as they cannot find answers that contain the
query keywords. Obviously this method is not user-friendly. Instead, it is
better to correct the typos, recommend users similar keywords, and return the
answers of the similar keywords. To quantify the similarity between
keywords, existing methods usually adopt edit distance.
The edit distance between two keywords is the minimum number of edit
operations (i.e., insertion, deletion, and substitution) of single characters
needed to transform the first one to the second. For example, the edit distance
of “patent” and “paitant” is 2. Two keywords are said to be similar if their edit
distance is within a given threshold. There are some recent studies on efficient
error correction, which use a filter-and-refine framework to find similar
keywords of a query keyword. The method first uses the filter step to find a
subset of keywords which may be potentially similar to the query keyword.
Then it uses a verification step to remove those false positives and get the
final similar keywords.
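
As an illustration of the definitions above, the following sketch computes the edit distance with the standard dynamic-programming recurrence and applies a simple filter-and-refine search. The length-difference filter is only one possible filter step and is an assumption here; the cited studies use more elaborate filters. The function names and the sample vocabulary are hypothetical.

```python
def edit_distance(a, b):
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to transform a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similar_keywords(query, vocabulary, threshold=2):
    """Filter-and-refine search for keywords within the threshold."""
    # Filter step (assumed): a length gap above the threshold already
    # implies an edit distance above the threshold.
    candidates = [w for w in vocabulary
                  if abs(len(w) - len(query)) <= threshold]
    # Verification step: compute the exact distance to drop false positives.
    return sorted(w for w in candidates
                  if edit_distance(query, w) <= threshold)
```

For example, `edit_distance("patent", "paitant")` returns 2, matching the example in the text, and `similar_keywords("paitant", ["patent", "patient", "paint", "partition", "pat"])` keeps "paint" and "patent" while rejecting "patient", whose distance is 3.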
Although these methods can efficiently suggest keywords for
complete keywords, they cannot support a prefix keyword that the user is
still completing. To address this problem, we can use the trie structure to do
efficient keyword correction and completion. Using the trie structure, even if
users type in only a partial keyword, we can still efficiently suggest relevant,
accurate keywords. The basic idea is that if a prefix is not similar enough to a
trie node, then we do not need to consider the keywords under the trie node.
We can use this observation to efficiently suggest similar keywords.
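
The pruning idea can be sketched with a small trie that carries one row of the edit-distance table down each path. This is a minimal sketch assuming a plain character trie and a recursive walk; the class and method names are hypothetical and the source does not specify the actual data structure layout.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_keyword = False


class Trie:
    def __init__(self, words=()):
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_keyword = True

    def suggest(self, prefix, threshold=1):
        """Keywords having some prefix within `threshold` edits of `prefix`."""
        results = set()
        # row[j] = edit distance between the current path and prefix[:j]
        first_row = list(range(len(prefix) + 1))
        self._walk(self.root, "", first_row, prefix, threshold, results)
        return sorted(results)

    def _walk(self, node, path, row, prefix, threshold, results):
        if row[-1] <= threshold:
            # The path is similar to the typed prefix, so every keyword
            # below this node is a valid completion.
            self._collect(node, path, results)
            return
        if min(row) > threshold:
            return  # prune: no keyword under this node can become similar
        for ch, child in node.children.items():
            new_row = [row[0] + 1]
            for j in range(1, len(prefix) + 1):
                cost = 0 if prefix[j - 1] == ch else 1
                new_row.append(min(new_row[j - 1] + 1,   # skip a typed char
                                   row[j] + 1,           # extra char on path
                                   row[j - 1] + cost))   # match or substitute
            self._walk(child, path + ch, new_row, prefix, threshold, results)

    def _collect(self, node, path, results):
        if node.is_keyword:
            results.add(path)
        for ch, child in node.children.items():
            self._collect(child, path + ch, results)
```

With `Trie(["patent", "patents", "pattern", "partition", "search"])`, the mistyped partial keyword "patn" with threshold 1 yields "patent", "patents", and "pattern": the node "pat" is one deletion away from "patn", while the "partition" and "search" branches are pruned after a few characters.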

Topic based Query Suggestion
We devise a novel model for effectively suggesting keywords as
users type in queries letter by letter. The basic idea of our method is
to use the topic model to estimate the probability of the next query
keyword. Intuitively, if a keyword in patents is more topically
coherent with the previously typed query keywords, it would obtain a
higher score. Specifically, we can focus on estimating two important
probabilities: the probability of a keyword conditioned on topics, and
the probability of sampling a keyword from a patent. Both of the two
probabilities are used to estimate the score of each keyword. An LDA
model can be utilized to learn the keyword distribution over each
topic from the underlying patents.
LDA can be classified as a soft-clustering technique which allows a
keyword to appear in multiple topics and takes into account the
degree of a keyword belonging to each topic. The keyword
distribution over a set of patents is learnt by using a language model.
The language model approach can capture the property of the patents
and predict the likelihood of sampling a specific keyword. Thus we
can combine the two probabilities and use the topic-based method to
suggest relevant keywords.
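
One way to combine the two probabilities is sketched below. The source does not give the exact scoring formula, so the linear interpolation with weight `lam`, the Bayes-rule topic posterior, and the smoothing constant are assumptions, and the toy distributions `PHI` (keyword given topic, as an LDA model would learn), `PRIOR`, and `LM` (unigram language model) are hypothetical numbers rather than learned values.

```python
# Hypothetical toy model: P(keyword | topic) for two topics.
PHI = [
    {"wireless": 0.4, "antenna": 0.3, "signal": 0.3},
    {"drug": 0.5, "protein": 0.3, "dose": 0.2},
]
PRIOR = [0.5, 0.5]  # assumed topic prior
# Hypothetical language model: P(keyword) of sampling it from the patents.
LM = {"wireless": 0.2, "antenna": 0.1, "signal": 0.15,
      "drug": 0.3, "protein": 0.15, "dose": 0.1}


def topic_posterior(typed, phi, prior, smoothing=1e-6):
    """P(topic | typed keywords) by Bayes' rule, with a small floor
    for keywords a topic has never generated."""
    weights = []
    for z, p_z in enumerate(prior):
        weight = p_z
        for keyword in typed:
            weight *= phi[z].get(keyword, smoothing)
        weights.append(weight)
    total = sum(weights)
    return [w / total for w in weights]


def suggest_next(typed, phi, prior, lm, lam=0.7, k=3):
    """Rank candidate next keywords by topical coherence with the typed
    keywords, interpolated with the language-model probability."""
    posterior = topic_posterior(typed, phi, prior)
    scores = {}
    for keyword, p_lm in lm.items():
        if keyword in typed:
            continue
        topical = sum(posterior[z] * phi[z].get(keyword, 0.0)
                      for z in range(len(phi)))
        scores[keyword] = lam * topical + (1 - lam) * p_lm
    return sorted(scores, key=lambda kw: (-scores[kw], kw))[:k]
```

After the user types "wireless", `suggest_next(["wireless"], PHI, PRIOR, LM, k=2)` ranks "signal" and "antenna" first even though "drug" is the most frequent keyword overall, because the typed keyword shifts almost all posterior mass to the first topic.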

Query Expansion
In many cases, users do not understand the underlying data precisely,
so they may type in ambiguous or inaccurate
keywords. In addition, the same concept may have different
representations. To this end, we can use WordNet to expand a
keyword. If the query word is indexed by WordNet, we can easily get
the relevant keywords of the query keyword using an inverted list
structure. However, WordNet is manually built and covers common
words. If the query keywords are not in WordNet, we cannot
recommend relevant keywords. To address this problem, we have two
solutions. The first one is to utilize search engines, since most search
engines suggest relevant keywords as users type in queries.
We can issue the patent query to a search engine, such as Google, and get the
relevant keywords from it. The second way is
to mine the relevant keywords from the query logs. To this end, we
use the click-through data to mine correlated queries as follows.
For two queries, if users click the same returned result (patent), the queries
are potentially relevant. We utilize this property to mine relevant
queries. For two queries, we use the number of times users clicked on
the same patent to denote their relevance. If the co-occurrence count of a
keyword pair is larger than a given threshold, the two keywords are
relevant and we use them to do query expansion.
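
The click-through mining step can be sketched as follows. This is one possible reading of "the number of times users clicked on the same patent": for each patent, the smaller of the two queries' click counts is added to the pair's co-click count. That counting rule, the log format of (query, patent ID) pairs, and the sample `CLICK_LOG` are assumptions for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical click-through log: (query, clicked patent ID).
CLICK_LOG = [
    ("cell phone", "P1"), ("mobile phone", "P1"),
    ("cell phone", "P2"), ("mobile phone", "P2"),
    ("cell phone", "P3"), ("mobile phone", "P3"),
    ("antenna", "P3"), ("antenna", "P4"),
]


def mine_related_queries(click_log, threshold=2):
    """Map each query to the queries whose co-click count with it
    is larger than the threshold."""
    clicks_by_patent = defaultdict(Counter)
    for query, patent_id in click_log:
        clicks_by_patent[patent_id][query] += 1

    co_clicks = Counter()
    for per_query in clicks_by_patent.values():
        # Every pair of queries that led to a click on this patent.
        for q1, q2 in combinations(sorted(per_query), 2):
            co_clicks[(q1, q2)] += min(per_query[q1], per_query[q2])

    related = defaultdict(set)
    for (q1, q2), count in co_clicks.items():
        if count > threshold:
            related[q1].add(q2)
            related[q2].add(q1)
    return dict(related)


def expand_query(query, related):
    """Return the query followed by its mined related queries."""
    return [query] + sorted(related.get(query, ()))
```

With the sample log, "cell phone" and "mobile phone" share clicks on three patents and are therefore treated as relevant to each other, while "antenna" shares only one click with either and is not used for expansion.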

SYSTEM DESIGN
USECASE DIAGRAM
[Figure: use case diagram. Elements: User, Login, Patent search, Ok, Patent Partitions, Query Process, Patent DB, Top answer]

STATE DIAGRAM
[Figure: state diagram. States: User Login, Enters Keyword, Error correction, Topic search, Ok Verified, Expansion, Partition selection, Query processing, Top answers]

FUTURE ENHANCEMENT
In the future, our proposed patent search paradigm will be
extended by connecting a large number of databases. This
should increase the efficiency and searchability of patents while
keeping a user-friendly approach.
Advantages
1. Keyword error correction
2. Partition-based patent search
3. High search efficiency
4. Query suggestion and expansion
Applications
1. Google patent search
2. Derwent Innovations Index (DII)
3. USPTO

CONCLUSION
In this paper, we proposed a new patent-search
paradigm. We developed three effective techniques,
error correction, topic-based query suggestion, and
query expansion, to make patent search more
user-friendly and improve the user search experience. Error
correction can provide users with accurate keywords and
correct typing errors.
Topic-based query suggestion can suggest topically
coherent keywords as users type in query keywords.
Query expansion can suggest synonyms and other
relevant keywords that express the
same concept as the query keywords. We proposed a
partition-based method to improve the search
performance. Experimental results show that our
method achieves high efficiency and quality.

REFERENCES
[1] L. Azzopardi, W. Vanderbauwhede, and H. Joho. Search system requirements of patent analysts. In SIGIR, pages 775–776, 2010.
[2] S. Bashir and A. Rauber. Improving retrievability of patents in prior-art search. In ECIR, pages 457–470, 2010.
[3] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, 2003.
[4] J. Fan, H. Wu, G. Li, and L. Zhou. Suggesting topic-based query terms as you type. In APWeb, pages 61–67, 2010.
[5] Y. Guo and C. P. Gomes. Ranking structured documents: A large margin based approach for patent prior art search. In IJCAI, pages 1058–1064, 2009.
[6] S. Ji, G. Li, C. Li, and J. Feng. Efficient interactive fuzzy keyword search. In WWW, pages 371–380, 2009.
[7] L. S. Larkey. A patent search and classification system. In ACM DL, pages 179–187, 1999.
[8] C. Li, J. Lu, and Y. Lu. Efficient merging and filtering algorithms for approximate string searches. In ICDE, pages 257–266, 2008.
[9] G. Li, J. Feng, and C. Li. Supporting search-as-you-type using SQL in databases. IEEE TKDE, 2011.
[10] G. Li, S. Ji, C. Li, and J. Feng. Efficient fuzzy full-text type-ahead search. VLDB J., 20(4):617–640, 2011.
[11] G. Li, B. C. Ooi, J. Feng, J. Wang, and L. Zhou. EASE: an effective 3-in-1 keyword search method for unstructured, semi-structured and structured data. In SIGMOD Conference, pages 903–914, 2008.
[12] W. Magdy, P. Lopez, and G. J. F. Jones. Simple vs. sophisticated approaches for patent prior-art search. In ECIR, pages 725–728, 2011.
