International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367 (Print),
ISSN 0976 – 6375 (Online), Volume 4, Issue 3, May-June (2013), pp. 512-518
© IAEME: www.iaeme.com/ijcet.asp
Journal Impact Factor (2013): 6.1302 (Calculated by GISI), www.jifactor.com
A NOVEL METHOD TO SEARCH INFORMATION THROUGH
MULTI AGENT SEARCH AND RETRIEVE OPERATION USING
CONTENT AND CONTEXT BASED SEARCH
Poonam Ghuli¹, Swapna Rao P², Harsha.S³ and Rajashree Shettar⁴
¹ Asst. Prof., Dept. of CSE, R.V. College of Engineering, Bangalore, Karnataka, India
² Asst. Prof., Dept. of CSE, Nandi Institution of Technology and Management Sciences,
Bangalore, Karnataka, India
³ Asst. Prof., Dept. of CSE, Vidya Vikas Institute of Engineering and Technology,
Mysore, Karnataka, India
⁴ Professor, Dept. of CSE, R.V. College of Engineering, Bangalore, Karnataka, India
ABSTRACT
Searching has become tedious because of the enormous growth of online databases and of
Internet usage. Users search for information or files both on personal computers and on the
Internet. Either kind of search consumes time and requires the extra effort of filtering the
results, since both relevant and irrelevant results are returned. The aim of this paper is to give
the user a novel method to search for files and information on both personal computers and the
Internet. The proposed system uses a new search mechanism that accepts two text inputs,
processes them according to the chosen domain (desktop search or Internet search) and returns
relevant results. Input processing includes context based search and content based search
using indexing and multi-agents. Content based search is performed on the Hadoop MapReduce
framework to improve performance.
Keywords: content search, context search, multi-agents, indexing, Hadoop.
I. INTRODUCTION
Web search and desktop search have become an integral part of daily life. Many applications,
commonly called search engines or desktop search engines, search for information on the
Internet or on personal computers. These search engines help users find information or files
in enormously large databases. But still
research is being done to improve the efficiency and performance of these search engines. If
only keywords are used to access information, the retrieved results include many irrelevant
items that users must filter out later, which consumes time. These irrelevant results are
retrieved because the content of each item is not checked against the purpose of the search. If
the reason for which the information is sought is also captured, many irrelevant results can be
removed from the search result.
The proposed system first captures both a search string and a relevance string. It then performs
context search and content search with the help of multi-agents. Search results are filtered
according to the relevance string specified by the user. The whole process genetically
evolves with each search to give better results by filtering out irrelevant search results. All
links previously visited by the user are saved in a database so that these results have the
highest priority in the next search result. Because previously visited pages are tracked, the
search for relevant data improves over time. This process is carried out by multi-agents,
which are sets of individual agents working together.
An agent is an autonomous entity which performs a specific task. An agent is characterized
as autonomous, communicative, adaptive and decentralized. Multiple agents performing the
same task can be created, as can multiple agents performing different tasks. In this paper
three different types of agents, each performing a different task, are created in multiple
numbers. Context search means searching for the exact string provided, either in filenames or
as a sub-string in web page links. Content search means searching for relevant content inside
files or web pages by opening them.
II. LITERATURE SURVEY
A lot of research is ongoing in the field of search engines and search result
optimization. These search engines use different methodologies or combinations of
methodologies. A few such methodologies are mentioned below.
Some search engines are specific and confined to searching a particular kind of file, for
example video search engines. One such video search engine, discussed in [1], gives refined
video results. It accepts text and image inputs from the user to retrieve video results. Video
concept detection, SIFT feature detection, multi-modality web categorization and semantic
indexing are a few of the mechanisms used to retrieve relevant results. Another search engine,
discussed in [2], is Talash, a Hindi search engine which implements three
models for query expansion. These models are based on lexical variance, user context and a
combination of the two. Search engines may implement different methods such as
context search and indexing. One such example is context based indexing of web documents
using ontology [3]. In this work an index of files gathered by crawlers is maintained, and a
knowledge base ontology repository, ontology context filtering and ontology ranking are the
mechanisms used to build the system. In [4] content based ranking for search engines is
discussed. The approach ranks the content of a web resource based on the user
query. It ranks the relevant pages based on content and keywords. The user query is used
to retrieve results, and each result is analyzed using keywords and content. A dictionary is built
from the identified root words. Each result page is compared with the dictionary using the keywords
and content of the page. The matched words are given a weight and the total relevancy is
calculated. All pages are ranked in descending order of total relevancy, which lies
between 0 and 1. As discussed earlier, ranking can also be associated with context based
indexing, and an efficient information retrieval system can be developed, as mentioned
in [5]. Many systems have been developed using multi-agents and genetic algorithms;
specific ones are discussed in [6, 8]. Paper [6] explains an agent driven shopping system
called GAMA (Genetic Algorithm driven Multi-Agents). The system helps shoppers
through product brokering and negotiation. Shopping agents that adopt a genetic algorithm
are created to perform product brokering. The system simulates the process of purchasing
computer hardware. Evolutionary algorithms, multi-agent systems, the interactions of
multi-agent systems and the evolutionary approach are explained in the book
[8]. Its section on multi-agent systems discusses the characteristics of agents, agent
classification and agent architecture. Cryptosystems perform cryptanalysis using multi-agent based
cryptanalysis techniques for breaking large file encryption. They use context based search to
enhance the probability of breaking the encryption. The multi-agent mechanism used in
cryptosystems is adopted here to implement multi-agents. Multi-agents are also used in secured
e-shopping using elliptic curve cryptography [9] and in protection against attacks in elliptic curve
cryptography [10].
III. OBJECTIVE AND PROBLEM STATEMENT
The proposed system is a generic framework to optimize search on a standalone
system as well as on the Internet, using a genetic algorithm for parallel pattern recognition
in context search and a content search using multiple agents to speed up operations on
large databases.
Most online documents are multi-dimensional graphs; searching and retrieving them is
difficult, if not infeasible, using 1D or 2D search algorithms. 3D search algorithms have been
proposed; however, they are for databases on the same machine, and they are either content
based only or designed for 3D models.
The scope of the proposed system is to provide an optimized search engine which gives fast,
accurate and appropriate results, incorporating features such as Genetic Algorithm driven
Multi-Agents, Agent Based Evolutionary Approach and Search Algorithm for 3-Dimension.
IV. PROPOSED SYSTEM
The proposed system introduces a new search mechanism. It is a generic framework
with two models, desktop search and Internet search. Both models use the same
methodology with slight differences. The proposed system first performs context search and
then performs content search on the documents or web pages found. Context search
means searching for the exact string, which is provided as a filename or as a sub-string in web
pages. Content search means searching for the relevance string by opening files or web pages. The
content search process is performed using multi-agents. This mechanism is the same for
both models, i.e. desktop search and Internet search. However, desktop search uses an index file
to perform context search, whereas Internet search relies on Google to perform context search.
Desktop search maintains the index file, a text file that contains the paths
of the files present on the system. The whole
process genetically evolves by keeping track of the search results previously selected by the
user for the same strings.
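The index file and the context search described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `build_index` and `context_search`, the one-path-per-line layout and the choice to match the search string against the file name only (case-sensitively) are assumptions made for illustration.

```python
import os
import tempfile


def build_index(root_dir, index_path):
    """Write one file path per line for every file found under root_dir."""
    with open(index_path, "w", encoding="utf-8") as index_file:
        for folder, _subdirs, filenames in os.walk(root_dir):
            for name in sorted(filenames):
                index_file.write(os.path.join(folder, name) + "\n")


def context_search(index_path, search_string):
    """Return indexed paths whose file name contains search_string.

    Assumption: the match is a case-sensitive substring test on the
    file name, not on the directory part of the path.
    """
    matches = []
    with open(index_path, encoding="utf-8") as index_file:
        for line in index_file:
            path = line.rstrip("\n")
            if search_string in os.path.basename(path):
                matches.append(path)
    return matches


def demo():
    """Index a small throwaway directory tree and search it."""
    with tempfile.TemporaryDirectory() as root, tempfile.TemporaryDirectory() as out:
        for name in ("report_2013.txt", "notes.txt", "annual_report.txt"):
            with open(os.path.join(root, name), "w", encoding="utf-8") as f:
                f.write("placeholder")
        index_path = os.path.join(out, "index.txt")
        build_index(root, index_path)
        hits = context_search(index_path, "report")
        return sorted(os.path.basename(p) for p in hits)
```

Calling `demo()` returns the two file names that contain "report". Keeping the index outside the indexed tree avoids listing the index file in itself.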
The system architecture shown in figure 4.1 consists of the system configuration on
which the proposed model is deployed. The bottommost layer is the operating system.
The operating system preferred here is Ubuntu 12.04, which is fast, secure and compatible
with a range of devices. The next layer introduces the Hadoop framework, to take advantage of
parallel computing on clusters of commodity hardware and of the Hadoop Distributed File System
(HDFS). NetBeans IDE is used as the tool to develop the system in a programming
language such as Java or Python. MySQL is the database used to store the user's interactions
with the search engine.
Figure 4.1 System Architecture for multi-agent search based on context and content search
Figure 4.2 Flow of mechanism of context and content search system using multi-agents
Figure 4.2 shows the flow of the entire system, which is composed of different
sub-processes that produce relevant and optimal search results. The user provides two strings,
a search string and a relevance string. Next the user needs to choose the domain in which the
search is to be performed: either standalone system search or Internet search. If the user
chooses desktop search, the desktop search mechanism follows. In this process the search
string is compared with the index file. The index is a text file holding the paths of all files
present on the standalone system. The matching file paths are provided as search results. These
results and the relevance string are the inputs to the Hadoop map job. The map job outputs the
word frequency of the relevance string. In the reduce job, if the word frequency of the
relevance string is high, the respective filename with its path is selected as a search result.
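The map and reduce steps of the content search can be sketched in plain Python, without a Hadoop cluster, to show the data flow. This is an illustrative sketch only: the names `map_job`, `reduce_job` and `content_search` are hypothetical, documents are passed in as already-read text, and the cut-off `min_frequency` is an assumption, since the paper does not state how high the word frequency must be.

```python
import re
from collections import defaultdict


def map_job(doc_id, text, relevance_string):
    """Emit (doc_id, count) for each word of the relevance string."""
    words = re.findall(r"\w+", text.lower())
    for term in re.findall(r"\w+", relevance_string.lower()):
        yield doc_id, words.count(term)


def reduce_job(pairs, min_frequency=1):
    """Sum the counts per document and keep documents at or above the cut-off.

    min_frequency is an assumed threshold, not a value from the paper.
    """
    totals = defaultdict(int)
    for doc_id, count in pairs:
        totals[doc_id] += count
    kept = [(doc, total) for doc, total in totals.items() if total >= min_frequency]
    # Most frequent first; ties are broken by document identifier.
    return sorted(kept, key=lambda item: (-item[1], item[0]))


def content_search(documents, relevance_string, min_frequency=1):
    """Run the map step over every document, then the reduce step."""
    pairs = []
    for doc_id, text in documents.items():
        pairs.extend(map_job(doc_id, text, relevance_string))
    return reduce_job(pairs, min_frequency)


DOCUMENTS = {
    "/home/user/jaguar_car.txt": "The Jaguar car has a fast engine. The engine is new.",
    "/home/user/jaguar_cat.txt": "The jaguar is a big cat found in forests.",
}
RESULT = content_search(DOCUMENTS, "engine")
```

With the two sample documents, both of which would match a context search for "jaguar", only the file that mentions "engine" survives the reduce step. The same two functions apply to web links in the Internet model if the document identifier is a URL.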
If the user chooses Internet search, the Internet search mechanism follows. In this
process the search string is provided to the Google search engine and the results are retrieved.
Content search is then performed on these results; before this, the respective agents are created
and their tasks are assigned. Agents are created using multithreading. A thread is created
for each page link present in the result page. These threads are named agent3 threads. The task
of each agent3 thread is to create 10 more threads, one for each link present in the result page;
these are named agent2 threads. The task of an agent2 thread is to create another thread which
opens the link, searches for string2 (the relevance string) in the opened page and returns the
link if it is relevant. The major part of the content search process is performed on Hadoop. The
links and the relevance string are the input to the map job, which outputs the word frequency of
the relevance string. The links, the relevance string and its word frequency are the inputs to the
reduce job. If the word frequency of the relevance string is high, the respective link is
output. If the search string and relevance string pair has an entry in the database, the
stored search result is merged with the present search result. All links hit by the user are
stored in the database to improve the result list in forthcoming transactions. Figure
4.3 depicts how the different sub-processes communicate with each other internally.
Figure 4.3 The overall structure of communication between sub-processes
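The layered agent threads of the Internet model can be sketched with Python's standard `concurrent.futures` module. This is a simplified illustration under stated assumptions, not the authors' code: the function names are hypothetical, one thread pool stands in for each agent level, and the `fetch` callable (which returns the text of a link) is supplied by the caller, so the sketch runs against an in-memory stand-in for the web rather than real pages.

```python
from concurrent.futures import ThreadPoolExecutor


def content_agent(link, relevance_string, fetch):
    """Open one link and return it if its text mentions the relevance string."""
    text = fetch(link)
    return link if relevance_string.lower() in text.lower() else None


def link_agent(links, relevance_string, fetch):
    """Start one content agent per link of a result page; keep relevant links."""
    if not links:
        return []
    with ThreadPoolExecutor(max_workers=len(links)) as pool:
        results = pool.map(
            lambda link: content_agent(link, relevance_string, fetch), links
        )
        return [link for link in results if link is not None]


def page_agents(result_pages, relevance_string, fetch):
    """Start one page-level agent per result page and merge their outputs."""
    if not result_pages:
        return []
    with ThreadPoolExecutor(max_workers=len(result_pages)) as pool:
        per_page = pool.map(
            lambda links: link_agent(links, relevance_string, fetch), result_pages
        )
        return [link for links in per_page for link in links]


# Hypothetical in-memory stand-in for fetched web pages.
FAKE_WEB = {
    "http://example.com/a": "Jaguar car review and engine specs",
    "http://example.com/b": "Jaguar habitat in the rainforest",
    "http://example.com/c": "Engine maintenance tips",
}
RESULT_PAGES = [
    ["http://example.com/a", "http://example.com/b"],
    ["http://example.com/c"],
]
RELEVANT = page_agents(RESULT_PAGES, "engine", FAKE_WEB.__getitem__)
```

Here the outer pool plays the role of the page-level agents and the inner pool the link-level agents; `pool.map` preserves input order, so the relevant links come back in the order of the original result pages. A substring test replaces the Hadoop word-frequency step to keep the sketch short.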
This section describes the various components. In context search
the search string is used to search for files in desktop search or links in Internet search. In
content search the relevance string is used to search within files in desktop search or links in
Internet search. In the preprocessing step, updating the index file whenever a new drive is
mounted is considered preprocessing in the desktop search model, while searching, retrieving and
updating the database is considered preprocessing in the Internet search model. The agent creation
process creates a specified number of agents at different levels of the content search process.
Agents are created using multithreading. These threads perform specific tasks
according to the specification. The three dimensional content search process is performed by
multiple agents. The dimensions considered here are pages, links and contents. A set of agents
is created for each of these dimensions, and those agents are responsible for the search in that
dimension. Depending on the relevance of the files or links to the relevance string provided, a
filtering process is performed on the files or links. Filtering removes irrelevant file paths or
web links and keeps only the files and links whose content is related to the relevance string.
Search results are always stored in and retrieved from the database. Instead of storing all search
results, only the links hit by the user are stored, and they are updated on each search, which
makes the entire process genetic. Because the results are refined on each search, there is a
learning process.
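Storing the user-hit links and giving them priority in later searches can be sketched as follows. The sketch uses Python's built-in `sqlite3` module as a stand-in for the MySQL database of the paper; the table layout, the function names and the idea of ordering remembered links by a click count are assumptions made for illustration.

```python
import sqlite3


def open_history(path=":memory:"):
    """Open (or create) the table of links the user has clicked."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hits ("
        "search TEXT, relevance TEXT, link TEXT, clicks INTEGER, "
        "PRIMARY KEY (search, relevance, link))"
    )
    return conn


def record_hit(conn, search, relevance, link):
    """Count one more user click on link for this pair of strings."""
    cur = conn.execute(
        "UPDATE hits SET clicks = clicks + 1 "
        "WHERE search = ? AND relevance = ? AND link = ?",
        (search, relevance, link),
    )
    if cur.rowcount == 0:
        conn.execute("INSERT INTO hits VALUES (?, ?, ?, 1)", (search, relevance, link))
    conn.commit()


def merge_results(conn, search, relevance, fresh_links):
    """Put previously clicked links first, then the remaining fresh links."""
    rows = conn.execute(
        "SELECT link FROM hits WHERE search = ? AND relevance = ? "
        "ORDER BY clicks DESC, link ASC",
        (search, relevance),
    ).fetchall()
    remembered = [row[0] for row in rows]
    return remembered + [link for link in fresh_links if link not in remembered]


conn = open_history()
record_hit(conn, "jaguar", "engine", "http://example.com/c")
record_hit(conn, "jaguar", "engine", "http://example.com/c")
record_hit(conn, "jaguar", "engine", "http://example.com/a")
MERGED = merge_results(
    conn, "jaguar", "engine", ["http://example.com/a", "http://example.com/b"]
)
```

In this example the link clicked twice is listed first, the link clicked once second, and the link never clicked last. Only clicked links are written to the table, which matches the statement that not all search results are stored.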
V. CONCLUSION AND FUTURE ENHANCEMENTS
In this paper two models are discussed. The first model is desktop search, which
searches for files on the standalone system by maintaining an index file. The second model is
Internet search, which searches for documents or links on the Internet with the help of
Google™. This model optimizes the Google results based on the relevance of the search
conducted. To evaluate the proposed system, 100 different pairs of search string and relevance
string were provided to the Internet module. All links in the search results were relevant to the
relevance string provided by the user. About 95% of the search result was composed of
links needed by the user, and these links were within the top 5 links of the search result.
When 50 different pairs of search string and relevance string were provided to the desktop
module, it gave 100% of the exact results needed. For example, even if 10 files with the same
name are retrieved in the context search process, only files whose content matches the relevance
string are part of the output.
The proposed system can be extended to support the following functionality. The
system currently optimizes results from the Google search engine alone, but it can be extended
to other search engines. The desktop search module is proposed only for a standalone system; it
can be enhanced to cover other systems connected in a LAN. Filtering of the strings
provided to the search engine can be based on the meaning of the words, the phrases used and
the frequency of the words used. The ontological meaning of the strings provided to search
engines can be analyzed and filtered. This analysis helps in the storage and retrieval of results.
The system can also be enhanced for use in social networking applications.
REFERENCES
1. S. Gomathy, K.P. Deepa, T. Revathi & L. Maria Michael Visuwasam. Genre Specific
Classification for Information Search and Multimodal Semantic Indexing for Data
Retrieval. The Standard International Journals Transactions on Computer Science
Engineering & its Applications (CSEA), Vol. 1, No. 1, March-April 2013.
2. Nandkishor Vasnik, Shriya Sahu, Devshri Roy. Talash: A semantic and context
based optimized Hindi search engine. International Journal of Computer Science,
Engineering and Information Technology (IJCSEIT), Vol. 2, No. 3, June 2012.
3. Priyanka Saxena, Nidhi Tyagi, Dr. M.P. Yadav & Manik Chandra Pandey. Context
Based Indexing of Web Document using Ontology. International Conference on
Recent Trends in Engineering & Technology (ICRTET2012). ISBN: 978-81-925922-0-6.
4. P. Sudhakar, G. Poonkuzhali, R. Kishore Kumar, Member IAENG. Content Based
Ranking for Search Engines. International MultiConference of Engineers and
Computer Scientists 2012, Vol. I, IMECS, March 12-14, 2012, Hong Kong.
ISBN: 978-988-19251-1-4.
5. Sunita Rani, Vinod Jain & Geetanjali Gandhi. Context Based Indexing and Ranking in
Information Retrieval Systems. International Journal of Computer Science and
Management Research, Vol 2 Issue 4 April 2013 ISSN 2278-733X.
6. Dr.Magda B. Fayek, Dr. Ihab A. Talkhan and Khalil S. El-Masry. GAMA (Genetic
Algorithm driven Multi-Agents) for E-Commerce Integrative Negotiation.
GECCO’09, July 8–12, 2009, Montréal Québec, Canada.
7. Wooldridge M. An Introduction to Multi-Agent Systems. New York, John Wiley &
Sons (2002).
8. Ruhul Amin Sarkar and Tapabrata Ray. Agent Based Evolutionary Search. Adaptation,
Learning and Optimization, Volume 5. Springer-Verlag Berlin Heidelberg 2010. ISBN:
978-3-642-13424-8.
9. Sougata Khatua, Arijit Das, Zhang Yuheng, LI Li and N.Ch.S.N. Iyengar, Agent Based
secured e-shopping Using Elliptic Curve Cryptography, International Journal of
Advanced Science and Technology Vol. 38 January, 2012
10. Xu Huang, Pritam Gajkumar Shah and Dharmendra Sharma. Multi-Agent System
Protecting from Attacking in Elliptic Curve Cryptography. G. Phillips-Wren et al.
(Eds.), Advances in Intel. Decision Technologies, SIST 4, pp. 123-131. Springer-Verlag
Berlin, Heidelberg 2010.
11. Shruti V Kamath, Mayank Darbari and Dr. Rajashree Shettar, “Content Based
Indexing and Retrival From Vehicle Surveillance Videos using Gaussian Mixture
Model”, International Journal of Computer Engineering & Technology (IJCET),
Volume 4, Issue 1, 2013, pp. 420-429, ISSN Print: 0976 – 6367, ISSN Online:
0976 – 6375.
12. Shraddha Chaurasia and Lalit Dole, “Secure Masid: Secure Multi-Agent System for
Intrusion Detection”, International Journal of Computer Engineering & Technology
(IJCET), Volume 4, Issue 1, 2013, pp. 392-397, ISSN Print: 0976 – 6367,
ISSN Online: 0976 – 6375.