SlideShare a Scribd company logo
1 of 5
Download to read offline
IOSR Journal of Computer Engineering (IOSR-JCE)
e-ISSN: 2278-0661, p- ISSN: 2278-8727Volume 12, Issue 4 (Jul. - Aug. 2013), PP 89-93
www.iosrjournals.org
www.iosrjournals.org 89 | Page
Context Based Web Indexing For Semantic Web
Anchal Jain1
Nidhi Tyagi 2
Lecturer(JPIEAS) Asst. Professor(SHOBHIT UNIVERSITY)
Abstract : A context based focused crawler downloads web pages that are more relevant for user query in
syntax of context. Wherein downloaded web pages are indexed for providing the speed to search engine. This
paper purposes a new indexing technique based on B+ tree that indexed the context along with ontology’s of
keywords. These keywords are extracted from the web documents that are stored in web repository. This
purposed indexing technique increases the speed of search engine for finding the more relevant documents from
semantic web
Keywords - Architecture, B+ Tree, Context, Semantic web, Web repository
I. INTRODUCTION
With the rapid growth of the Internet, the World Wide Web (WWW) has become one of the most
important resources for obtaining information and one of the most important media of communication.
Currently there are huge amounts of documents existing in the World Wide Web. Finding information from
WWW according the user interest becomes a critical task. Modern web search engines can cache, index and
search several billion of web pages, which only includes a small part of all existing documents in the Web. And
even for this small amount, the search quality could not meet a user's requirements in many cases. Many ideas
have been proposed to improve the web search quality, which can be measured with the following two metrics:
(1) Precision rate: The ratio of the number of relevant documents retrieved to the total number of documents
retrieved.
(2) Recall rate: The ratio of the number of relevant documents extracted to the total number of relevant
documents in the Web.
The purpose of storing an index is to optimize speed and performance in finding relevant documents
for a search query. Without an index, the search engine would scan every document in the corpus, which would
require considerable time and computing power. For example, while an index of 10,000 documents can be
queried within milliseconds, a sequential scan of every word in documents is a time consuming task. The
additional computer storage required to store the index, as well as the considerable increase in the time required
for an update to take place, are traded off for the time saved during information retrieval[1].
In B+ tree all paths from the root to the leaf nodes are equal length .So this tree is called balanced tree. All data
is stored at the leaf nodes (leaf pages). Leaf pages are linked to each other .B+ tree reduces the number of I/O
operations required to find an element in the tree. Finding a record requires O (Logbn) operations. This strategy
is more beneficial for search engine.
II. Related Work
Here many algorithm & technique all ready purposed for indexer to achieve the indexing on documents for
information retrieval. But they are not more efficient for search.
Nidhi Tyagi, R.P Agarwal [1] This paper proposes a technique for indexing [1] the keyword extracted from
the web documents along with their contexts wherein it uses a height balanced binary search (AVL) tree, for
indexing purpose to enhance the performance of the retrieval system.
P. Gupta and A. K. Sharma [2], worked on context based indexing in search engines using ontology. The
index construction is done on the basis of the context using ontology. The context repository, thesaurus and
ontology repository are used by the indexer to identify the context of the document.
C. Zhou, W. Ding and Na Yang [5], the paper introduces a double indexing mechanism for search engines
based on campus Net. The CNSE consists of crawl machine, Chinese automatic segmentation, index and search
machine. The proposed mechanism has document index as well as word index. The document index is based on,
where the documents do the clustering, and ordered by the position in each document. During the retrieval, the
search engine first gets the document id of the word in the word index, and then goes to the position of
Submitted Date 14 June 2013 Accepted Date: 19 June 2013
Context Based Web Indexing For Semantic Web
www.iosrjournals.org 90 | Page
corresponding word in the document index. Because in the document index, the word in the same document is
adjacent, the search engine directly compares the largest word matching assembly with the sentence that users
submit. The mechanism proposed, seems to be time consuming as the index exists at two levels.
The critical look at the available literature reveals that there is a requirement for a technique to organize the
keyword and their contexts in a better fashion as storing in a linear fashion makes searching of a document a bit
time consuming.
III. Purposed work
This paper proposes an algorithm for indexing the keyword extracted from the web documents along
with their context & ontology. The purposed indexing technique is a B+ tree, in addition to improved
performance in the retrieval of information; this data structure is able to support dynamic indexing, which is
especially important for environments where documents are changed frequently. If the planning about the
arrangement of the keywords is done then B+ tree can be achieved. B+ tree algorithm & technique improve the
efficiency of indexer for searching the documents from semantic web. This paper purpose a ontology based
context indexing architecture in fig 1
3.1 Description of Various Components
1. Repository of web page: This is the database which contains the set of documents that have been collected
by the crawler.
Extract Keywords
USER
Fig 1 Architecture of Context Based Indexing.
keyword Context Ontology Doc_id
www
Crawl Manager
Pre-Processing
of Documents
B+ Tree
Thesaurus
Query
Processor
or
Repository of web pages
Context Repository
reRepository
Ontology Repository
Ontology Based Documents
Query
Interface
Word net
net
Context Based Web Indexing For Semantic Web
www.iosrjournals.org 91 | Page
2. Preprocessing of document: The preprocessing step involves stemming as well as removal of stop words. A
stop word is any word which has no semantic content. Common stop words are prepositions and articles, as well
as high frequency words that do not help retrieval
3. Thesaurus: It is a dictionary of words available on the World Wide Web from thesaurus.com which contains
the words as well as their multiple meanings.
4. Context Repository: This is a database which contains the various contexts. Also the new contexts derived
from thesaurus are stored in this repository. The context repository maintains a database of several types of
context data
5. Ontology Repository: This is a database of ontology’s which contains the various relationships among
objects in various domains. Ontology repository contains various concepts with their relationships.
6. Ontology based document: This context represents the theme of the document that has been extracted using
context repository, thesaurus and ontology repository.
7. B+ Tree: this is the indexing technique that is constructed after extracting the context of the document on the
basis of ontology.
8. Query Interface & query processor: It is that module of the search engine that receives user queries and
hence after searching the results through query processor in the index provides relevant information to the user.
Fig 2 Query Retrieval Interface
In figure 2 the user entered keyword Crown & desired context of the keyword displayed through the
generate context button, the corresponding related web page URLs are listed (available in the repository)
displayed by pressing the show document button. This can help the user to directly access more related and
relevant information.
Text Thesaurus Ontology
Enter keyword CROWN
Figure 1
Show
Documents
Generate context
Context Based Web Indexing For Semantic Web
www.iosrjournals.org 92 | Page
3.2 Comparison of Performance of Proposed and Existing Indexing Algorithm
0
1
2
3 2.5 2.5
1.5
ALGORITHM COMPARISION
BINARY TREE
AVL TREE
B+ TREE
Binary Tree AVL Tree B+ Tree
2.5 operations 2.5 operations 1.5 operations
The purposed algorithm for indexing provides a fast access to document context and structure along
with an optimized searching.
3.3 Proposed algorithm for the indexing scheme
Step1: Preprocess the crawled web documents and extract the keyword along with their frequency of
occurrence.
Step 2: Input the keywords to the context generator which extracts the multiple contextual Sense of the word.
Context is being searched in the thesaurus (a dictionary of words available on WWW from thesaurus.com,
which contains the words as well their multiple meanings).
Step3: The keywords along with the context are indexed using the B+ tree.
Step4: Compare the entered keyword with the node’s keyword field of tree, until a similar word is found.
Corresponding document_id is stored? Context is being searched in the thesaurus (a dictionary of words
available on from thesaurus.com, which contains the Words as well their multiple meanings).
Step5: If search is not a success, create a node containing the following fields (Left child, Keyword, right child,
and link) .The link is pointer variable which points to the Database where the context of keyword stored along
with its ontology based document_id.
Step6: Arrange the node in the B+ tree, according to the height BF.
Step7: Repeat step 4, 5 and 6 until all the extract keywords are arranged.
Step8: Now when the user fires the query with context explicitly specified, then the index is being searched,
reducing its search time to half of the linear search.
Step9: Thus, B+ indexing technique provides a fast access to document context and structure.
IV. Conclusion
This paper presents an indexing structure that can be constructed on the basis of the context of the
document. The context of the document can be extracted by using thesaurus and ontology repository. So this
paper uses ontology for context based index building. The context based index enables retrieval from index on
the basis of context rather than keywords. This aids in improving the quality of the retrieved results. A rough
estimate of support values for the existing and the proposed system clearly depicts the better performance of the
existing system.
Future Scopes: Future scope of this system is that the B+ tree based indexing technique, is able to support
dynamic indexing and improves the performance in terms of accuracy and efficiency for retrieving more,
relevant documents as per the user’s requirements since the context of the various keywords is also stored along
with them. Thus, the indexing technique provides a fast access to document context and structure along with an
optimized searching
Context Based Web Indexing For Semantic Web
www.iosrjournals.org 93 | Page
References
[1]. Nidhi tyagi, Rahul Rishi ,R.P. Agarwal “Context based Web Indexing for Storage of Relevant Web Pages” International
Journal of Computer Applications (0975 – 8887) Volume 40– No.3, February 2012
[2]. Parul Gupta and A.K.Sharma “Context based Indexing in Search Engines using Ontology”, International Journal of Computer
Applications, Volume 1 No. 14, pp 49-52, 2010.
[3]. Pooja Gupta , Dr. A K Sharma, J. P.Gupta, Komal Bhatia “ A Novel Framework for Context Based Distributed Focus Crawler
(CBDFC)” Int. J. Computer and Communication Technology, Vol. 1, No. 1, 2009
[4]. Naresh Chauhan and A. K. Sharma,” Design of an Agent Based Context Driven Focused Crawler”, BVICAM’S International
Journal of Information Technology, pp 61-66, 2008.
[5]. Changshang Zhou, Wei Ding and Na Yang, “Double Indexing Mechanism of Search Engine based on Campus Net”,
Proceedings of the 2006 IEEE Asia-Pacific Conference on Services Computing (APSCC'06), 2006.
[6]. O. Zamir, O. Etzioni, O. Madanim, and R.M. Karp “Fast and Intuitive Clustering of Web Documents,” Proceeding Third
International Conference Knowledge Discovery and Data Mining, pp. 287-290, Aug. 1997.
[7]. S. Chakrabarti, K. Punera, Mallena Subramanyam, “Accelerated Focused Crawling through Online relevance feedback”, paper
presented at WWW conference December 2002.
[8]. Steve Lawrence, “Context in Web Search”, IEEE Data Engineering Bulletin, 2000.
[9]. S. Chakrabarti, M. van den Berg, and B. Dom. “Focused crawling: a new approach to topic-specific web resource discovery”.
In WWW-8, 1999.
[10]. Word Net-Online dictionary and hierarchical thesaurus Obtained through the Internet http://www.wordnetonline.com [accessed
28/12/2009].
[11]. Sajendra Kumar, Ram Kumar Rana ,Pawan Singh “ Ontology based Semantic Indexing Approach for Information Retrieval
System” International Journal of Computer Applications (0975 – 8887) Volume 49– No.12, July 2012

More Related Content

What's hot

Paper id 37201536
Paper id 37201536Paper id 37201536
Paper id 37201536IJRAT
 
LIS688_Group1
LIS688_Group1 LIS688_Group1
LIS688_Group1 e_chae
 
Tovek Presentation by Livio Costantini
Tovek Presentation by Livio CostantiniTovek Presentation by Livio Costantini
Tovek Presentation by Livio Costantinimaxfalc
 
PAS: A Sampling Based Similarity Identification Algorithm for compression of ...
PAS: A Sampling Based Similarity Identification Algorithm for compression of ...PAS: A Sampling Based Similarity Identification Algorithm for compression of ...
PAS: A Sampling Based Similarity Identification Algorithm for compression of ...rahulmonikasharma
 
Effective Feature Selection for Mining Text Data with Side-Information
Effective Feature Selection for Mining Text Data with Side-InformationEffective Feature Selection for Mining Text Data with Side-Information
Effective Feature Selection for Mining Text Data with Side-InformationIJTET Journal
 
IRJET- Review on Information Retrieval for Desktop Search Engine
IRJET-  	  Review on Information Retrieval for Desktop Search EngineIRJET-  	  Review on Information Retrieval for Desktop Search Engine
IRJET- Review on Information Retrieval for Desktop Search EngineIRJET Journal
 
Research on ontology based information retrieval techniques
Research on ontology based information retrieval techniquesResearch on ontology based information retrieval techniques
Research on ontology based information retrieval techniquesKausar Mukadam
 
A Novel Data Extraction and Alignment Method for Web Databases
A Novel Data Extraction and Alignment Method for Web DatabasesA Novel Data Extraction and Alignment Method for Web Databases
A Novel Data Extraction and Alignment Method for Web DatabasesIJMER
 
Information retrieval introduction
Information retrieval introductionInformation retrieval introduction
Information retrieval introductionnimmyjans4
 
Scaling Down Dimensions and Feature Extraction in Document Repository Classif...
Scaling Down Dimensions and Feature Extraction in Document Repository Classif...Scaling Down Dimensions and Feature Extraction in Document Repository Classif...
Scaling Down Dimensions and Feature Extraction in Document Repository Classif...ijdmtaiir
 
Information Storage and Retrieval system (ISRS)
Information Storage and Retrieval system (ISRS)Information Storage and Retrieval system (ISRS)
Information Storage and Retrieval system (ISRS)Sumit Kumar Gupta
 
Performance Evaluation of Query Processing Techniques in Information Retrieval
Performance Evaluation of Query Processing Techniques in Information RetrievalPerformance Evaluation of Query Processing Techniques in Information Retrieval
Performance Evaluation of Query Processing Techniques in Information Retrievalidescitation
 
An Advanced IR System of Relational Keyword Search Technique
An Advanced IR System of Relational Keyword Search TechniqueAn Advanced IR System of Relational Keyword Search Technique
An Advanced IR System of Relational Keyword Search Techniquepaperpublications3
 
Review on an automatic extraction of educational digital objects and metadata...
Review on an automatic extraction of educational digital objects and metadata...Review on an automatic extraction of educational digital objects and metadata...
Review on an automatic extraction of educational digital objects and metadata...IRJET Journal
 
IRJET - Re-Ranking of Google Search Results
IRJET - Re-Ranking of Google Search ResultsIRJET - Re-Ranking of Google Search Results
IRJET - Re-Ranking of Google Search ResultsIRJET Journal
 
Using Page Size for Controlling Duplicate Query Results in Semantic Web
Using Page Size for Controlling Duplicate Query Results in Semantic WebUsing Page Size for Controlling Duplicate Query Results in Semantic Web
Using Page Size for Controlling Duplicate Query Results in Semantic WebIJwest
 
Topic Modeling : Clustering of Deep Webpages
Topic Modeling : Clustering of Deep WebpagesTopic Modeling : Clustering of Deep Webpages
Topic Modeling : Clustering of Deep Webpagescsandit
 

What's hot (19)

Paper id 37201536
Paper id 37201536Paper id 37201536
Paper id 37201536
 
LIS688_Group1
LIS688_Group1 LIS688_Group1
LIS688_Group1
 
Tovek Presentation by Livio Costantini
Tovek Presentation by Livio CostantiniTovek Presentation by Livio Costantini
Tovek Presentation by Livio Costantini
 
PAS: A Sampling Based Similarity Identification Algorithm for compression of ...
PAS: A Sampling Based Similarity Identification Algorithm for compression of ...PAS: A Sampling Based Similarity Identification Algorithm for compression of ...
PAS: A Sampling Based Similarity Identification Algorithm for compression of ...
 
Effective Feature Selection for Mining Text Data with Side-Information
Effective Feature Selection for Mining Text Data with Side-InformationEffective Feature Selection for Mining Text Data with Side-Information
Effective Feature Selection for Mining Text Data with Side-Information
 
IRJET- Review on Information Retrieval for Desktop Search Engine
IRJET-  	  Review on Information Retrieval for Desktop Search EngineIRJET-  	  Review on Information Retrieval for Desktop Search Engine
IRJET- Review on Information Retrieval for Desktop Search Engine
 
Research on ontology based information retrieval techniques
Research on ontology based information retrieval techniquesResearch on ontology based information retrieval techniques
Research on ontology based information retrieval techniques
 
A Novel Data Extraction and Alignment Method for Web Databases
A Novel Data Extraction and Alignment Method for Web DatabasesA Novel Data Extraction and Alignment Method for Web Databases
A Novel Data Extraction and Alignment Method for Web Databases
 
Information retrieval introduction
Information retrieval introductionInformation retrieval introduction
Information retrieval introduction
 
At33264269
At33264269At33264269
At33264269
 
Scaling Down Dimensions and Feature Extraction in Document Repository Classif...
Scaling Down Dimensions and Feature Extraction in Document Repository Classif...Scaling Down Dimensions and Feature Extraction in Document Repository Classif...
Scaling Down Dimensions and Feature Extraction in Document Repository Classif...
 
Information Storage and Retrieval system (ISRS)
Information Storage and Retrieval system (ISRS)Information Storage and Retrieval system (ISRS)
Information Storage and Retrieval system (ISRS)
 
Performance Evaluation of Query Processing Techniques in Information Retrieval
Performance Evaluation of Query Processing Techniques in Information RetrievalPerformance Evaluation of Query Processing Techniques in Information Retrieval
Performance Evaluation of Query Processing Techniques in Information Retrieval
 
An Advanced IR System of Relational Keyword Search Technique
An Advanced IR System of Relational Keyword Search TechniqueAn Advanced IR System of Relational Keyword Search Technique
An Advanced IR System of Relational Keyword Search Technique
 
Review on an automatic extraction of educational digital objects and metadata...
Review on an automatic extraction of educational digital objects and metadata...Review on an automatic extraction of educational digital objects and metadata...
Review on an automatic extraction of educational digital objects and metadata...
 
IRJET - Re-Ranking of Google Search Results
IRJET - Re-Ranking of Google Search ResultsIRJET - Re-Ranking of Google Search Results
IRJET - Re-Ranking of Google Search Results
 
Using Page Size for Controlling Duplicate Query Results in Semantic Web
Using Page Size for Controlling Duplicate Query Results in Semantic WebUsing Page Size for Controlling Duplicate Query Results in Semantic Web
Using Page Size for Controlling Duplicate Query Results in Semantic Web
 
Anatomy of google
Anatomy of googleAnatomy of google
Anatomy of google
 
Topic Modeling : Clustering of Deep Webpages
Topic Modeling : Clustering of Deep WebpagesTopic Modeling : Clustering of Deep Webpages
Topic Modeling : Clustering of Deep Webpages
 

Viewers also liked

Developers' New features of Sql server express 2012
Developers' New features of Sql server express 2012Developers' New features of Sql server express 2012
Developers' New features of Sql server express 2012Ziaur Rahman
 
Multi-Cluster Based Approach for skewed Data in Data Mining
Multi-Cluster Based Approach for skewed Data in Data MiningMulti-Cluster Based Approach for skewed Data in Data Mining
Multi-Cluster Based Approach for skewed Data in Data MiningIOSR Journals
 
Поэма А.С. Пушкина "Руслан и Людмила"
Поэма А.С. Пушкина "Руслан и Людмила"Поэма А.С. Пушкина "Руслан и Людмила"
Поэма А.С. Пушкина "Руслан и Людмила"Natalya Dyrda
 
царскосельский лицей а.с.пушкина
царскосельский лицей а.с.пушкинацарскосельский лицей а.с.пушкина
царскосельский лицей а.с.пушкинаNatalya Dyrda
 
Rp presentation
Rp presentationRp presentation
Rp presentationKhalid Ak
 
Energy Efficient and Secure, Trusted network discovery for Wireless Sensor Ne...
Energy Efficient and Secure, Trusted network discovery for Wireless Sensor Ne...Energy Efficient and Secure, Trusted network discovery for Wireless Sensor Ne...
Energy Efficient and Secure, Trusted network discovery for Wireless Sensor Ne...IOSR Journals
 
Implementation of Semantic Analysis Using Domain Ontology
Implementation of Semantic Analysis Using Domain OntologyImplementation of Semantic Analysis Using Domain Ontology
Implementation of Semantic Analysis Using Domain OntologyIOSR Journals
 
Image Processing for Automated Flaw Detection and CMYK model for Color Image ...
Image Processing for Automated Flaw Detection and CMYK model for Color Image ...Image Processing for Automated Flaw Detection and CMYK model for Color Image ...
Image Processing for Automated Flaw Detection and CMYK model for Color Image ...IOSR Journals
 
Filtering Schemes for Injected False Data in Wsn
Filtering Schemes for Injected False Data in WsnFiltering Schemes for Injected False Data in Wsn
Filtering Schemes for Injected False Data in WsnIOSR Journals
 
Performance analysis of Data Mining algorithms in Weka
Performance analysis of Data Mining algorithms in WekaPerformance analysis of Data Mining algorithms in Weka
Performance analysis of Data Mining algorithms in WekaIOSR Journals
 
Efficiency and Power Factor improvement of Bridgeless Soft-switched PWM Cuk C...
Efficiency and Power Factor improvement of Bridgeless Soft-switched PWM Cuk C...Efficiency and Power Factor improvement of Bridgeless Soft-switched PWM Cuk C...
Efficiency and Power Factor improvement of Bridgeless Soft-switched PWM Cuk C...IOSR Journals
 
Signal Processing Approach for Recognizing Identical Reads From DNA Sequencin...
Signal Processing Approach for Recognizing Identical Reads From DNA Sequencin...Signal Processing Approach for Recognizing Identical Reads From DNA Sequencin...
Signal Processing Approach for Recognizing Identical Reads From DNA Sequencin...IOSR Journals
 
Improving Web Image Search Re-ranking
Improving Web Image Search Re-rankingImproving Web Image Search Re-ranking
Improving Web Image Search Re-rankingIOSR Journals
 
A middleware approach for high level overlay network
A middleware approach for high level overlay networkA middleware approach for high level overlay network
A middleware approach for high level overlay networkIOSR Journals
 
130729【let53全国研究大会】003.key
130729【let53全国研究大会】003.key130729【let53全国研究大会】003.key
130729【let53全国研究大会】003.keySei Sumi
 
Compromising windows 8 with metasploit’s exploit
Compromising windows 8 with metasploit’s exploitCompromising windows 8 with metasploit’s exploit
Compromising windows 8 with metasploit’s exploitIOSR Journals
 
International Medical Careers Forum 2016 (complete)
International Medical Careers Forum 2016 (complete)International Medical Careers Forum 2016 (complete)
International Medical Careers Forum 2016 (complete)Odyssey Recruitment
 
Security in MANET based on PKI using fuzzy function
Security in MANET based on PKI using fuzzy functionSecurity in MANET based on PKI using fuzzy function
Security in MANET based on PKI using fuzzy functionIOSR Journals
 

Viewers also liked (20)

Developers' New features of Sql server express 2012
Developers' New features of Sql server express 2012Developers' New features of Sql server express 2012
Developers' New features of Sql server express 2012
 
Multi-Cluster Based Approach for skewed Data in Data Mining
Multi-Cluster Based Approach for skewed Data in Data MiningMulti-Cluster Based Approach for skewed Data in Data Mining
Multi-Cluster Based Approach for skewed Data in Data Mining
 
Поэма А.С. Пушкина "Руслан и Людмила"
Поэма А.С. Пушкина "Руслан и Людмила"Поэма А.С. Пушкина "Руслан и Людмила"
Поэма А.С. Пушкина "Руслан и Людмила"
 
царскосельский лицей а.с.пушкина
царскосельский лицей а.с.пушкинацарскосельский лицей а.с.пушкина
царскосельский лицей а.с.пушкина
 
Rp presentation
Rp presentationRp presentation
Rp presentation
 
Manual maestro construcor
Manual maestro construcorManual maestro construcor
Manual maestro construcor
 
Energy Efficient and Secure, Trusted network discovery for Wireless Sensor Ne...
Energy Efficient and Secure, Trusted network discovery for Wireless Sensor Ne...Energy Efficient and Secure, Trusted network discovery for Wireless Sensor Ne...
Energy Efficient and Secure, Trusted network discovery for Wireless Sensor Ne...
 
Implementation of Semantic Analysis Using Domain Ontology
Implementation of Semantic Analysis Using Domain OntologyImplementation of Semantic Analysis Using Domain Ontology
Implementation of Semantic Analysis Using Domain Ontology
 
Image Processing for Automated Flaw Detection and CMYK model for Color Image ...
Image Processing for Automated Flaw Detection and CMYK model for Color Image ...Image Processing for Automated Flaw Detection and CMYK model for Color Image ...
Image Processing for Automated Flaw Detection and CMYK model for Color Image ...
 
Filtering Schemes for Injected False Data in Wsn
Filtering Schemes for Injected False Data in WsnFiltering Schemes for Injected False Data in Wsn
Filtering Schemes for Injected False Data in Wsn
 
Performance analysis of Data Mining algorithms in Weka
Performance analysis of Data Mining algorithms in WekaPerformance analysis of Data Mining algorithms in Weka
Performance analysis of Data Mining algorithms in Weka
 
Efficiency and Power Factor improvement of Bridgeless Soft-switched PWM Cuk C...
Efficiency and Power Factor improvement of Bridgeless Soft-switched PWM Cuk C...Efficiency and Power Factor improvement of Bridgeless Soft-switched PWM Cuk C...
Efficiency and Power Factor improvement of Bridgeless Soft-switched PWM Cuk C...
 
Signal Processing Approach for Recognizing Identical Reads From DNA Sequencin...
Signal Processing Approach for Recognizing Identical Reads From DNA Sequencin...Signal Processing Approach for Recognizing Identical Reads From DNA Sequencin...
Signal Processing Approach for Recognizing Identical Reads From DNA Sequencin...
 
Improving Web Image Search Re-ranking
Improving Web Image Search Re-rankingImproving Web Image Search Re-ranking
Improving Web Image Search Re-ranking
 
A middleware approach for high level overlay network
A middleware approach for high level overlay networkA middleware approach for high level overlay network
A middleware approach for high level overlay network
 
130729【let53全国研究大会】003.key
130729【let53全国研究大会】003.key130729【let53全国研究大会】003.key
130729【let53全国研究大会】003.key
 
Compromising windows 8 with metasploit’s exploit
Compromising windows 8 with metasploit’s exploitCompromising windows 8 with metasploit’s exploit
Compromising windows 8 with metasploit’s exploit
 
International Medical Careers Forum 2016 (complete)
International Medical Careers Forum 2016 (complete)International Medical Careers Forum 2016 (complete)
International Medical Careers Forum 2016 (complete)
 
Security in MANET based on PKI using fuzzy function
Security in MANET based on PKI using fuzzy functionSecurity in MANET based on PKI using fuzzy function
Security in MANET based on PKI using fuzzy function
 
Media genre
Media genreMedia genre
Media genre
 

Similar to Context Based Web Indexing For Semantic Web

Searching and Analyzing Qualitative Data on Personal Computer
Searching and Analyzing Qualitative Data on Personal ComputerSearching and Analyzing Qualitative Data on Personal Computer
Searching and Analyzing Qualitative Data on Personal ComputerIOSR Journals
 
Volume 2-issue-6-2016-2020
Volume 2-issue-6-2016-2020Volume 2-issue-6-2016-2020
Volume 2-issue-6-2016-2020Editor IJARCET
 
Volume 2-issue-6-2016-2020
Volume 2-issue-6-2016-2020Volume 2-issue-6-2016-2020
Volume 2-issue-6-2016-2020Editor IJARCET
 
Semantic Search of E-Learning Documents Using Ontology Based System
Semantic Search of E-Learning Documents Using Ontology Based SystemSemantic Search of E-Learning Documents Using Ontology Based System
Semantic Search of E-Learning Documents Using Ontology Based Systemijcnes
 
Academic Linkage A Linkage Platform For Large Volumes Of Academic Information
Academic Linkage  A Linkage Platform For Large Volumes Of Academic InformationAcademic Linkage  A Linkage Platform For Large Volumes Of Academic Information
Academic Linkage A Linkage Platform For Large Volumes Of Academic InformationAmy Roman
 
An Improved Annotation Based Summary Generation For Unstructured Data
An Improved Annotation Based Summary Generation For Unstructured DataAn Improved Annotation Based Summary Generation For Unstructured Data
An Improved Annotation Based Summary Generation For Unstructured DataMelinda Watson
 
Inverted files for text search engines
Inverted files for text search enginesInverted files for text search engines
Inverted files for text search enginesunyil96
 
A Robust Keywords Based Document Retrieval by Utilizing Advanced Encryption S...
A Robust Keywords Based Document Retrieval by Utilizing Advanced Encryption S...A Robust Keywords Based Document Retrieval by Utilizing Advanced Encryption S...
A Robust Keywords Based Document Retrieval by Utilizing Advanced Encryption S...IRJET Journal
 
An Improved Mining Of Biomedical Data From Web Documents Using Clustering
An Improved Mining Of Biomedical Data From Web Documents Using ClusteringAn Improved Mining Of Biomedical Data From Web Documents Using Clustering
An Improved Mining Of Biomedical Data From Web Documents Using ClusteringKelly Lipiec
 
Privacy Preserving MFI Based Similarity Measure For Hierarchical Document Clu...
Privacy Preserving MFI Based Similarity Measure For Hierarchical Document Clu...Privacy Preserving MFI Based Similarity Measure For Hierarchical Document Clu...
Privacy Preserving MFI Based Similarity Measure For Hierarchical Document Clu...IJORCS
 
Indexing in Search Engine
Indexing in Search EngineIndexing in Search Engine
Indexing in Search EngineShikha Gupta
 
Extracting and Reducing the Semantic Information Content of Web Documents to ...
Extracting and Reducing the Semantic Information Content of Web Documents to ...Extracting and Reducing the Semantic Information Content of Web Documents to ...
Extracting and Reducing the Semantic Information Content of Web Documents to ...ijsrd.com
 
INTELLIGENT INFORMATION RETRIEVAL WITHIN DIGITAL LIBRARY USING DOMAIN ONTOLOGY
INTELLIGENT INFORMATION RETRIEVAL WITHIN DIGITAL LIBRARY USING DOMAIN ONTOLOGYINTELLIGENT INFORMATION RETRIEVAL WITHIN DIGITAL LIBRARY USING DOMAIN ONTOLOGY
INTELLIGENT INFORMATION RETRIEVAL WITHIN DIGITAL LIBRARY USING DOMAIN ONTOLOGYcscpconf
 
Information_Retrieval_Models_Nfaoui_El_Habib
Information_Retrieval_Models_Nfaoui_El_HabibInformation_Retrieval_Models_Nfaoui_El_Habib
Information_Retrieval_Models_Nfaoui_El_HabibEl Habib NFAOUI
 
Comparative analysis of relative and exact search for web information retrieval
Comparative analysis of relative and exact search for web information retrievalComparative analysis of relative and exact search for web information retrieval
Comparative analysis of relative and exact search for web information retrievaleSAT Journals
 
Building a Semantic search Engine in a library
Building a Semantic search Engine in a libraryBuilding a Semantic search Engine in a library
Building a Semantic search Engine in a librarySEECS NUST
 
Topic Modeling : Clustering of Deep Webpages
Topic Modeling : Clustering of Deep WebpagesTopic Modeling : Clustering of Deep Webpages
Topic Modeling : Clustering of Deep Webpagescsandit
 

Similar to Context Based Web Indexing For Semantic Web (20)

N017249497
N017249497N017249497
N017249497
 
Searching and Analyzing Qualitative Data on Personal Computer
Searching and Analyzing Qualitative Data on Personal ComputerSearching and Analyzing Qualitative Data on Personal Computer
Searching and Analyzing Qualitative Data on Personal Computer
 
Volume 2-issue-6-2016-2020
Volume 2-issue-6-2016-2020Volume 2-issue-6-2016-2020
Volume 2-issue-6-2016-2020
 
Volume 2-issue-6-2016-2020
Volume 2-issue-6-2016-2020Volume 2-issue-6-2016-2020
Volume 2-issue-6-2016-2020
 
Semantic Search of E-Learning Documents Using Ontology Based System
Semantic Search of E-Learning Documents Using Ontology Based SystemSemantic Search of E-Learning Documents Using Ontology Based System
Semantic Search of E-Learning Documents Using Ontology Based System
 
Academic Linkage A Linkage Platform For Large Volumes Of Academic Information
Academic Linkage  A Linkage Platform For Large Volumes Of Academic InformationAcademic Linkage  A Linkage Platform For Large Volumes Of Academic Information
Academic Linkage A Linkage Platform For Large Volumes Of Academic Information
 
An Improved Annotation Based Summary Generation For Unstructured Data
An Improved Annotation Based Summary Generation For Unstructured DataAn Improved Annotation Based Summary Generation For Unstructured Data
An Improved Annotation Based Summary Generation For Unstructured Data
 
Inverted files for text search engines
Inverted files for text search enginesInverted files for text search engines
Inverted files for text search engines
 
A Robust Keywords Based Document Retrieval by Utilizing Advanced Encryption S...
A Robust Keywords Based Document Retrieval by Utilizing Advanced Encryption S...A Robust Keywords Based Document Retrieval by Utilizing Advanced Encryption S...
A Robust Keywords Based Document Retrieval by Utilizing Advanced Encryption S...
 
An Improved Mining Of Biomedical Data From Web Documents Using Clustering
An Improved Mining Of Biomedical Data From Web Documents Using ClusteringAn Improved Mining Of Biomedical Data From Web Documents Using Clustering
An Improved Mining Of Biomedical Data From Web Documents Using Clustering
 
Privacy Preserving MFI Based Similarity Measure For Hierarchical Document Clu...
Privacy Preserving MFI Based Similarity Measure For Hierarchical Document Clu...Privacy Preserving MFI Based Similarity Measure For Hierarchical Document Clu...
Privacy Preserving MFI Based Similarity Measure For Hierarchical Document Clu...
 
Indexing in Search Engine
Indexing in Search EngineIndexing in Search Engine
Indexing in Search Engine
 
Extracting and Reducing the Semantic Information Content of Web Documents to ...
Extracting and Reducing the Semantic Information Content of Web Documents to ...Extracting and Reducing the Semantic Information Content of Web Documents to ...
Extracting and Reducing the Semantic Information Content of Web Documents to ...
 
INTELLIGENT INFORMATION RETRIEVAL WITHIN DIGITAL LIBRARY USING DOMAIN ONTOLOGY
INTELLIGENT INFORMATION RETRIEVAL WITHIN DIGITAL LIBRARY USING DOMAIN ONTOLOGYINTELLIGENT INFORMATION RETRIEVAL WITHIN DIGITAL LIBRARY USING DOMAIN ONTOLOGY
INTELLIGENT INFORMATION RETRIEVAL WITHIN DIGITAL LIBRARY USING DOMAIN ONTOLOGY
 
G1803054653
G1803054653G1803054653
G1803054653
 
Information_Retrieval_Models_Nfaoui_El_Habib
Information_Retrieval_Models_Nfaoui_El_HabibInformation_Retrieval_Models_Nfaoui_El_Habib
Information_Retrieval_Models_Nfaoui_El_Habib
 
Comparative analysis of relative and exact search for web information retrieval
Comparative analysis of relative and exact search for web information retrievalComparative analysis of relative and exact search for web information retrieval
Comparative analysis of relative and exact search for web information retrieval
 
Building a Semantic search Engine in a library
Building a Semantic search Engine in a libraryBuilding a Semantic search Engine in a library
Building a Semantic search Engine in a library
 
Ijetcas14 446
Ijetcas14 446Ijetcas14 446
Ijetcas14 446
 
Topic Modeling : Clustering of Deep Webpages
Topic Modeling : Clustering of Deep WebpagesTopic Modeling : Clustering of Deep Webpages
Topic Modeling : Clustering of Deep Webpages
 

More from IOSR Journals (20)

A011140104
A011140104A011140104
A011140104
 
M0111397100
M0111397100M0111397100
M0111397100
 
L011138596
L011138596L011138596
L011138596
 
K011138084
K011138084K011138084
K011138084
 
J011137479
J011137479J011137479
J011137479
 
I011136673
I011136673I011136673
I011136673
 
G011134454
G011134454G011134454
G011134454
 
H011135565
H011135565H011135565
H011135565
 
F011134043
F011134043F011134043
F011134043
 
E011133639
E011133639E011133639
E011133639
 
D011132635
D011132635D011132635
D011132635
 
C011131925
C011131925C011131925
C011131925
 
B011130918
B011130918B011130918
B011130918
 
A011130108
A011130108A011130108
A011130108
 
I011125160
I011125160I011125160
I011125160
 
H011124050
H011124050H011124050
H011124050
 
G011123539
G011123539G011123539
G011123539
 
F011123134
F011123134F011123134
F011123134
 
E011122530
E011122530E011122530
E011122530
 
D011121524
D011121524D011121524
D011121524
 

Recently uploaded

Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxKartikeyaDwivedi3
 
lifi-technology with integration of IOT.pptx
lifi-technology with integration of IOT.pptxlifi-technology with integration of IOT.pptx
lifi-technology with integration of IOT.pptxsomshekarkn64
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024hassan khalil
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxk795866
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)Dr SOUNDIRARAJ N
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfROCENODodongVILLACER
 
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEINFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEroselinkalist12
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girlsssuser7cb4ff
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024Mark Billinghurst
 
Transport layer issues and challenges - Guide
Transport layer issues and challenges - GuideTransport layer issues and challenges - Guide
Transport layer issues and challenges - GuideGOPINATHS437943
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AIabhishek36461
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionDr.Costas Sachpazis
 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvLewisJB
 
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)dollysharma2066
 
Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...121011101441
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.eptoze12
 
8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitter8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitterShivangiSharma879191
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerAnamika Sarkar
 

Recently uploaded (20)

Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptx
 
lifi-technology with integration of IOT.pptx
lifi-technology with integration of IOT.pptxlifi-technology with integration of IOT.pptx
lifi-technology with integration of IOT.pptx
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptx
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdf
 
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEINFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girls
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 
Transport layer issues and challenges - Guide
Transport layer issues and challenges - GuideTransport layer issues and challenges - Guide
Transport layer issues and challenges - Guide
 
POWER SYSTEMS-1 Complete notes examples
POWER SYSTEMS-1 Complete notes  examplesPOWER SYSTEMS-1 Complete notes  examples
POWER SYSTEMS-1 Complete notes examples
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AI
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvv
 
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
 
Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.
 
8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitter8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitter
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 

Context Based Web Indexing For Semantic Web

  • 1. IOSR Journal of Computer Engineering (IOSR-JCE) e-ISSN: 2278-0661, p- ISSN: 2278-8727Volume 12, Issue 4 (Jul. - Aug. 2013), PP 89-93 www.iosrjournals.org www.iosrjournals.org 89 | Page Context Based Web Indexing For Semantic Web Anchal Jain1 Nidhi Tyagi 2 Lecturer(JPIEAS) Asst. Professor(SHOBHIT UNIVERSITY) Abstract : A context based focused crawler downloads web pages that are more relevant for user query in syntax of context. Wherein downloaded web pages are indexed for providing the speed to search engine. This paper purposes a new indexing technique based on B+ tree that indexed the context along with ontology’s of keywords. These keywords are extracted from the web documents that are stored in web repository. This purposed indexing technique increases the speed of search engine for finding the more relevant documents from semantic web Keywords - Architecture, B+ Tree, Context, Semantic web, Web repository I. INTRODUCTION With the rapid growth of the Internet, the World Wide Web (WWW) has become one of the most important resources for obtaining information and one of the most important media of communication. Currently there are huge amounts of documents existing in the World Wide Web. Finding information from WWW according the user interest becomes a critical task. Modern web search engines can cache, index and search several billion of web pages, which only includes a small part of all existing documents in the Web. And even for this small amount, the search quality could not meet a user's requirements in many cases. Many ideas have been proposed to improve the web search quality, which can be measured with the following two metrics: (1) Precision rate: The ratio of the number of relevant documents retrieved to the total number of documents retrieved. (2) Recall rate: The ratio of the number of relevant documents extracted to the total number of relevant documents in the Web. The purpose of storing an index is to optimize speed and performance in finding relevant documents for a search query. Without an index, the search engine would scan every document in the corpus, which would require considerable time and computing power. For example, while an index of 10,000 documents can be queried within milliseconds, a sequential scan of every word in documents is a time consuming task. The additional computer storage required to store the index, as well as the considerable increase in the time required for an update to take place, are traded off for the time saved during information retrieval[1]. In B+ tree all paths from the root to the leaf nodes are equal length .So this tree is called balanced tree. All data is stored at the leaf nodes (leaf pages). Leaf pages are linked to each other .B+ tree reduces the number of I/O operations required to find an element in the tree. Finding a record requires O (Logbn) operations. This strategy is more beneficial for search engine. II. Related Work Here many algorithm & technique all ready purposed for indexer to achieve the indexing on documents for information retrieval. But they are not more efficient for search. Nidhi Tyagi, R.P Agarwal [1] This paper proposes a technique for indexing [1] the keyword extracted from the web documents along with their contexts wherein it uses a height balanced binary search (AVL) tree, for indexing purpose to enhance the performance of the retrieval system. P. Gupta and A. K. Sharma [2], worked on context based indexing in search engines using ontology. The index construction is done on the basis of the context using ontology. The context repository, thesaurus and ontology repository are used by the indexer to identify the context of the document. C. Zhou, W. Ding and Na Yang [5], the paper introduces a double indexing mechanism for search engines based on campus Net. The CNSE consists of crawl machine, Chinese automatic segmentation, index and search machine. The proposed mechanism has document index as well as word index. The document index is based on, where the documents do the clustering, and ordered by the position in each document. During the retrieval, the search engine first gets the document id of the word in the word index, and then goes to the position of Submitted Date 14 June 2013 Accepted Date: 19 June 2013
  • 2. Context Based Web Indexing For Semantic Web www.iosrjournals.org 90 | Page corresponding word in the document index. Because in the document index, the word in the same document is adjacent, the search engine directly compares the largest word matching assembly with the sentence that users submit. The mechanism proposed, seems to be time consuming as the index exists at two levels. The critical look at the available literature reveals that there is a requirement for a technique to organize the keyword and their contexts in a better fashion as storing in a linear fashion makes searching of a document a bit time consuming. III. Purposed work This paper proposes an algorithm for indexing the keyword extracted from the web documents along with their context & ontology. The purposed indexing technique is a B+ tree, in addition to improved performance in the retrieval of information; this data structure is able to support dynamic indexing, which is especially important for environments where documents are changed frequently. If the planning about the arrangement of the keywords is done then B+ tree can be achieved. B+ tree algorithm & technique improve the efficiency of indexer for searching the documents from semantic web. This paper purpose a ontology based context indexing architecture in fig 1 3.1 Description of Various Components 1. Repository of web page: This is the database which contains the set of documents that have been collected by the crawler. Extract Keywords USER Fig 1 Architecture of Context Based Indexing. keyword Context Ontology Doc_id www Crawl Manager Pre-Processing of Documents B+ Tree Thesaurus Query Processor or Repository of web pages Context Repository reRepository Ontology Repository Ontology Based Documents Query Interface Word net net
  • 3. Context Based Web Indexing For Semantic Web www.iosrjournals.org 91 | Page 2. Preprocessing of document: The preprocessing step involves stemming as well as removal of stop words. A stop word is any word which has no semantic content. Common stop words are prepositions and articles, as well as high frequency words that do not help retrieval 3. Thesaurus: It is a dictionary of words available on the World Wide Web from thesaurus.com which contains the words as well as their multiple meanings. 4. Context Repository: This is a database which contains the various contexts. Also the new contexts derived from thesaurus are stored in this repository. The context repository maintains a database of several types of context data 5. Ontology Repository: This is a database of ontology’s which contains the various relationships among objects in various domains. Ontology repository contains various concepts with their relationships. 6. Ontology based document: This context represents the theme of the document that has been extracted using context repository, thesaurus and ontology repository. 7. B+ Tree: this is the indexing technique that is constructed after extracting the context of the document on the basis of ontology. 8. Query Interface & query processor: It is that module of the search engine that receives user queries and hence after searching the results through query processor in the index provides relevant information to the user. Fig 2 Query Retrieval Interface In figure 2 the user entered keyword Crown & desired context of the keyword displayed through the generate context button, the corresponding related web page URLs are listed (available in the repository) displayed by pressing the show document button. This can help the user to directly access more related and relevant information. Text Thesaurus Ontology Enter keyword CROWN Figure 1 Show Documents Generate context
  • 4. Context Based Web Indexing For Semantic Web www.iosrjournals.org 92 | Page 3.2 Comparison of Performance of Proposed and Existing Indexing Algorithm 0 1 2 3 2.5 2.5 1.5 ALGORITHM COMPARISION BINARY TREE AVL TREE B+ TREE Binary Tree AVL Tree B+ Tree 2.5 operations 2.5 operations 1.5 operations The purposed algorithm for indexing provides a fast access to document context and structure along with an optimized searching. 3.3 Proposed algorithm for the indexing scheme Step1: Preprocess the crawled web documents and extract the keyword along with their frequency of occurrence. Step 2: Input the keywords to the context generator which extracts the multiple contextual Sense of the word. Context is being searched in the thesaurus (a dictionary of words available on WWW from thesaurus.com, which contains the words as well their multiple meanings). Step3: The keywords along with the context are indexed using the B+ tree. Step4: Compare the entered keyword with the node’s keyword field of tree, until a similar word is found. Corresponding document_id is stored? Context is being searched in the thesaurus (a dictionary of words available on from thesaurus.com, which contains the Words as well their multiple meanings). Step5: If search is not a success, create a node containing the following fields (Left child, Keyword, right child, and link) .The link is pointer variable which points to the Database where the context of keyword stored along with its ontology based document_id. Step6: Arrange the node in the B+ tree, according to the height BF. Step7: Repeat step 4, 5 and 6 until all the extract keywords are arranged. Step8: Now when the user fires the query with context explicitly specified, then the index is being searched, reducing its search time to half of the linear search. Step9: Thus, B+ indexing technique provides a fast access to document context and structure. IV. Conclusion This paper presents an indexing structure that can be constructed on the basis of the context of the document. The context of the document can be extracted by using thesaurus and ontology repository. So this paper uses ontology for context based index building. The context based index enables retrieval from index on the basis of context rather than keywords. This aids in improving the quality of the retrieved results. A rough estimate of support values for the existing and the proposed system clearly depicts the better performance of the existing system. Future Scopes: Future scope of this system is that the B+ tree based indexing technique, is able to support dynamic indexing and improves the performance in terms of accuracy and efficiency for retrieving more, relevant documents as per the user’s requirements since the context of the various keywords is also stored along with them. Thus, the indexing technique provides a fast access to document context and structure along with an optimized searching
  • 5. Context Based Web Indexing For Semantic Web www.iosrjournals.org 93 | Page References [1]. Nidhi tyagi, Rahul Rishi ,R.P. Agarwal “Context based Web Indexing for Storage of Relevant Web Pages” International Journal of Computer Applications (0975 – 8887) Volume 40– No.3, February 2012 [2]. Parul Gupta and A.K.Sharma “Context based Indexing in Search Engines using Ontology”, International Journal of Computer Applications, Volume 1 No. 14, pp 49-52, 2010. [3]. Pooja Gupta , Dr. A K Sharma, J. P.Gupta, Komal Bhatia “ A Novel Framework for Context Based Distributed Focus Crawler (CBDFC)” Int. J. Computer and Communication Technology, Vol. 1, No. 1, 2009 [4]. Naresh Chauhan and A. K. Sharma,” Design of an Agent Based Context Driven Focused Crawler”, BVICAM’S International Journal of Information Technology, pp 61-66, 2008. [5]. Changshang Zhou, Wei Ding and Na Yang, “Double Indexing Mechanism of Search Engine based on Campus Net”, Proceedings of the 2006 IEEE Asia-Pacific Conference on Services Computing (APSCC'06), 2006. [6]. O. Zamir, O. Etzioni, O. Madanim, and R.M. Karp “Fast and Intuitive Clustering of Web Documents,” Proceeding Third International Conference Knowledge Discovery and Data Mining, pp. 287-290, Aug. 1997. [7]. S. Chakrabarti, K. Punera, Mallena Subramanyam, “Accelerated Focused Crawling through Online relevance feedback”, paper presented at WWW conference December 2002. [8]. Steve Lawrence, “Context in Web Search”, IEEE Data Engineering Bulletin, 2000. [9]. S. Chakrabarti, M. van den Berg, and B. Dom. “Focused crawling: a new approach to topic-specific web resource discovery”. In WWW-8, 1999. [10]. Word Net-Online dictionary and hierarchical thesaurus Obtained through the Internet http://www.wordnetonline.com [accessed 28/12/2009]. [11]. Sajendra Kumar, Ram Kumar Rana ,Pawan Singh “ Ontology based Semantic Indexing Approach for Information Retrieval System” International Journal of Computer Applications (0975 – 8887) Volume 49– No.12, July 2012