DOI: 10.4018/IJITWE.2018040102
International Journal of Information Technology and Web Engineering
Volume 13 • Issue 2 • April-June 2018
Copyright © 2018, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
A MapReduce-Based User Identification
Algorithm in Web Usage Mining
Mitali Srivastava, Department of Computer Science, Institute of Science, Banaras Hindu University, Varanasi, India
Rakhi Garg, Computer Science Section, Mahila Maha Vidyalaya, Banaras Hindu University, Varanasi, India
P.K. Mishra, Department of Computer Science, Institute of Science, Banaras Hindu University, Varanasi, India
ABSTRACT
This article contends that, in the booming era of information, analysing users’ navigation behaviour
is an important task. User identification is considered one of the important and challenging tasks
in the data preprocessing phase of the Web usage mining process. Three important issues
with the reactive strategies of User identification need attention: the first is dealing
with the shared IP address problem in a proxy server environment, the second is distinguishing users
from Web robots, and the third is handling huge datasets efficiently. In this article, the authors
develop a MapReduce-based User identification algorithm that addresses these
three issues. Moreover, an experiment on a real Web server
log shows the effectiveness and efficiency of the developed algorithm.
KEYWORDS
Data Cleaning, Data Preprocessing, Hadoop, MapReduce, User Identification, Web Server Log, Web Usage Mining
1. INTRODUCTION
Apart from the content and structural information of a Website, server logs are also
considered a valuable source of information. This information can be used to analyse
users’ navigation behaviour (Pabarskaite & Raudys, 2007). Web usage mining is a class of Web
mining that mines server logs to find relevant patterns. These patterns have been successfully applied
in various applications such as restructuring Websites, recommending pages and products,
personalizing Web content, and improving server activities such as prefetching and caching (Facca
& Lanzi, 2005; Kemmar, Lebbah, & Loudni, 2016). The Web usage mining process can be divided into
three important steps: Data preprocessing, Pattern extraction and Pattern evaluation (Liu, 2007). Because
log data are unstructured and huge, Data preprocessing has become an essential
and time-consuming task in the Web usage mining process. It is a complex task and consumes more
than 60% of the total Web usage mining process time (Tanasa & Trousse, 2004). Data preprocessing
of a server log incorporates several steps: Data fusion, Data cleaning, User identification, Session
identification, Path completion, and Data transformation (Cooley, Mobasher, & Srivastava, 1999;
Liu, 2007). Among them, User identification is one of the challenging tasks in Data preprocessing
due to external/local proxy servers, shared internet connections and cache systems (Pabarskaite & Raudys,
2007). This article focuses on User identification, a complex and challenging phase in the Web
usage mining process. In the User identification phase, users are identified and their activities are
grouped and recorded in a user activity file. Several heuristics have been proposed for better
identification of users in the last few years. Spiliopoulou et al. classified user identification
methods into two classes, namely proactive methods and reactive methods. In proactive methods,
users are identified by their previous or current interaction with the Website. Proactive
strategies incorporate methods such as user authentication, activation of cookies on the client side,
dynamic pages associated with the browser, etc. (Spiliopoulou, Mobasher, Berendt, & Nakagawa,
2003). Although these proactive approaches are the most accurate and reliable methods for identifying
users, they raise privacy concerns and depend entirely on users’ cooperation. In the absence
of user authentication, the most popular proactive approach to distinguishing unique
users is the use of client-side cookie information (Liu, 2007). Whenever a Web user navigates
through a Website for the first time, the Web server sends a cookie, i.e. a piece of information,
to the client browser. This information is stored on the client machine in the form of a text file
(Facca & Lanzi, 2005). A cookie may contain various information, including the user’s unique id. A few
researchers have applied the cookie-based approach to identify users (Elo-Dean & Viveros, 1997;
Ivancsy & Juhasz, 2007; Kamdar & Joshi, 2000). Although this approach is considered one of
the most accurate methods to identify users, cookies are often not recorded on the client machine
due to browser constraints or users’ non-cooperation, e.g. some browsers do not support cookies
or have cookies disabled. Sometimes cookies are deleted by the user. On the other hand, in reactive
methods, users are identified from existing log records after their interaction with the Website. One
of the basic approaches in reactive methods is identification by IP address (Géry & Haddad,
2003). However, this approach is unable to deal with the issue of shared IP addresses behind a proxy server.
According to Cooley et al., two heuristics can be used to solve this issue. The first heuristic assumes
that two log entries having the same IP address but different User agents may belong to two different
users. In the second heuristic, additional information such as Web site topology and the referrer
log is used to identify users. This heuristic treats a request as coming from a new user if
the requested page is not accessible through a hyperlink from any previously requested page of the same IP
address (Cooley et al., 1999). Tanasa et al. used IP address and User agent information to
identify users when user authentication is not available (Tanasa & Trousse, 2004). Castellano et
al. and Suneetha et al. also used IP address and User agent information to identify users
(Castellano, Fanelli, & Torsello, 2007; Suneetha & Krishnamoorthi, 2009). Further, researchers
have applied a combined approach to identify users. According to their approach, if the IP address is
the same and the User agent is different, the request is attributed to a new user. Further, if both are
the same and the requested resource is not accessible through previously accessed pages, the request is
also attributed to a new user (Reddy, Reddy, & Sitaramulu, 2013).
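
The combined reactive heuristic described above can be illustrated with a minimal Python sketch. It is not taken from any of the cited works: the names LogEntry, User, identify_users and the links dictionary (a hypothetical site topology mapping each page to the pages it links to) are assumptions made for illustration. The sketch also assumes that a repeat request for an already visited page stays with the same user, which the heuristics above do not state.

```python
from dataclasses import dataclass, field


@dataclass
class LogEntry:
    ip: str
    user_agent: str
    url: str


@dataclass
class User:
    ip: str
    user_agent: str
    visited: list = field(default_factory=list)


def identify_users(entries, links):
    """Assign each log entry to a user with the combined heuristic.

    links is a hypothetical topology: page URL -> set of URLs it links to.
    """
    users = []
    for entry in entries:
        match = None
        for user in users:
            # Same IP but different User agent: treated as a different user.
            if user.ip != entry.ip or user.user_agent != entry.user_agent:
                continue
            # Same IP and agent: the page must be a revisit (assumption) or be
            # reachable by a hyperlink from a previously requested page.
            reachable = any(entry.url in links.get(p, set()) for p in user.visited)
            if entry.url in user.visited or reachable:
                match = user
                break
        if match is None:
            # No existing user fits, so a new user is created.
            match = User(entry.ip, entry.user_agent)
            users.append(match)
        match.visited.append(entry.url)
    return users
```

For example, three requests from one IP address with the same User agent for /a, /b and /z, where only /a links to /b, would yield two users under this sketch: one who visited /a and /b, and one who visited /z.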
Although all the above-discussed methods have been successfully applied in various applications, they
are not suitable for large datasets. In the last few years, the MapReduce programming framework has
become popular for distributed computation on big data executed on a cluster of
nodes, and Hadoop is an open source implementation of the MapReduce framework (Bhandarkar, 2010;
Dean & Ghemawat, 2008). A few researchers have focused on the scalability issues of Data preprocessing
methods in the Web usage mining process. They have identified Web users by using IP address
information in the MapReduce framework (Savitha & Vijaya, 2014; Zhang & Zhang, 2013). Although
their methods are appropriate for large datasets, they are unable to deal with the proxy server problem.
Huang et al. have given an improved referrer-based algorithm for user session identification using
the MapReduce programming framework. For user identification, they consider a specific user
to be one under the same Asymmetric Digital Subscriber Line (ADSL) and the same User agent (Huang, Chen, &
Le, 2013). This method is suitable for large datasets; however, it is not able to distinguish users from
Web robots at the User identification phase.
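
To make the MapReduce style of user identification concrete, the following is a small single-process Python sketch of the map, shuffle and reduce phases. It is neither the algorithm developed in this article nor the Hadoop API. The tab-separated input format and the function names map_entry, reduce_user and run_job are assumptions made for illustration, and keying on the pair (IP address, User agent) is one possible choice among those discussed above.

```python
from collections import defaultdict


def map_entry(line):
    """Map step: emit ((ip, user_agent), url) for one simplified log line."""
    # Assumed input format: "ip<TAB>user_agent<TAB>url"
    ip, user_agent, url = line.split("\t")
    yield (ip, user_agent), url


def reduce_user(key, urls):
    """Reduce step: collect every request that shares one (ip, user_agent) key."""
    return key, list(urls)


def run_job(lines):
    """Simulate map, shuffle and reduce in a single process."""
    groups = defaultdict(list)  # shuffle: group mapped values by key
    for line in lines:
        for key, value in map_entry(line):
            groups[key].append(value)
    return dict(reduce_user(key, values) for key, values in groups.items())
```

On a real cluster the framework would perform the shuffle and run many map and reduce tasks in parallel. Here a dictionary stands in for that machinery so the grouping logic can be read in isolation.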

APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdf
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 

A MapReduce-Based User Identification Algorithm in Web Usage Mining.pdf

Web usage mining is a class of Web mining that mines server logs to find relevant patterns. These patterns have been applied successfully in applications such as restructuring Websites, recommending pages and products, personalizing Web content, and improving server activities such as prefetching and caching (Facca & Lanzi, 2005; Kemmar, Lebbah, & Loudni, 2016). The Web usage mining process can be divided into three steps: Data preprocessing, Pattern extraction, and Pattern evaluation (Liu, 2007). Because log data are huge and unstructured, Data preprocessing has become an essential and time-consuming task in the Web usage mining process. It is a complex task that consumes more than 60% of the total Web usage mining process time (Tanasa & Trousse, 2004). Data preprocessing of a server log incorporates several steps: Data fusion, Data cleaning, User identification, Session identification, Path completion, and Data transformation (Cooley, Mobasher, & Srivastava, 1999; Liu, 2007). Among them, User identification is one of the challenging tasks in Data preprocessing because of external/local proxy servers, shared internet connections, and cache systems (Pabarskaite & Raudys, 2007).

This article focuses on User identification, a complex and challenging phase in the Web usage mining process. In the User identification phase, users are identified and their activities are grouped and recorded in a user activity file. Several heuristics have been proposed in the last few years for better identification of users. Spiliopoulou et al. have classified user identification methods into two classes: proactive methods and reactive methods. In proactive methods, users are identified by their previous or current interaction with the Website. Proactive strategies incorporate methods such as user authentication, activation of cookies on the client side, and dynamic pages associated with the browser (Spiliopoulou, Mobasher, Berendt, & Nakagawa, 2003). Although these proactive approaches are the most accurate and reliable methods for identifying users, they raise privacy concerns and depend entirely on users' cooperation.

In the absence of user authentication, the most popular proactive approach to distinguishing unique users is the use of client-side cookie information (Liu, 2007). Whenever a Web user navigates through a Website for the first time, the Web server sends a cookie, i.e., a piece of information, to the client browser. This information is stored on the client machine in the form of a text file (Facca & Lanzi, 2005). A cookie may contain various information, including the user's unique id. A few researchers have applied the cookie-based approach to identify users (Elo-Dean & Viveros, 1997; Ivancsy & Juhasz, 2007; Kamdar & Joshi, 2000).
Although this approach is considered one of the most accurate methods to identify users, cookies are often not recorded on the client machine because of browser constraints or users' non-cooperation: some browsers do not support cookies or have them disabled, and sometimes cookies are deleted by the user.

In reactive methods, on the other hand, users are identified from existing log records after their interaction with the Website. One of the basic reactive approaches is identification by IP address (Géry & Haddad, 2003). However, this approach is unable to deal with the shared IP address issue in a proxy server environment. According to Cooley et al., two heuristics can be used to solve this issue. The first heuristic assumes that two log entries having the same IP address but different User agents may belong to two different users. The second heuristic uses additional information, such as the Website topology and the referrer log, to identify users: a request is attributed to a new user if the requested page is not accessible through a hyperlink from the pages previously requested from the same IP address (Cooley et al., 1999). Tanasa et al. have used IP address and User agent information to identify users when user authentication is not available (Tanasa & Trousse, 2004). Castellano et al. and Suneetha et al. have also used IP address and User agent information to identify users (Castellano, Fanelli, & Torsello, 2007; Suneetha & Krishnamoorthi, 2009). Further, researchers have applied a combined approach to identify users: if the IP address is the same and the User agent is different, the request is attributed to a new user; if both are the same and the requested resource is not accessible through previously accessed pages, the request is also attributed to a new user (Reddy, Reddy, & Sitaramulu, 2013). Although all of the methods discussed above have been applied successfully in various applications, they are not suitable for large datasets.
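The combined reactive heuristic described above (same IP address, then same User agent, then reachability through previously accessed pages) can be illustrated with a minimal single-machine sketch. It is not taken from any of the cited papers: the `LogEntry` fields, the `links` dictionary standing in for the Website topology, and the function name `identify_users` are all assumptions made for illustration.

```python
from collections import namedtuple

# Hypothetical, simplified log record: real server logs carry more fields.
LogEntry = namedtuple("LogEntry", "ip agent url")

def identify_users(entries, links):
    """Return one user index per log entry.

    links maps a page to the set of pages it hyperlinks to
    (an assumed stand-in for the Website topology).
    """
    users = []       # each user: {"ip", "agent", "pages"}
    assignment = []  # user index for each entry, in input order
    for e in entries:
        match = None
        for uid, u in enumerate(users):
            # Different IP address or User agent: not this user.
            if u["ip"] != e.ip or u["agent"] != e.agent:
                continue
            # Same IP and agent: the page must have been visited already
            # or be linked from a page this user has visited.
            reachable = e.url in u["pages"] or any(
                e.url in links.get(p, ()) for p in u["pages"]
            )
            if reachable:
                match = uid
                break
        if match is None:
            # No existing user fits, so the request starts a new user.
            users.append({"ip": e.ip, "agent": e.agent, "pages": set()})
            match = len(users) - 1
        users[match]["pages"].add(e.url)
        assignment.append(match)
    return assignment
```

Every entry is compared against every user found so far, so the cost grows with the number of users. This is one way to see why such single-machine heuristics do not scale well to large logs.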
In the last few years, the MapReduce programming framework has become popular for distributed computation on big data, executed on a cluster of nodes; Hadoop is an open source implementation of the MapReduce framework (Bhandarkar, 2010; Dean & Ghemawat, 2008). A few researchers have focused on the scalability of Data preprocessing methods in the Web usage mining process and have identified Web users by IP address information in the MapReduce framework (Savitha & Vijaya, 2014; Zhang & Zhang, 2013). Although their methods are appropriate for large datasets, they are unable to deal with the proxy server problem. Huang et al. have given an improved referrer-based algorithm for user session identification using the MapReduce programming framework. For user identification, they consider a specific user to be one under the same Asymmetric Digital Subscriber Line (ADSL) and the same User agent (Huang, Chen, & Le, 2013). This method is suitable for large datasets; however, it is not able to distinguish users from Web robots in the User identification phase.
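To make the MapReduce formulation concrete, the following sketch simulates the map, shuffle, and reduce steps in plain Python. It is not the authors' algorithm, whose details are not part of this excerpt. It assumes that users are keyed by the pair (IP address, User agent) and that Web robots can be filtered by keywords in the User agent string. The names `ROBOT_HINTS`, `map_entry`, `shuffle`, `reduce_user`, and `run_job` are hypothetical.

```python
from collections import defaultdict

# Assumption: a crude keyword test on the User agent stands in for
# robot detection; real robot detection is considerably more involved.
ROBOT_HINTS = ("bot", "crawler", "spider")

def map_entry(entry):
    """Map step: emit ((ip, agent), url) for requests not flagged as robots."""
    ip, agent, url = entry
    if any(hint in agent.lower() for hint in ROBOT_HINTS):
        return []
    return [((ip, agent), url)]

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_user(key, urls):
    """Reduce step: one record per user holding the requested pages."""
    return key, urls

def run_job(entries):
    """Run the three steps sequentially on an in-memory list of entries."""
    pairs = [pair for entry in entries for pair in map_entry(entry)]
    return dict(reduce_user(k, v) for k, v in shuffle(pairs).items())
```

In a real Hadoop job the map and reduce functions would run in parallel on different nodes and the shuffle would be handled by the framework; the sequential `run_job` only shows how the data flow through the three steps.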