A real-time Web Analytics System

                           Mahesh Patwardhan
             Digital and New Media Consultant
Contents
1.   Introduction
2.   The Requirements
3.   The Architecture
4.   The Reports
5.   The Implementation
6.   Conclusion
Introduction
  • This document describes the implementation of a real-time
    web log capture and reporting system.

  • The system provides real-time reports on traffic parameters
    such as page views, visits and unique visitors.

  • It was designed and built to replace a batch-process
    system that generated reports in deferred mode.

  • It allows real-time monitoring of, and action on, the
    various online services.
Requirements
        ◦ Shortcomings of the existing system
           • It generated reports from the previous day's logs, not in real time
           • It could not be scaled up
           • It was not equipped to handle heavy traffic
           • It had no scope for adding new services
           • It had no scope for adding or editing logs

        ◦ The new system was required to provide
           • Real-time web log capture from web servers at geographically dispersed locations
           • A robust web log data warehouse
           • Extensive real-time reports from the web logs

        ◦ Advantages of the new system
           • Data can be accessed in "real time"
           • The process can be scaled up to handle more traffic
           • A new service can be added or an existing one deleted; a newly added service can be accessed
             from the very next day
           • Logs can be added and modified
…Requirements

◦ The system was required to capture, collate and aggregate the web logs
  that accumulate on the web-app servers.

◦ The aggregates needed to be produced in near-real time.

◦ A multi-layer architecture needed to be deployed:
   • a layer of capture agents, one deployed on every web-app server
   • a layer of collation server applications that collate data from the capture agents
   • a layer of computation servers that aggregate data at high speed

◦ This multi-layer architecture would aggregate data into industry-standard
  RDBMS tables, which could then be queried and viewed through user
  interface screens.

◦ The aggregate tables were to be updated in near-real time.
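
A minimal sketch of how one aggregate table could be kept current as log records arrive. The slides do not name the RDBMS, the schema or the aggregation interval; SQLite, the table pageviews_by_minute, its columns and the helper names below are all assumptions made for illustration.

```python
import sqlite3
from datetime import datetime, timezone


def open_store(path=":memory:"):
    """Create the (hypothetical) aggregate table if it does not exist yet."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS pageviews_by_minute (
               service TEXT NOT NULL,
               minute  TEXT NOT NULL,
               page    TEXT NOT NULL,
               views   INTEGER NOT NULL DEFAULT 0,
               PRIMARY KEY (service, minute, page))"""
    )
    return conn


def record_pageview(conn, service, page, ts):
    """Add one page view to the bucket for the minute that contains ts (epoch seconds)."""
    minute = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M")
    key = (service, minute, page)
    with conn:  # one transaction per record; a real system would likely batch
        # Make sure the bucket row exists, then increment it.
        conn.execute(
            "INSERT OR IGNORE INTO pageviews_by_minute (service, minute, page, views) "
            "VALUES (?, ?, ?, 0)",
            key,
        )
        conn.execute(
            "UPDATE pageviews_by_minute SET views = views + 1 "
            "WHERE service = ? AND minute = ? AND page = ?",
            key,
        )


def views_for(conn, service, page):
    """Total page views for a page, as a reporting screen might query it."""
    row = conn.execute(
        "SELECT COALESCE(SUM(views), 0) FROM pageviews_by_minute "
        "WHERE service = ? AND page = ?",
        (service, page),
    ).fetchone()
    return row[0]
```

Because each record only increments a small pre-aggregated row, the reporting queries stay cheap however many raw log lines have been processed.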
Architecture
…Architecture
    ◦ The architecture has four layers, plus a database server
         • Collation clients (L1)
         • Collation servers (L2)
         • Computation servers (L3)
         • Reporting server (L4)
         • A database server that stores the aggregated results

    ◦ By design, the first three layers (L1, L2 and L3) are
      fully scalable.

    ◦ All layers communicate with each other over TCP/IP.


…Architecture
  • Each collation client in L1 connects to one collation server in L2.
    ◦ A maximum of 30 collation clients can connect to one collation server.
    ◦ Primary/back-up fail-over is provided: if a collation server fails,
      the clients connected to it automatically shift to other servers
      in the cluster.

  • Computation is distributed across the computation servers (L3) by
    service.
    ◦ The computation required for a service is handled by that service's
      computation server.
    ◦ Primary/back-up fail-over is not possible in this layer.
    ◦ If required, the computation for a single service can be spread over
      more than one server (for example, two servers can perform the
      computations for a service like e-mail).

  • The computed (aggregated) information is stored in a database, which
    is used by the L4 (Reporting) layer.
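
The connection rules above can be sketched as follows. Only the limit of 30 clients per collation server, the fail-over behaviour and the per-service routing come from the slides. The class and function names, the least-loaded placement policy and the hash-by-visitor split within a service are assumptions made for illustration.

```python
import zlib

MAX_CLIENTS_PER_SERVER = 30  # limit stated in the slides


class CollationCluster:
    """Tracks which collation clients (L1) are attached to which collation server (L2)."""

    def __init__(self, server_names):
        self.clients = {name: set() for name in server_names}  # live servers only

    def connect(self, client_id):
        """Attach a client to the least-loaded live server that still has room."""
        candidates = [s for s, c in self.clients.items()
                      if len(c) < MAX_CLIENTS_PER_SERVER]
        if not candidates:
            raise RuntimeError("no collation server has spare capacity")
        server = min(candidates, key=lambda s: (len(self.clients[s]), s))
        self.clients[server].add(client_id)
        return server

    def fail(self, server):
        """Drop a failed server and shift its clients to the remaining servers."""
        orphans = self.clients.pop(server)
        return {client: self.connect(client) for client in sorted(orphans)}


def route(service, visitor_id, computation_servers):
    """Pick the computation server (L3) for a record.

    Routing is by service first. When a service has several servers, records
    are split by a stable hash of the visitor id (an assumed policy) so that
    one visitor's records always reach the same server.
    """
    servers = computation_servers[service]
    return servers[zlib.crc32(visitor_id.encode()) % len(servers)]
```

A cluster therefore needs enough spare slots to absorb the clients of a failed server; in this sketch fail() raises RuntimeError when it does not.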
Reports

◦   Hits by time
◦   Page Views by time, by pages
◦   Visits by time, by page
◦   Unique visitor by time, by page
◦   Return frequency
◦   Return visit
◦   Visiting frequency by visitor
◦   Average time spent
◦   By page average time spent
◦   Referrer by domains, URL
…Reports

◦   Search engines
◦   Search engine keywords
◦    By search engine by keyword
◦   Browser type, version, OS
◦   Parameter analysis
◦   Country, city, state wise reports
◦   By country top pages
◦   By ISP
◦   Top entry pages
◦   Top exit pages
◦   Path reporting (across service)
◦   Directory filter based reporting
◦   Fall-out reports
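
As an illustration of the first few reports, the sketch below derives page views, visits and unique visitors from a stream of page requests. The slides do not define these metrics; the record layout (visitor id, timestamp, page) and the 30-minute inactivity rule for splitting visits are assumptions, the latter being a common web analytics convention.

```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # seconds of inactivity that end a visit (assumed)


def traffic_summary(hits):
    """Summarise an iterable of (visitor_id, timestamp_seconds, page) records."""
    page_views = 0
    times_by_visitor = defaultdict(list)
    for visitor, ts, _page in hits:
        page_views += 1  # every record here is treated as one page view
        times_by_visitor[visitor].append(ts)

    visits = 0
    for times in times_by_visitor.values():
        times.sort()
        visits += 1  # the visitor's first request opens a visit
        for prev, cur in zip(times, times[1:]):
            if cur - prev > SESSION_TIMEOUT:
                visits += 1  # a long gap starts a new visit
    return {
        "page_views": page_views,
        "visits": visits,
        "unique_visitors": len(times_by_visitor),
    }
```

The "by time" and "by page" variants of these reports would apply the same counting within each time bucket or page.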
Implementation
  • The solution was implemented incrementally.
    Deliverables were planned for each increment
    based on the specified requirements. There were
    five development cycles, detailed below.

  • Incremental cycle 1
    ◦   Setting up the framework for real-time log capture
    ◦   Health monitoring system
    ◦   Hits by time
    ◦   Page Views by time, by pages
…Implementation
  • Incremental cycle 2
    ◦   Visits by time, by page
    ◦   Unique visitor by time, by page
    ◦   Return frequency
    ◦   Return visit
    ◦   Visiting frequency by visitor
    ◦   Average time spent
    ◦   By page average time spent
  • Incremental cycle 3
    ◦   Referrer by domains, URL
    ◦   Search engines
    ◦   Search engine keywords
    ◦   By search engine by keyword
    ◦   Browser type, version, OS
    ◦   Parameter analysis
…Implementation

◦ Incremental cycle 4
   • Country, city, state wise reports
   • By country top pages
   • By ISP
   • Top entry pages
   • Top exit pages
   • Path reporting (across service)

◦ Incremental cycle 5
   • Directory filter based reporting
   • Fall-out reports

◦ The deliverables in each cycle required elements of every layer to be
  developed, implemented, tested and deployed. For instance, some
  tables of the final aggregate schema had to be designed in the first
  cycle itself, along with the corresponding reports.
Conclusion

◦ This document described the implementation of a real-time web log capture and
  reporting system.

◦ The system provides real-time reports on traffic parameters such as page views,
  visits and unique visitors.

◦ It was designed and built to replace the batch-process system, which
  generated reports in deferred mode and did not allow real-time
  monitoring of, and action on, the various online services.

◦ The architecture consists of four layers: the collation clients, the
  collation servers, the computation servers and the reporting server.

◦ The new system overcomes the shortcomings of the existing one, which was
  not scalable and provided reports in deferred mode: it has a highly
  scalable architecture and provides reports in real time.
