This presentation accompanied a great talk on Web Analytics by Anne Marie Macek, Senior Manager in Data Strategy at Marriott International, at the DC Business Intelligentsia Meetup on December 11.
For more info on future events visit: http://www.meetup.com/BusinessIntelligentsiaDC/events/150884302/
2. AGENDA
• Introduction to Web Analytics
• Data Sources, Data Capture
• Vocabulary
• Data Modeling Basics
• Relational vs. Dimensional
• Normalization, De-normalization, Aggregation
• Web Analytics + Data Modeling
• Four-tiered Data Model for Web data
• Challenges
• Q&A
3. INTRODUCTION
• Anne Marie Macek
• Senior Manager, Data Strategy
• Consumer Insight and Revenue Strategy
• Marriott International
• 30+ years Data Modeling and Reporting
• 14+ years Data Warehousing and Business Intelligence
• 4+ years Web Analytics Data and Reporting
• MBA, Management Information Systems
• BS, Mathematics and Computer Science
4. EXPERIENCE
• Data Modeling:
• Flat Files, IMS/DB, DB2, Oracle, Netezza
• MS Access, Borland Paradox
• Cognos PowerPlay, MS Analysis Services, Cognos 10.2 Dynamic Cubes
• Reporting:
• COBOL, Focus, SAS, Actuate
• Cognos BI Suite
• Business Functions:
• eCommerce, Revenue Management, Sales & Marketing
• Human Resources, Finance
5. DEFINITION
• Web analytics is the measurement, collection, analysis and reporting of internet data for purposes of understanding and optimizing web usage.
Source: Wikipedia
6. OBJECTIVES
• Website Performance
• Conversion Rate ($ sales / # visits)
• Trends over time
• In Response to Campaigns
• Website Optimization
• Customer Behavior
• Technological Trends
• Integration
• Customer Lifetime Value / Segmentation
• Personalization
• Proactive display of pertinent information
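The conversion rate on this slide ($ sales / # visits) can be trended over time with very little code. The following is a minimal sketch, assuming each visit has already been reduced to a (period, sales dollars) pair; the function name and the sample figures are illustrative and do not come from the talk.

```python
from collections import defaultdict

def conversion_rate_by_period(visits):
    """visits: iterable of (period, sales_dollars) tuples, one per visit.

    Returns {period: dollars of sales per visit}, following the
    "$ sales / # visits" definition on the slide.
    """
    sales = defaultdict(float)
    counts = defaultdict(int)
    for period, dollars in visits:
        sales[period] += dollars
        counts[period] += 1
    return {period: sales[period] / counts[period] for period in counts}

# Hypothetical sample: four visits in November, two in December.
sample_visits = [
    ("2013-11", 0.0), ("2013-11", 200.0), ("2013-11", 0.0), ("2013-11", 0.0),
    ("2013-12", 150.0), ("2013-12", 0.0),
]
rates = conversion_rate_by_period(sample_visits)
print(rates)
```

Keeping the numerator and denominator as separate running totals, rather than storing the ratio, lets the same totals be re-divided at any coarser period.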
7. DATA SOURCES
• Click-stream Data
• Search Engine Optimization (SEO)
• Campaign Classification
• Email Campaigns
• Advertising Impressions
• 3rd Party Marketing Data
• IP Geolocation
• Competitive Analysis
• Customer Information
• Multi-channel Analysis
• Outcome Data
8. CLICKSTREAM COLLECTION
• Web Log Files
• Rudimentary data collected on company’s web server
• Page name, IP address, browser, date/time
• Does not screen out search engine robots
• JavaScript Tagging (Google Analytics, Omniture, WebTrends)
• As page loads, data is sent to 3rd party for collection
• Assigns a cookie to the user
• Can implement custom tags on specific pages
• Does not count pages served from cache
• Packet Sniffers (Cloudmeter Pion, Tealeaf CX Connect)
• Software or hardware layer installed on web servers
• Parsing raw data and protecting PII can be complex
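To make the web log option concrete, here is a rough sketch of pulling page name, IP address, browser and date/time out of one log line, and flagging likely robots, which the raw log does not do on its own. It assumes the server writes the widely used Combined Log Format; the robot keywords and the sample lines are illustrative assumptions, not an exhaustive or authoritative filter.

```python
import re
from datetime import datetime

# Combined Log Format:
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<page>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Illustrative keywords only; real robot screening needs a maintained list.
ROBOT_HINTS = ("bot", "crawler", "spider")

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    hit = match.groupdict()
    hit["time"] = datetime.strptime(hit["time"], "%d/%b/%Y:%H:%M:%S %z")
    hit["is_robot"] = any(hint in hit["agent"].lower() for hint in ROBOT_HINTS)
    return hit

# Hypothetical sample lines (documentation-range IP addresses).
sample_lines = [
    '203.0.113.7 - - [11/Dec/2013:10:15:32 -0500] "GET /hotels/search HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (Windows NT 6.1)"',
    '198.51.100.9 - - [11/Dec/2013:10:15:40 -0500] "GET /robots.txt HTTP/1.1" '
    '200 68 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
]
hits = [parse_line(line) for line in sample_lines]
human_hits = [h for h in hits if h is not None and not h["is_robot"]]
print(len(hits), len(human_hits))
```

Note that the month abbreviation in the timestamp is parsed according to the current locale, so this sketch assumes an English or C locale.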
9. CLICKSTREAM ANALYSIS
• Number of Visitors
• Total vs. Unique
• New vs. Repeat
• Source of Visit (Session)
• External Link (Campaign Analysis / Attribution)
• Direct
• Searches Performed On Site
• Keywords
• Sort Order of Results
• Page Analysis
• Specific Actions Performed
• Order (Booking)
• Signup for Membership, Credit Card, Event
• Abandonment (Bounce Rate)
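Several of the measures above fall out of the same pass over the click rows. The sketch below assumes each hit carries a visitor id, a session id and a page, and it treats a bounce as a session with exactly one page view, which is a common convention rather than a definition given in the talk. All names and sample rows are hypothetical.

```python
from collections import Counter

def summarize(hits, known_visitors):
    """hits: list of (visitor_id, session_id, page) for one reporting period.
    known_visitors: set of visitor ids already seen before the period.
    """
    pages_per_session = Counter(session for _, session, _ in hits)
    session_visitor = {session: visitor for visitor, session, _ in hits}
    unique_visitors = set(session_visitor.values())
    # Assumed convention: a bounce is a session with a single page view.
    bounces = sum(1 for n in pages_per_session.values() if n == 1)
    total_visits = len(pages_per_session)
    return {
        "total_visits": total_visits,
        "unique_visitors": len(unique_visitors),
        "new_visitors": len(unique_visitors - known_visitors),
        "repeat_visitors": len(unique_visitors & known_visitors),
        "bounce_rate": bounces / total_visits if total_visits else 0.0,
    }

sample_hits = [
    ("v1", "s1", "/home"), ("v1", "s1", "/search"), ("v1", "s1", "/booking"),
    ("v2", "s2", "/home"),
    ("v1", "s3", "/home"),
    ("v3", "s4", "/offers"), ("v3", "s4", "/signup"),
]
summary = summarize(sample_hits, known_visitors={"v1"})
print(summary)
```

Here visitor v1 accounts for two of the four visits, which is why total visits and unique visitors differ.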
10. BRINGING CLICKSTREAM IN-HOUSE
• Control/Consolidate Business Rules
• Integration with Corporate Systems of Record
• Single Version of the Truth
• Integration with Other Web Data Sources
• Enable more “intelligent” metrics
• Not all visits are a conversion opportunity
• Shift from “visit analysis” to “customer analysis”
• Enable advanced statistical and predictive
modeling
• Multi-touch Attribution
• Pay Per Click (PPC) Keyword Bid Optimization
11. CLICKSTREAM CHALLENGES
• “Clickstream data … is delightfully complex, ever changing, and full of mysterious occurrences.”
Avinash Kaushik, Web Analytics: An Hour a Day
• Volume
• Con: It’s big
• Pro: It’s incremental
• Fairly Unstructured
• Exceptions to every rule
• Mobile App vs. Mobile Web vs. Desktop
• Rapidly Changing
• Most queries require trending YTD + 2 years’ history
• Few “natural” metrics; most require count (distinct)
• How do I model this data?
12. DATA WAREHOUSE APPROACHES
Bill Inmon
• DW is Central Repository of all Enterprise Data
• “Top Down”
• Relational Model (3NF)
• Feeds Functional Data Marts
• Huge Undertaking
Ralph Kimball
• DW is the “Virtual” Integration of Various Functional Data Marts
• “Bottom Up”
• Dimensional Model
• Quicker to Develop
• Silo-ed and Redundant
15. NORMALIZATION
• Removes redundancy and dependency from data structures.
• 1NF: Remove Repeating Groups
• 2NF: Remove Partial Key Dependencies
• 3NF: Remove Dependencies Among Attributes
• Tutorial: http://phlonx.com/resources/nf3/
• Data Warehouses require some De-Normalization to improve query performance
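The normal forms listed above can be illustrated on a toy visit record. This is only a sketch: the field names, hotel codes and rows are hypothetical, and plain Python lists and dicts stand in for database tables.

```python
# One flat record per visit, with a repeating group (pages) and an attribute
# (hotel_city) that depends on hotel_code rather than on the visit itself.
flat_visits = [
    {"visit_id": 1, "hotel_code": "H1", "hotel_city": "Washington",
     "pages": ["/home", "/rooms"]},
    {"visit_id": 2, "hotel_code": "H1", "hotel_city": "Washington",
     "pages": ["/home"]},
    {"visit_id": 3, "hotel_code": "H2", "hotel_city": "New York",
     "pages": ["/offers", "/rooms"]},
]

def normalize(flat_rows):
    """Split flat visit records into visit, page-view and hotel 'tables'."""
    visits, page_views, hotels = [], [], {}
    for row in flat_rows:
        # 1NF: one row per page view instead of a list inside the visit row.
        # 2NF: page depends on the whole (visit_id, seq) key, not part of it.
        for seq, page in enumerate(row["pages"], start=1):
            page_views.append(
                {"visit_id": row["visit_id"], "seq": seq, "page": page}
            )
        # 3NF: hotel_city is stored once per hotel, keyed by hotel_code.
        hotels[row["hotel_code"]] = {"hotel_city": row["hotel_city"]}
        visits.append(
            {"visit_id": row["visit_id"], "hotel_code": row["hotel_code"]}
        )
    return visits, page_views, hotels

visits, page_views, hotels = normalize(flat_visits)
print(len(visits), len(page_views), len(hotels))
```

The city for hotel H1 is now stored once instead of on every visit, at the cost of a join whenever a query needs it, which is the trade-off that pushes warehouses back toward some de-normalization.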
17. NATIVE SOURCE MODEL
Plus
• In-database copy of the source data
• Stores data elements we are not yet ready to model further
• Maintains details for research purposes
• Prevents repeating historical conversion
Minus
• Huge
• Unstructured
• Not normalized (at all)
• Not useful for analysis or reporting
19. FACT MODEL
Plus
• “Snow-relational”
• Nearly Normalized (optimized for load)
• Multiple Fact & Extension Tables (manage I/O)
• Granular (click row)
• Contains keys to integrate with enterprise data
Minus
• Complex load including propagation and look-back
• Use requires non-filtered joins of massive tables
• Difficult to use for analysis, cannot be used for reporting
21. BI MODEL
Plus
• “Star-flake” Model
• De-normalized (optimized for query)
• Pre-joined
• Granular (click row)
• Integrated with enterprise data at load time
• Useful for detailed analysis
Minus
• Complex load process
• It’s still big!
• Corrections to Fact Model data issues require re-build or complex conversion processes
• Difficult to use for reporting
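"Pre-joined" and "integrated with enterprise data at load time" can be pictured as resolving every key on the click row once, during the load, so that queries never join. The following is a minimal sketch under that reading; the table layouts, keys and the "unknown" placeholder member are assumptions for illustration, not details from the talk.

```python
# Hypothetical Fact Model rows: one per click, carrying only keys.
click_facts = [
    {"click_id": 1, "visit_id": 10, "page_key": 1, "customer_key": 100},
    {"click_id": 2, "visit_id": 10, "page_key": 2, "customer_key": 100},
    {"click_id": 3, "visit_id": 11, "page_key": 1, "customer_key": None},
]
# Hypothetical dimensions; customer_dim stands in for enterprise data.
page_dim = {
    1: {"page_name": "/home", "page_type": "landing"},
    2: {"page_name": "/booking", "page_type": "checkout"},
}
customer_dim = {100: {"loyalty_tier": "gold"}}

def prejoin(facts, pages, customers):
    """Build de-normalized click rows by resolving all keys at load time."""
    rows = []
    for fact in facts:
        row = {"click_id": fact["click_id"], "visit_id": fact["visit_id"]}
        row.update(pages[fact["page_key"]])
        # Anonymous clicks keep their row, with an explicit "unknown" member.
        row.update(customers.get(fact["customer_key"], {"loyalty_tier": "unknown"}))
        rows.append(row)
    return rows

bi_rows = prejoin(click_facts, page_dim, customer_dim)
print(bi_rows[0])
```

The output is still one row per click, so the table is as large as before, and any correction to a dimension value means reloading the affected rows.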
23. AGGREGATE MODEL
Plus
• Star Schema (simple)
• De-normalized (optimized for query)
• Aggregated
• Fast query performance
• Great for predetermined reports
Minus
• Corrections to Fact Model data issues and embedded dimensions require re-build
• Count distincts only available for predetermined dimensions
• Limited use for analysis
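The count-distinct limitation is easiest to see with numbers. In the sketch below, page views roll up correctly from a (date, channel) aggregate to a date total, but unique visitors do not, because a visitor who arrived through two channels is counted in both. The rows and names are hypothetical.

```python
from collections import defaultdict

# Hypothetical click rows: (date, channel, visitor_id).
click_rows = [
    ("2013-12-11", "email", "v1"),
    ("2013-12-11", "email", "v2"),
    ("2013-12-11", "search", "v1"),
    ("2013-12-11", "search", "v1"),
    ("2013-12-11", "search", "v3"),
]

def build_aggregate(rows, key):
    """Aggregate click rows to the grain returned by key(row)."""
    page_views = defaultdict(int)
    visitors = defaultdict(set)
    for row in rows:
        k = key(row)
        page_views[k] += 1
        visitors[k].add(row[2])
    return {
        k: {"page_views": page_views[k], "unique_visitors": len(visitors[k])}
        for k in page_views
    }

# Two predetermined grains, each built from the click rows.
by_date_channel = build_aggregate(click_rows, key=lambda r: (r[0], r[1]))
by_date = build_aggregate(click_rows, key=lambda r: r[0])

# Rolling the finer aggregate up to date: fine for an additive measure...
rolled_views = sum(a["page_views"] for a in by_date_channel.values())
# ...but it overstates unique visitors (v1 appears under both channels).
rolled_uniques = sum(a["unique_visitors"] for a in by_date_channel.values())

print(rolled_views, by_date["2013-12-11"]["page_views"])
print(rolled_uniques, by_date["2013-12-11"]["unique_visitors"])
```

This is why each combination of dimensions that needs a distinct count has to be built ahead of time from the click-level data rather than derived from another aggregate.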