3 Things to Learn:
-How data is driving digital transformation to help businesses innovate rapidly
-How Choice Hotels (one of the largest hoteliers) is using Cloudera Enterprise to gain meaningful insights that drive their business
-How Choice Hotels has transformed its business through innovative use of Apache Hadoop, Cloudera Enterprise, and deployment in the cloud, from developing customer experiences to meeting IT compliance requirements
12. Traveler 360
• Problem
− Traveler data spans multiple channels and is most relevant in near-real-time – faster than our traditional data warehouse can make it available.
• Solution
− Integrate data from multiple channels into a single ingestion pipeline.
− Use Lambda Architecture style speed and batch storage.
• Value
− Complete view of all the traveler’s information, regardless of system of origin.
− Near-real-time personalization of web, mobile, and point-of-sale interfaces.
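A minimal sketch of the Lambda-style split named in the solution above, using plain Python dictionaries in place of real storage. The class name, method names, and sample fields are hypothetical and are not taken from the Choice Hotels platform. The batch view is rebuilt from the full event log, the speed view covers only events that arrived since the last rebuild, and a lookup merges the two.

```python
class Traveler360:
    """Toy Lambda-style store: append-only log, batch view, speed view."""

    def __init__(self):
        self.events = []      # append-only master log of every event
        self.batch_view = {}  # rebuilt from the full log by run_batch()
        self.speed_view = {}  # covers only events since the last batch run

    @staticmethod
    def _apply(view, traveler_id, channel, attrs):
        # Fold one event into a view, remembering which channels were seen.
        profile = view.setdefault(traveler_id, {"channels": []})
        profile.update(attrs)
        if channel not in profile["channels"]:
            profile["channels"].append(channel)

    def ingest(self, traveler_id, channel, attrs):
        # Single ingestion path: log the event, then update the speed view.
        self.events.append((traveler_id, channel, attrs))
        self._apply(self.speed_view, traveler_id, channel, attrs)

    def run_batch(self):
        # Batch layer: recompute the view from scratch, then reset the speed view.
        view = {}
        for traveler_id, channel, attrs in self.events:
            self._apply(view, traveler_id, channel, attrs)
        self.batch_view = view
        self.speed_view = {}

    def lookup(self, traveler_id):
        # Serving step: recent (speed) values override older (batch) values.
        batch = self.batch_view.get(traveler_id, {"channels": []})
        speed = self.speed_view.get(traveler_id, {"channels": []})
        merged = {**batch, **speed}
        merged["channels"] = batch["channels"] + [
            c for c in speed["channels"] if c not in batch["channels"]
        ]
        return merged


store = Traveler360()
store.ingest("t1", "web", {"email": "a@example.com"})
store.run_batch()
store.ingest("t1", "mobile", {"loyalty_tier": "gold"})
profile = store.lookup("t1")
```

After these calls, `profile` holds the email from the batch view and the loyalty tier from the speed view, with both channels listed.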
13. Real-Time Franchisee Reporting
• Problem
− Tracking performance of our franchise properties can take days, leaving us in the dark about rapidly evolving trends.
• Solution
− Ingest broad swaths of guest booking data into the data lake.
− Generate derivative datasets tailored to key business intelligence cases.
− Connect Tableau and other BI tools to the Data & Analytics Platform.
• Value
− Business analytics reports are available as soon as the data arrives.
− Evolving trends can be acted upon while they are fresh and hot.
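The idea of a derivative dataset tailored to a BI case can be sketched with SQL. This example uses Python's built-in sqlite3 module purely as a stand-in for Impala so that it runs anywhere; the table names, column names, and sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-in for guest booking data landed in the data lake (sample rows are invented).
conn.execute(
    "CREATE TABLE bookings (property_id TEXT, stay_date TEXT, revenue REAL)"
)
conn.executemany(
    "INSERT INTO bookings VALUES (?, ?, ?)",
    [
        ("P1", "2017-03-01", 120.0),
        ("P1", "2017-03-01", 95.0),
        ("P2", "2017-03-01", 80.0),
    ],
)

# Derived dataset: one row per property per day, shaped for a reporting tool.
conn.execute(
    """
    CREATE TABLE daily_property_revenue AS
    SELECT property_id,
           stay_date,
           COUNT(*)     AS bookings,
           SUM(revenue) AS revenue
    FROM bookings
    GROUP BY property_id, stay_date
    """
)

# A BI tool would issue a query like this against the derived table.
rows = conn.execute(
    "SELECT property_id, bookings, revenue "
    "FROM daily_property_revenue ORDER BY property_id"
).fetchall()
```

Pre-aggregating into a narrow table keeps the reporting query simple and cheap, which is one plausible reason to maintain derived datasets alongside the raw data.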
14. Retiring Aging Systems
• Problem
− Aging systems – some more than a decade old – are being maintained for a few remaining edge cases that do not fit in any new system.
• Solution
− Ingest legacy data into the Choice Data & Analytics Platform.
− Migrate old data pipelines onto DAP infrastructure.
− Reproduce existing views using modern tools like Spark and Impala.
• Value
− Cluster processing and high-performance storage vastly improve performance.
− Legacy languages, libraries, and systems can finally be retired.
15. Swiss Army Knife
• PCI Compliance
• SOX Audits
• Historical Views
• BI Discovery
• Geo-Trends
• Glue
19. Platform Architecture – Data Processing Layer
• Storage layer carved into logical buckets
− Landing, Raw, Delivery, and Derived
• Schema Stored With Data
• Platform Jobs for
− Converting Text Batches to Parquet
− Streaming Data to Parquet
− Compaction
− Derived Tables & Views
− Standardization
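A rough sketch of two ideas from this slide: the four storage zones and a compaction job. The zone names come from the slide, but the `/data/...` path layout, the function names, and the greedy grouping rule are assumptions for illustration only. A real job would go on to rewrite each group of small files as one larger Parquet file, which this sketch does not do.

```python
# Zone names are from the slide; the path layout below is hypothetical.
ZONES = ("landing", "raw", "delivery", "derived")


def zone_path(zone, dataset):
    """Build the storage path for a dataset in one of the logical zones."""
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone}")
    return f"/data/{zone}/{dataset}"


def plan_compaction(file_sizes, target_bytes):
    """Greedily group small files so each group stays at or under the target.

    file_sizes maps file name to size in bytes. A file larger than the
    target ends up in a group of its own.
    """
    groups, current, current_size = [], [], 0
    for name, size in sorted(file_sizes.items()):
        # Close the current group if adding this file would exceed the target.
        if current and current_size + size > target_bytes:
            groups.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        groups.append(current)
    return groups


sizes = {"part-0001": 40, "part-0002": 50, "part-0003": 30, "part-0004": 100}
plan = plan_compaction(sizes, target_bytes=100)
raw_bookings = zone_path("raw", "bookings")
```

Streaming ingestion tends to produce many small files, so a periodic compaction pass of this kind is a common way to keep scans efficient.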
20. Platform Architecture – Data Delivery Layer
• Data Delivery
− Impala SQL (Tableau, SQL IDE)
− SparkR, RStudio, Sparklyr
− Spark to Web API (JSON, XML)
− Spark to Export (PDF, Excel, CSV)
• Self Service Derived Views
− Metadata driven
− Spark Refresh
− Near-Real-Time or Periodic
− Access Via SQL in Impala
− Access Via DataFrames in Spark
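One way to picture "metadata driven" derived views is a small registry of view definitions that a generic refresh job reads. The sketch below is plain Python, not Spark, and every field name in `VIEW_SPECS` is a hypothetical choice rather than the platform's actual metadata format. Each spec names a source table, a grouping column, a column to sum, and whether the view refreshes periodically or in near-real-time.

```python
from collections import defaultdict

# Hypothetical view metadata; a real platform would store this outside the code.
VIEW_SPECS = [
    {"name": "revenue_by_brand", "source": "bookings",
     "group_by": "brand", "sum": "revenue", "refresh": "periodic"},
    {"name": "revenue_by_channel", "source": "bookings",
     "group_by": "channel", "sum": "revenue", "refresh": "near-real-time"},
]


def refresh_view(spec, tables):
    """Rebuild one derived view from its source rows, driven only by the spec."""
    totals = defaultdict(float)
    for row in tables[spec["source"]]:
        totals[row[spec["group_by"]]] += row[spec["sum"]]
    return dict(totals)


def refresh_all(specs, tables, mode):
    """Refresh every view whose metadata asks for the given refresh mode."""
    return {
        spec["name"]: refresh_view(spec, tables)
        for spec in specs
        if spec["refresh"] == mode
    }


# Invented sample rows standing in for a source table.
tables = {
    "bookings": [
        {"brand": "brand_a", "channel": "web", "revenue": 100.0},
        {"brand": "brand_a", "channel": "mobile", "revenue": 50.0},
        {"brand": "brand_b", "channel": "web", "revenue": 70.0},
    ]
}

periodic = refresh_all(VIEW_SPECS, tables, "periodic")
near_real_time = refresh_all(VIEW_SPECS, tables, "near-real-time")
```

Adding a new self-service view then means adding one metadata entry instead of writing a new job.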
21. Deploying with Cloudera
• AWS & On-Prem
• Ability to Test-drive Hadoop Components Easily
• Configuration at Our Fingertips
• Effortless Upgrades
• Strong Road-Map for The Future
22. Key Factors for Success
• Separate Compute From Storage
− Enables “Bring Your Own Compute”
− Makes It Easier to Migrate Components
− Spin Up a New Cluster While Keeping the Old One Live
• Start Small and Build on Success
• Remain Agile, Embrace Change
• Get Business Users Involved Early
• Develop The Team: Your People are The Most Important Tool
24. Bootstrapping Big Data
• Pilot Proof of Concept
− Demo the Technology
• Scale Up to One Business Case
− Traveler 360
• Build Out Additional Cases
− Business Intelligence, Audit
• Stay Flexible, Explore and Discover
− Thrift -> Impala -> SparkSQL
− Sqoop, StreamSets, Custom JDBC, File, & MQ
25. The People
• Big Data is Hot: Everyone Wants to Do It
− Get Great People, Develop Ownership and Pride
• Big Data is Fun: Encourage Your Team to Enjoy It!
− Morale is Critical on Rapidly Evolving Projects
• Big Data Can Be Learned
− Passion for Learning is a Must; Experience is a Nice-To-Have
• Ramp-Up New Engineers Quickly
− Cloud, Virtual Machines, and Cloudera Make Ramp-Up Fast!