4. We do Hadoop
The leaders of Hadoop’s development
Community driven, Enterprise Focused
Drive Innovation in the platform: We lead the roadmap
100% Open Source: Democratized Access to Data
5. We do Hadoop successfully.
Support
Professional Services
Training
18. The solution?
[Figure: separate data stores, each holding its own Data: EDW, Yet Another EDW, Analytical DB, OLTP, Another EDW]
19. Ummm…you
dropped something
[Figure: loose Data items scattered outside the data stores (EDW, Yet Another EDW, Analytical DB, OLTP, Another EDW)]
22. Wait, you’ve seen this before.
[Figure: many Data items and a box labeled "Analytics Sausage Factory"]
25. “Prices, Stupid passwords, and
Boring Statistics.”
- Hans Rosling
http://www.youtube.com/watch?v=hVimVzgtD6w
26. Your data silos are lonely places.
[Figure: isolated data silos: EDW, Accounts, Customers, Web Properties]
27. … Data likes to be together.
[Figure: data from EDW, Accounts, Customers, and Web Properties]
28. Data likes to socialize too.
[Figure: data from EDW, Accounts, Customers, Web Properties, Machine Data, Twitter, Facebook, CDR, and Weather Data]
29. New types of data don’t quite fit into
your pristine view of the world.
[Figure: "My Little Data Empire", Logs, and Machine Data, with question marks]
30. To resolve this, some people take
hints from Lord Of The Rings...
32. …but that has its problems too.
[Figure: two EDWs, each with Data, a Schema, and surrounding ETL jobs]
33. What if the data was processed and
stored centrally? What if you didn’t
need to force it into a single
schema?
We call it a Data Lake.
[Figure: Data Sources, a Data Lake with Process steps, and Schemas in front of the EDW and BI & Analytics]
34. A Data Lake Architecture enables:
- Landing data without forcing a single schema
- Landing a variety and large volume of data
efficiently
- Retaining data for a long period of time with a very
low $/TB
- A platform to feed other Analytical DBs
- A platform to execute next gen data analytics and
processing applications (SAS, Informatica,
Graph Analytics, Machine Learning, SAP,
etc…)
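The first point, landing data without forcing a single schema, can be sketched in a few lines. This is only an illustration of the idea, not how any particular Hadoop component works: the records, field names, and the `read_with_schema` helper are all made up for the example. Raw records are kept exactly as they arrived, and each consumer picks the fields it cares about when it reads.

```python
import json

# Raw events landed as-is: one JSON document per line, no shared schema.
# The records and field names are hypothetical, for illustration only.
RAW_LANDING_ZONE = [
    '{"source": "web", "user": "u1", "url": "/home", "ms": 120}',
    '{"source": "cdr", "caller": "u1", "callee": "u2", "seconds": 95}',
    '{"source": "web", "user": "u2", "url": "/cart"}',
]


def read_with_schema(raw_lines, source, fields):
    """Project raw records onto a schema chosen at read time.

    Fields a record lacks come back as None instead of failing the load.
    """
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        if record.get("source") != source:
            continue
        rows.append({name: record.get(name) for name in fields})
    return rows


# Two consumers, two schemas, one copy of the raw data.
web_views = read_with_schema(RAW_LANDING_ZONE, "web", ["user", "url", "ms"])
calls = read_with_schema(RAW_LANDING_ZONE, "cdr", ["caller", "seconds"])
print(web_views)
print(calls)
```

Note that the third record has no `ms` field and still loads; a store that enforces its schema on write would have had to reject it or be altered first.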
35. In most cases, more data is better.
Work with the population, not just a
sample.
36. Your view of a client today.
Male
Female
Age: 25-30
Town/City
Middle Income Band
Product Category
Preferences
37. Your view with more data.
Male
Female
Age: 27 but feels old
GPS coordinates
$65-68k per year
Product recommendations
Tea Party
Hippie
Looking to start a business
Walking into Starbucks right now…
A depressed Toronto Maple Leafs fan
Products left in basket indicate drunk Amazon shopper
Gene Expression for Risk Taker
Thinking about a new house
Unhappy with his cell phone plan
Pregnant
Spent 25 minutes looking at tea cozies
44. If you could design a system that
would handle this, what would it
look like?
45. It would probably need a highly
resilient, self-healing, cost-efficient,
distributed file system…
[Figure: 3x3 grid of Storage nodes]
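What "resilient" and "self-healing" could mean for such a file system can be shown with a toy model. Everything here is an assumption made for the sketch: the replication factor of 3, the round-robin placement, and the `place_blocks` and `heal` helpers are illustrative and do not describe any real file system's placement policy. The idea is simply that each block lives on several nodes, and when a node dies the missing copies are recreated elsewhere.

```python
REPLICATION = 3  # assumed target number of copies per block


def place_blocks(blocks, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes, round-robin."""
    placement = {}
    for i, block in enumerate(blocks):
        placement[block] = {
            nodes[(i + k) % len(nodes)] for k in range(replication)
        }
    return placement


def heal(placement, live_nodes, replication=REPLICATION):
    """Forget replicas on dead nodes, then re-copy blocks to live nodes."""
    live = sorted(live_nodes)
    for holders in placement.values():
        holders.intersection_update(live)  # drop replicas on failed nodes
        candidates = [n for n in live if n not in holders]
        while len(holders) < replication and candidates:
            holders.add(candidates.pop(0))
    return placement


nodes = ["n1", "n2", "n3", "n4", "n5"]
placement = place_blocks(["b0", "b1", "b2"], nodes)

# Node n3 fails; every block it held is brought back up to three copies.
heal(placement, live_nodes={"n1", "n2", "n4", "n5"})
for block, holders in placement.items():
    print(block, sorted(holders))
```

Because no block ever had all its copies on one machine, losing a node costs nothing but some background copying, which is what makes inexpensive hardware acceptable.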
46. It would probably need a completely
parallel processing framework that
took tasks to the data…
[Figure: 3x3 grid of nodes, each with Storage and Processing]
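"Taking tasks to the data" can be sketched as a scheduling rule: run each task on a node that already stores the block it needs, and move data over the network only when no such node has capacity. The `schedule` function, node names, and slot counts below are hypothetical and much simpler than a real cluster scheduler.

```python
def schedule(block_locations, free_slots):
    """Assign one task per block, preferring a node that already stores it.

    block_locations: block id -> list of nodes holding a replica
    free_slots: node -> number of tasks it can still accept
    Returns block id -> (node, is_local).
    """
    slots = dict(free_slots)  # copy so the caller's dict is untouched
    plan = {}
    for block, holders in block_locations.items():
        local = [n for n in holders if slots.get(n, 0) > 0]
        if local:
            node, is_local = local[0], True
        else:
            # Fall back to any node with capacity; the block must then
            # travel over the network to reach the task.
            remote = [n for n, free in slots.items() if free > 0]
            if not remote:
                raise RuntimeError("no free slots left")
            node, is_local = remote[0], False
        slots[node] -= 1
        plan[block] = (node, is_local)
    return plan


locations = {"b0": ["n1", "n2"], "b1": ["n1"], "b2": ["n1"]}
plan = schedule(locations, {"n1": 2, "n2": 1, "n3": 1})
for block, (node, is_local) in plan.items():
    print(block, node, "local" if is_local else "remote")
```

In this run two of the three tasks read their block from local storage; only `b2` has to pull data from another node, because `n1` has run out of slots.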
47. It would probably run on commodity
hardware, virtualized machines, and
common OS platforms
[Figure: 3x3 grid of nodes, each with Storage and Processing]
48. It would probably be open source so
innovation could happen as quickly
as possible
55. The Sandbox is ‘Hadoop in a Can’.
It contains one copy of each of the
Master and Worker node processes
used in a cluster, only in a single
virtual node.
[Figure: 3x3 grid of Storage and Processing nodes, next to a single Processing and Storage pair in a Linux VM]
56. Getting started with Sandbox VM:
- Pick your flavor of VM at…
http://www.hortonworks.com/sandbox
- Start the sandbox VM
- Find the IP address displayed
- Go to that address in a browser, for example
http://172.16.130.137
- Register
- Click on ‘Start Tutorials’
- In the left-hand nav, click on ‘HCatalog, Basic Pig
& Hive Commands’