The document provides an overview of Apache Hadoop and how it addresses challenges with traditional data architectures. It discusses how Hadoop uses HDFS for distributed storage and YARN as a data operating system for distributed computing. It also summarizes the data access methods available in Hadoop, including MapReduce for batch processing, and describes how the Hadoop ecosystem continues to evolve to include technologies such as Spark and Hive.
69. Data Discovery Lab
• Elefante Wine Company has a fleet of over 100 trucks.
• The geolocation data collected from the trucks contains events generated while the truck drivers are driving.
• The company’s goal with Hadoop is to mitigate risk:
o Understand correlations between miles driven and events
o Compute the risk factor for each driver based on mileage & events
• Lab environment: Sandbox 2.4
• Lab document: http://goo.gl/14OAat
• Lab steps:
o Load Data
o Query Data
o Process Data
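The slide names two computations, the correlation between miles driven and events and a per-driver risk factor, without giving formulas. The Python sketch below shows one possible version of the "Process Data" step. It assumes a hypothetical schema of (driver_id, event_type) event tuples plus a per-driver mileage table, and it assumes a risk factor defined as non-normal events per 1,000 miles. The sample data and event names are made up for illustration; on the Sandbox this aggregation would more likely be expressed in a cluster tool such as Hive.

```python
from collections import Counter
from math import sqrt

# Hypothetical sample data: the slides do not give the lab's schema.
# Each geolocation event is (driver_id, event_type); "normal" means no violation.
EVENTS = [
    ("A1", "normal"), ("A1", "overspeed"), ("A1", "lane departure"),
    ("B2", "normal"), ("B2", "normal"), ("B2", "unsafe following distance"),
    ("C3", "overspeed"), ("C3", "lane departure"), ("C3", "overspeed"),
]
# Hypothetical total miles driven per driver.
MILES = {"A1": 12000.0, "B2": 15000.0, "C3": 8000.0}


def violation_counts(events):
    """Count non-normal events per driver."""
    counts = Counter()
    for driver_id, event_type in events:
        if event_type != "normal":
            counts[driver_id] += 1
    return counts


def risk_factors(events, miles, per_miles=1000.0):
    """Assumed definition: non-normal events per `per_miles` miles driven."""
    counts = violation_counts(events)
    return {d: counts[d] / m * per_miles for d, m in miles.items() if m > 0}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    for driver, rf in sorted(risk_factors(EVENTS, MILES).items()):
        print(f"{driver}: {rf:.3f} violations per 1,000 miles")
    counts = violation_counts(EVENTS)
    drivers = sorted(MILES)
    r = pearson([MILES[d] for d in drivers], [counts[d] for d in drivers])
    print(f"miles vs. violations: r = {r:.2f}")
```

With only three made-up drivers the correlation value says nothing about real fleet behavior; it only shows the shape of the computation. Normalizing by mileage keeps high-mileage drivers from looking risky simply because they drive more.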
70. Elefante Wine Current Challenges
The Company
Elefante Wine is a boutique wine fulfillment company with a large fleet of trucks. It delivers wine in a highly regulated industry with stringent transportation requirements.
The Situation
Recently, a number of driver violations led to fines and increased insurance rates.
The Challenges
• Rising Operational Costs
• Driver Safety
• Risk Management
• Logistics Optimization
72. Elefante Wine Risk and Driver Safety Challenges
• Trucks are outfitted with new sensors that generate large volumes of new data:
o Location
o Speed
o Driver violations
• Need to integrate real-time and historical data
• Increase safety and reduce liabilities
• Anticipate driver violations BEFORE they happen and take precautionary actions
• Find predictive correlations in driver behavior over large volumes of real-time data
• Difficult to deliver timely insights to the right people and systems to take action
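The slide lists requirements rather than a design. One common pattern for integrating real-time and historical data is to combine a per-driver score computed in batch over historical data (for example a risk factor) with a sliding window over the live event stream. The sketch below is a simple rule-based stand-in for the predictive analytics the slide calls for, not a predictive model. It assumes a hypothetical `ViolationMonitor` class, event tuples of (driver_id, timestamp in seconds, event_type), non-decreasing timestamps per driver, and made-up thresholds.

```python
from collections import defaultdict, deque


class ViolationMonitor:
    """Hypothetical monitor: flags drivers whose recent violations, combined with
    a historical risk score, suggest a precautionary action. Thresholds are made up."""

    def __init__(self, historical_risk, window_seconds=3600,
                 max_recent=3, high_risk=0.3):
        self.historical_risk = historical_risk  # driver_id -> score from batch jobs
        self.window_seconds = window_seconds
        self.max_recent = max_recent
        self.high_risk = high_risk
        self.recent = defaultdict(deque)  # driver_id -> violation timestamps

    def observe(self, driver_id, timestamp, event_type):
        """Process one real-time event; return True if the driver should be flagged.
        Assumes timestamps arrive in non-decreasing order per driver."""
        if event_type == "normal":
            return False
        window = self.recent[driver_id]
        window.append(timestamp)
        # Drop violations that have fallen out of the sliding window.
        while window and timestamp - window[0] > self.window_seconds:
            window.popleft()
        # Historically risky drivers get a lower tolerance for new violations.
        if self.historical_risk.get(driver_id, 0.0) >= self.high_risk:
            limit = 1
        else:
            limit = self.max_recent
        return len(window) >= limit


if __name__ == "__main__":
    monitor = ViolationMonitor({"A1": 0.167, "C3": 0.375})
    stream = [("A1", 0, "overspeed"), ("C3", 10, "lane departure"),
              ("A1", 600, "overspeed"), ("A1", 1200, "overspeed")]
    for driver, ts, kind in stream:
        if monitor.observe(driver, ts, kind):
            print(f"flag {driver} at t={ts}s")
```

Lowering the tolerance for historically risky drivers is one way to let the batch view inform real-time decisions; in practice the thresholds would need tuning against real violation data, and the logic would run in a stream processor fed by the truck sensors.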
• Data Discovery: uncover new findings, for a better understanding of the past
• Predictive Analytics: identify your next best action, for a better prediction of the future