As Hadoop became mainstream, the need to simplify and speed up analytics processes grew rapidly. Data wrangling emerged as a necessary step in any analytical pipeline, and is often considered to be its crux, taking as much as 80% of an analyst's time. In this presentation we will discuss how data wrangling solutions can be leveraged to streamline, strengthen and improve data analytics initiatives on Hadoop, including use cases from Trifacta customers.
Bio: Olivier is EMEA Solutions Lead at Trifacta. He has 7 years of experience in analytics, with prior roles as technical lead for business analytics at Splunk and quantitative analyst at Accenture and Aon.
Data Wrangling on Hadoop - Olivier de Garrigues, Trifacta
1. Hadoop User Group London: Data Wrangling on Hadoop
September 8, 2016
Olivier de Garrigues, EMEA Solutions Lead
2. Creating radical productivity
for people who analyze data.
JEFFREY HEER
Co-Founder & CXO
VISUALIZATION
JOE HELLERSTEIN
Co-Founder & CSO
BIG DATA
SEAN KANDEL
Co-Founder & CTO
HUMAN-COMPUTER INTERACTION
4. What is Data Wrangling?
QUESTION ANALYZE INSIGHT DISCOVER STRUCTURE CLEANSE ENRICH VALIDATE PUBLISH
5. The Bridge Between Raw Data & Analysis
Ingestion Storage Processing
ANALYSIS & VISUALIZATION
LOB
CLEANING ENRICHMENT DISTILLATION STRUCTURING DISCOVERY
End-User Capabilities
IT
GOVERNANCE INTEGRATION AVAILABILITY SCALABILITY SECURITY
Technical Capabilities
9. TRIFACTA
DATA WRANGLING WORKFLOW
Trifacta. Confidential & Proprietary.
Sample Scale Up
Refine
Sample
Results
Identify/Register Data
1.
Predictive Interaction
2.
Consume
Schedulers
Monitor and Adjust
3.
Schedule
Visualization & Analysis
Secure Access
10. Ingestion Processing Storage
ANALYSIS & CONSUMPTION
Discover Structure Clean Enrich Distill
LOB
IT
News
Topics
Time
Trades
Tickers
Date
$
eMails
Recipients
Topics
Phone Logs
Call Details
Recipients
Corporations
Company Relations
Individuals
Financial Services use case: Trader Fraud
11. Data Wrangling Benefits
➔ Empower the people who know the data best
➔ Accelerate time to value
➔ Lower business risk with more accurate data
➔ Unlock innovation using a wider variety of data