This presentation is about leveraging Big Data environments including Hadoop, Spark and Storm to:
- Easily integrate disparate data sources and streams in real time to capture business events as they occur
- Leverage predictive analytics and machine learning across all your data to derive the right insight at the right time
- Build decision-centric systems that use this insight to act in real time, so you can capture new opportunities as they occur
2.
See this presentation online
An online version of this presentation is available at the following URL:
• https://info.talend.com/en_bd_realtime_analytics_oneclick.html
3.
Your Speakers Today
Jean-Michel Franco
Product Marketing Director – Data Governance Products
Mark Balkenende
Manager, Technical Product Marketing
4.
Connecting the Data-Driven Enterprise
Data-Driven companies…
• 23 times greater customer acquisition
• 6 times greater customer retention
• 19 times more profitability
7.
But will this finally deliver on the promise of analytics?
In most companies, fewer than 10% of employees have access to BI and analytic systems.
8.
Of course, you can leverage data discovery, data visualization and predictive analysis
9.
Source: September 20, 2011, “Understanding The Business Intelligence Growth Opportunity” Forrester report
But the scope and reach of analytics have expanded
NOW
11.
BI as we believe it should go
The three new dimensions of analytics
Build an agile and manageable data integration layer
From dashboard to analytical application
Predictive analytics and machine learning
Embed analytics in your operational processes
Provisioning the data
Designing the system of insights
Operationalize your analytics
Big Data integration
Big Data analytics
Data integration & preparation
12.
Build an agile and manageable integration layer
Data Inventory
Data Preparation
Master Data Mgmt.
Data Integration
Create your data catalog.
Profile the data.
Augment and connect.
Productize the data flows.
Sanction the data.
Share and monitor.
13.
Big data and open source are opening new horizons for data scientists
Designing the system of insights
• The data scientist role is finally recognized as essential to success in analytics
• Democratization of analytics and machine learning technologies
- Open source tools: RapidMiner, KNIME, R…
- Cloud-based machine learning platforms: Google Prediction API, Azure ML, Amazon ML…
- A wider range of high-end solutions: Blue Yonder, Watson, SAS, BigML…
• Better options for operationalizing analytics, rather than using them mostly on an ad hoc basis
- Run the model in place, with schema-on-read, where the big data resides in Hadoop
- Robust options for deploying models are now emerging (Mahout, Spark ML)
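To make the schema-on-read idea concrete, here is a minimal sketch in plain Python rather than Hadoop. It assumes raw events are stored untouched as JSON lines and that a schema is applied only when a model reads them. The names RAW_EVENTS, SCHEMA, read_with_schema and score are hypothetical, chosen for illustration only.

```python
import json

# Hypothetical raw store: records are kept exactly as they arrived.
RAW_EVENTS = [
    '{"user": "u1", "amount": "19.90", "ts": "2015-06-01T10:00:00"}',
    '{"user": "u2", "amount": "5", "channel": "web"}',
    'not valid json',
]

# The schema lives with the reader, not with the storage layer.
SCHEMA = {"user": str, "amount": float}


def read_with_schema(lines, schema):
    """Yield typed rows, skipping records that do not fit the schema."""
    for line in lines:
        try:
            record = json.loads(line)
        except ValueError:
            continue  # unparseable record stays in the store, ignored here
        if not isinstance(record, dict):
            continue
        try:
            yield {field: cast(record[field]) for field, cast in schema.items()}
        except (KeyError, TypeError, ValueError):
            continue  # missing field or value that cannot be cast


def score(row):
    """Hypothetical stand-in for a model applied where the data sits."""
    return row["amount"] > 10.0


rows = list(read_with_schema(RAW_EVENTS, SCHEMA))
flags = [score(row) for row in rows]
```

A different consumer could read the same raw lines with another schema (for example one that also requires "channel") without reloading or restructuring the stored data.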
14.
Operationalize your analytics
Enterprise Apps
Market Data
Sensors
Logs
Digital applications
Data Integration
Real-time data & application integration
Data warehouse & marts
Ad hoc analysis & mining
Reporting
Data Lake
Data profiling & preparation
Data Discovery
Data modeling
The Data Lab
The Data Factory
Data Hub
Data flows
Predictions & prescriptions
Embedded analytics
15.
Easiest and Most Powerful Integration Solution for Big Data
Introducing Talend Big Data
17.
Simplify Real-Time Big Data
• 100x performance increase
• < 1 sec response
• Address new use cases (last-minute defense, dynamic pricing, real-time fraud detection, CEP, etc.)
• New components for streaming data
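As an illustration of the kind of rule a real-time fraud detection or CEP use case might evaluate on a stream, here is a minimal sketch of a sliding-window "velocity" check in plain Python. It is not a Talend, Spark Streaming or Storm component. The class name VelocityRule, its parameters and the thresholds are assumptions made for this example, and events are assumed to arrive in timestamp order for each key.

```python
from collections import defaultdict, deque


class VelocityRule:
    """Flag a key (e.g. a card number) that produces too many events
    within a sliding time window. Hypothetical example rule."""

    def __init__(self, max_events=3, window_seconds=60.0):
        self.max_events = max_events
        self.window_seconds = window_seconds
        self._recent = defaultdict(deque)  # key -> timestamps inside the window

    def observe(self, key, timestamp):
        """Record one event and return True if the key exceeds the limit."""
        window = self._recent[key]
        window.append(timestamp)
        # Drop timestamps that have fallen out of the window.
        while window and timestamp - window[0] > self.window_seconds:
            window.popleft()
        return len(window) > self.max_events


rule = VelocityRule(max_events=3, window_seconds=60.0)
events = [("A", 0), ("A", 10), ("B", 5), ("A", 20), ("A", 30), ("A", 200)]
alerts = [rule.observe(key, ts) for key, ts in events]
```

Each event is evaluated as it arrives and only the timestamps inside the window are kept per key, which is what makes sub-second responses plausible for this kind of rule.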
18.
Spark integration in Talend Studio
• Technical Preview
• Machine learning components require a Talend Big Data Platform license
• Implementation of the Spark, MLlib and Spark Streaming APIs
• 17 components for data integration
- Data integration: Load, Connection, Sample, FilterRow, FilterColumns, Normalize, Union, Replicate, Aggregate, Sort, Join, Uniq, Log, Store
- Machine learning and data quality: Sample, ALS Model, Recommend
"Don't assume you can easily port existing applications to Spark from another
data-processing model, like MapReduce. Moving to Spark means a complete
reimplementation, and the potential benefits must outweigh that cost."
Nick Heudecker - Gartner
19.
Otto Optimizes Pricing & Stock
A company that’s doing everything right
Challenge:
• Ever-increasing Big Data velocity
• Many last-minute cart abandonments
• Hard to optimize pricing
Why Talend:
• Is the central integration tool within their Business Intelligence (BI) organization
• Integrates clickstreams from the last 6 months
Value:
• Leftover merchandise reduced by 20%
• Can predict abandoned shopping carts in real time with 90% accuracy
• Performs dynamic pricing
20.
Demonstration
Key capabilities
• Drives the learning process by integrating data in Hadoop and launching the MLlib learning process
• Drives the recommendation process by ingesting demographics data into the engine and integrating the output into any application or data target
Business Benefits
• Hides the underlying complexity of Hadoop and Spark
• Easily embeds machine learning into any application or data target
• Machine learning with precision and at scale
• Predictive analysis for the rest of us
Diagram labels: Training data, Demographics data, Big Data, tSparkALSModel, tSparkRecommend, Test, Run
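The component names tSparkALSModel and tSparkRecommend suggest a train-then-recommend flow built on alternating least squares (ALS). The sketch below shows the general ALS technique in plain Python on a toy rating set. It is not Talend's or MLlib's implementation, and every name in it (train_als, recommend, the sample ratings, the rank and regularization values) is an assumption made for illustration.

```python
import random


def solve(a, b):
    """Solve the small dense system a x = b (Gauss-Jordan, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def update(fixed, ratings_by_key, rank, reg):
    """Recompute one side's factors while the other side is held fixed.
    Each row is an independent ridge regression."""
    out = {}
    for key, rated in ratings_by_key.items():
        a = [[reg if i == j else 0.0 for j in range(rank)] for i in range(rank)]
        b = [0.0] * rank
        for other, value in rated:
            f = fixed[other]
            for i in range(rank):
                b[i] += value * f[i]
                for j in range(rank):
                    a[i][j] += f[i] * f[j]
        out[key] = solve(a, b)
    return out


def train_als(ratings, rank=2, reg=0.1, iterations=20, seed=0):
    """Learn user and item factor vectors from (user, item, rating) triples."""
    rng = random.Random(seed)
    by_user, by_item = {}, {}
    for user, item, value in ratings:
        by_user.setdefault(user, []).append((item, value))
        by_item.setdefault(item, []).append((user, value))
    items = {i: [rng.random() for _ in range(rank)] for i in by_item}
    users = {}
    for _ in range(iterations):
        users = update(items, by_user, rank, reg)   # fix items, solve users
        items = update(users, by_item, rank, reg)   # fix users, solve items
    return users, items


def predict(users, items, user, item):
    return sum(u * v for u, v in zip(users[user], items[item]))


def recommend(users, items, ratings, user, n=1):
    """Return the top-n items the user has not rated yet."""
    seen = {i for u, i, _ in ratings if u == user}
    scored = [(predict(users, items, user, i), i) for i in items if i not in seen]
    return [i for _, i in sorted(scored, reverse=True)[:n]]


# Toy training data: alice has not rated item "y".
ratings = [
    ("alice", "x", 5), ("alice", "z", 1),
    ("bob", "x", 5), ("bob", "y", 5), ("bob", "z", 1),
    ("carol", "x", 1), ("carol", "y", 1), ("carol", "z", 5),
    ("dave", "x", 1), ("dave", "y", 1), ("dave", "z", 5),
]
users, items = train_als(ratings)
suggestions = recommend(users, items, ratings, "alice")
```

The two phases map loosely onto the slide: train_als plays the role of the model-building step on training data, and recommend plays the role of the scoring step whose output can be written to any application or data target.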
21.
Start now with the Talend Big Data Sandbox
Virtual Image installed with
• Multiple scenarios for you to try:
- Clickstream data
- Twitter sentiment
- Apache weblogs
- ETL Offload
- Recommendations through Spark Machine Learning
Download your Free Talend Big Data Sandbox today!
http://www.talend.com/talend-big-data-sandbox