What's next for Big Data? -- Apache Spark


TUMRA | Big Data Science - Gain a competitive advantage through Big Data & Data Science

Michael Cutler (CTO and co-founder of TUMRA) provides a high-level introduction to Apache Spark in a presentation given at ‘Big Data Week 2014’ (#BDW14), held at University College London. TUMRA were early adopters of Spark after a brief PoC in Dec ‘12 and took it to production just a few months later. The main motivation was the inflexibility and high latency of Hadoop Map/Reduce jobs and the knock-on effect on technology that builds on it (Mahout machine learning, Hive data warehousing, Cascading). With two primary use cases, ‘Ecommerce Personalisation’ and ‘Marketing Automation’, TUMRA currently flow around 29 million ‘user engagement events’ (JSON) each day through Apache Kafka and Spark Streaming, at peak rates of up to 800 events per second. TUMRA use Apache Spark on Amazon Web Services (EC2) in production for a mix of machine learning model building, graph analytics and near-real-time reporting. To learn more about how we use Spark and the services we can deliver through our Platform, please contact: hello@tumra.com


WHAT’S NEXT FOR BIG DATA? APACHE SPARK

WTH IS SPARK?

3 TUMRA - Big Data Week, May 2014

Spark is …
“One platform to rule them all”

… and blurs boundary between SQL,
machine learning, streams & graphs


Spark is …
… gaining momentum


Spark has …
… more contributors than Hadoop

Spark can …

[Figure. Source: Databricks]

Spark Stack

[Diagram of the Spark stack; recoverable label: Hadoop (HDFS). Source: Databricks]


Why Spark?
-  Code reuse across batch, streaming and interactive applications
-  Easy API from Scala, Java & Python
-  In-memory data sharing
FAAAAAAST!!!
Check out http://spark.apache.org
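
The "code reuse across batch, streaming" point can be illustrated with a minimal sketch. It uses plain Python rather than the Spark API, and the event shape and the names count_by_type, run_batch and run_streaming are hypothetical, chosen only for illustration. The idea is that one piece of business logic is written once and then applied both to a full dataset and to a sequence of small micro-batches, which is roughly how Spark Streaming treats a stream.

```python
from collections import Counter
from typing import Iterable, List


def count_by_type(events: Iterable[dict]) -> Counter:
    """Shared business logic: count events per 'type' field (hypothetical schema)."""
    return Counter(e["type"] for e in events)


def run_batch(events: List[dict]) -> Counter:
    """Batch mode: apply the shared logic once to the whole dataset."""
    return count_by_type(events)


def run_streaming(micro_batches: Iterable[List[dict]]) -> Counter:
    """Streaming mode: apply the same logic to each micro-batch and merge the results."""
    totals: Counter = Counter()
    for batch in micro_batches:
        totals.update(count_by_type(batch))
    return totals


# Illustrative data only; not taken from the talk.
events = [
    {"type": "view"},
    {"type": "click"},
    {"type": "view"},
    {"type": "purchase"},
]

batch_result = run_batch(events)
stream_result = run_streaming([events[:2], events[2:]])
print(batch_result == stream_result)
```

Because both paths call the same function, a fix or a new feature in count_by_type reaches the batch and the streaming job at the same time.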

CASE STUDY: PERSONALISATION & MARKETING AUTOMATION

Our history with Spark
-  Early adopters; PoC in Dec ‘12
-  In production since March ‘13
-  Running on Amazon EC2
-  Ad-hoc analysis and reporting
-  Machine learning model building
-  Integrates with our real-time dashboards

Use Case: Personalisation
-  Matching visitors to products
-  50% of visitors are ‘new’ and have no history to work with
-  Blend of pre-computation and real-time recommendations
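
One way to picture a blend of pre-computed and real-time recommendations is the sketch below. It is an assumption-laden illustration, not TUMRA's actual method: the function recommend, the lookup tables precomputed, related and popular, and the ranking order are all hypothetical. Known visitors get their pre-computed list, current-session views add real-time candidates, and popular items act as a fallback for ‘new’ visitors with no history.

```python
from typing import Dict, List


def recommend(
    visitor_id: str,
    session_views: List[str],
    precomputed: Dict[str, List[str]],
    related: Dict[str, List[str]],
    popular: List[str],
    k: int = 3,
) -> List[str]:
    """Return up to k items, blending real-time and pre-computed candidates.

    Hypothetical ranking order: session-based items first, then the
    visitor's pre-computed list, then globally popular items as a fallback.
    """
    candidates: List[str] = []
    # Real-time signal: items related to what was viewed in this session.
    for item in session_views:
        candidates.extend(related.get(item, []))
    # Pre-computed signal: only available for visitors with history.
    candidates.extend(precomputed.get(visitor_id, []))
    # Fallback for 'new' visitors: popular items.
    candidates.extend(popular)

    seen = set(session_views)  # do not recommend what was just viewed
    result: List[str] = []
    for item in candidates:
        if item not in seen:
            seen.add(item)
            result.append(item)
        if len(result) == k:
            break
    return result


# Illustrative data only.
precomputed = {"v1": ["shoes", "hat"]}
related = {"hat": ["scarf"]}
popular = ["socks", "shoes", "belt"]

known_visitor = recommend("v1", [], precomputed, related, popular)
new_visitor = recommend("new", ["hat"], precomputed, related, popular)
print(known_visitor, new_visitor)
```

In this sketch a new visitor still receives a full list after a single page view, because the session and popularity signals need no stored history.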

Use Case: Marketing Automation
-  Collect user engagement data across websites and mobile apps
-  Increase subscription rates
-  Identify users at risk of churn
-  Automated personalised marketing

Data Volumes & Velocity
-  29M events per day
-  Peak rates ~800 events / second
-  All events streamed to Kafka
-  10B archived events in Amazon S3
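
To put these figures in proportion, here is a small back-of-the-envelope calculation. It uses only the numbers on this slide and assumes, purely for illustration, that the daily volume is spread evenly over 24 hours and that the archive grew at today's rate; neither assumption comes from the talk.

```python
# Figures taken from the slide.
EVENTS_PER_DAY = 29_000_000
PEAK_EVENTS_PER_SECOND = 800
ARCHIVED_EVENTS = 10_000_000_000

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Average rate if events were spread evenly across the day (an assumption).
average_events_per_second = EVENTS_PER_DAY / SECONDS_PER_DAY

# How much higher the quoted peak is than that average.
peak_to_average = PEAK_EVENTS_PER_SECOND / average_events_per_second

# Days of traffic the archive would represent at the current daily volume.
archive_days_at_current_rate = ARCHIVED_EVENTS / EVENTS_PER_DAY

print(f"average: {average_events_per_second:.0f} events/s")
print(f"peak / average: {peak_to_average:.1f}x")
print(f"archive: about {archive_days_at_current_rate:.0f} days at the current rate")
```

So the quoted peak is a little over twice the daily average, and the archive is roughly equivalent to a year of traffic at the current volume.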

How we use Spark

[Architecture diagram. Recoverable labels: Data Collection API (Akka) & Connectors; Apache Kafka; Amazon S3 (HDFS interface)]

Spark gives us …
-  Unified platform for machine learning and graph analytics
-  Ability to experiment at huge scale
-  SQL interfaces to existing tools
-  Code reuse from data scientists to production workloads

WANT TO KNOW MORE?

http://spark.apache.org

Spark Summit 2014

Spark London Meetup

Commercial Support & Certification

THANK YOU

@tumra
tumra.com

slideshare.net/tumra
