Advanced data science algorithms
applied to scalable stream processing
David Piris Valenzuela
Nacho García Fernández
Ignacio.g.Fernandez@treelogic.com
@0xNacho
david.piris@treelogic.com
@davidpiris
About Treelogic
• R&D-intensive company with the mission of adapting technological knowledge to improve quality standards in our daily life
• 8 ongoing H2020 projects (coordinating 3 of them)
• 8 ongoing FP7 projects (coordinating 5 of them)
• Focused on providing Big Data analytics worldwide
• Internal organization
Research lines
• Big Data
• Computer vision
• Data science
• Social media analysis
• Security
ICT solutions
• Security & Safety
• Justice
• Health
• Transport
• Financial Services
• ICT tailored solutions
CONTENTS
1. WHY WE NEED BIG DATA
2. BIG DATA: SOLUTIONS
3. BIG DATA: REAL-TIME PROCESSING
4. INCREMENTAL ALGORITHMS
5. WHAT WE WANT
6. WHAT WE NEED
   1. A stream processing engine
   2. Online incremental algorithms
   3. A distributed data storage system
   4. A use case
   5. A visualization layer
Why we need Big Data
• Public and private sector organizations store a huge amount of data
• Countries with huge databases store data from:
    • Population
    • Medical records
    • Taxes
    • Online transactions
    • Mobile transactions
    • Social networks
In a single day, tweets generate 12 TB of data!
2.5 exabytes are produced every day! That is equivalent to:
• 530,000,000 million songs
• 150,000,000 iPhones
• 5 million laptops
• 90 years of HD video
How can we manage all this data?
Big Data: Solutions
First, we can process the whole historical repository and extract value from the stored data:
• Batch architecture
• MapReduce
• Hadoop ecosystem
Batch processing with Hadoop takes a long time, and the need to process ingested data and display results as quickly as possible calls for new architectures and tools:
• Lambda architecture
• Spark (memory vs. disk)
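A Lambda architecture answers a query by combining a precomputed batch view with a speed-layer view that covers only the data that arrived since the last batch run. The following is a minimal sketch of that idea, assuming per-key event counts as the metric; the names `batch_view`, `speed_view` and `query_count`, as well as the sample numbers, are hypothetical.

```python
def query_count(batch_view: dict, speed_view: dict, key: str) -> int:
    """Serve a query by adding the batch result and the recent increments."""
    return batch_view.get(key, 0) + speed_view.get(key, 0)

# Hypothetical views: the batch layer was recomputed some time ago,
# the speed layer holds only the events that arrived since then.
batch_view = {"sensor-a": 1200, "sensor-b": 950}
speed_view = {"sensor-a": 7, "sensor-c": 3}

print(query_count(batch_view, speed_view, "sensor-a"))
```

Note that the same counting logic has to exist twice, once in the batch layer and once in the speed layer, which is the maintenance cost of this design.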
Big data: real-time processing
• Faster results
• Accurate results
• Lower costs
• Satisfied consumers
As previously said, we need to extract and visualize information in near real
time…
• Flink as the processing engine
• Stream processing
• Windowing with event-time semantics
• Streaming and batch processing
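The event-time windowing idea can be illustrated without Flink itself. The following is a minimal plain-Python sketch, not Flink's API: the function names, the 5-second window size and the sample events are all assumptions made for the example. Each event is assigned to a window by the timestamp it carries, so out-of-order arrival does not change the result.

```python
from collections import defaultdict

def tumbling_window_start(event_time_ms: int, size_ms: int) -> int:
    """Start of the tumbling window that contains the event timestamp."""
    return event_time_ms - (event_time_ms % size_ms)

def sum_per_window(events, size_ms):
    """Group (event_time_ms, value) pairs by event-time window and sum them."""
    sums = defaultdict(float)
    for event_time_ms, value in events:
        sums[tumbling_window_start(event_time_ms, size_ms)] += value
    return dict(sums)

# Hypothetical events, deliberately out of order: (event time in ms, value)
events = [(1_000, 2.0), (6_500, 1.0), (4_900, 3.0), (5_000, 4.0)]
window_sums = sum_per_window(events, size_ms=5_000)
print(window_sums)
```

The event stamped 4,900 ms arrives after the one stamped 6,500 ms, yet it is still counted in the window that starts at 0 ms.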
Kappa architecture
• Batch layer removed
• Only one set of code needs to be maintained
• No need for a batch layer
• Avoid disk access in the processing engine (it adds latency)
Big data: available tools
Incremental algorithms
• BI and BA people always want to perform some common operations to extract value from data and visualize it
• We have operational tools for relational or batch environments
• How can we obtain the average of a data stream that changes every second, or even every millisecond?
• The usual average operation is meant for a historical repository: input data that does not change from the moment we start the computation.
• Do we have tools that make this possible in a real-time deployment?
The answer is NO!
Flink gives us the chance to work with a new window processing concept.
We can define and configure "small slices of time" and run operations on, or manipulate, the data that falls within each slice.
With Flink and windowing…
• These algorithms consume streams of data and can update their results in parallel without having to store the data already processed
• Using checkpoints with windowing allows us to keep the result of the previous window
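One way to obtain such results is to keep a small, mergeable summary instead of the raw data. The sketch below uses Welford's update and the standard pairwise merge formula for mean and variance; these are well-known techniques chosen for illustration, and the slides do not confirm that they are what the authors implemented. The class name `RunningStats` and the sample values are hypothetical. The merged state plays the role of the result carried over from a previous window.

```python
from dataclasses import dataclass

@dataclass
class RunningStats:
    """Count, mean and sum of squared deviations (m2) of the values seen so far."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def update(self, x: float) -> None:
        """Welford's update: fold one new value into the state."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "RunningStats") -> "RunningStats":
        """Combine two partial states, e.g. from parallel workers or windows."""
        if self.n == 0:
            return RunningStats(other.n, other.mean, other.m2)
        if other.n == 0:
            return RunningStats(self.n, self.mean, self.m2)
        n = self.n + other.n
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return RunningStats(n, mean, m2)

    @property
    def variance(self) -> float:
        """Population variance of everything seen so far."""
        return self.m2 / self.n if self.n else 0.0

# Two hypothetical windows processed independently, then merged.
first, second = RunningStats(), RunningStats()
for x in [2.0, 4.0, 4.0, 4.0]:
    first.update(x)
for x in [5.0, 5.0, 7.0, 9.0]:
    second.update(x)
total = first.merge(second)
print(total.n, total.mean, total.variance)
```

Only three numbers per stream are kept, regardless of how many values have been processed.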
Our analytics and visualization solution, implemented in a real-time architecture
If you are a BI or BA professional...we care about you!
• Currently, we have implemented:
    • Average
    • Mode
    • Variance
    • Correlation
    • Covariance
    • Min
    • Max
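As an illustration of how two of the listed operations can be computed in a single pass, the sketch below maintains covariance and Pearson correlation incrementally. It uses the standard online co-moment update, which is an assumption on our part and not necessarily the authors' implementation; the class name `RunningCovariance` and the sample pairs are hypothetical.

```python
import math

class RunningCovariance:
    """Single-pass covariance and Pearson correlation of (x, y) pairs."""

    def __init__(self) -> None:
        self.n = 0
        self.mean_x = 0.0
        self.mean_y = 0.0
        self.m2_x = 0.0   # sum of squared deviations of x
        self.m2_y = 0.0   # sum of squared deviations of y
        self.c_xy = 0.0   # sum of co-deviations of x and y

    def update(self, x: float, y: float) -> None:
        """Fold one (x, y) pair into the running state."""
        self.n += 1
        dx = x - self.mean_x              # deviation from the old mean of x
        dy = y - self.mean_y              # deviation from the old mean of y
        self.mean_x += dx / self.n
        self.mean_y += dy / self.n
        self.m2_x += dx * (x - self.mean_x)
        self.m2_y += dy * (y - self.mean_y)
        self.c_xy += dx * (y - self.mean_y)   # old-mean dx times new-mean dy

    def covariance(self) -> float:
        """Population covariance of the pairs seen so far."""
        return self.c_xy / self.n if self.n else 0.0

    def correlation(self) -> float:
        """Pearson correlation; 0.0 when either variable has no spread."""
        denom = math.sqrt(self.m2_x * self.m2_y)
        return self.c_xy / denom if denom else 0.0

# Hypothetical stream of pairs where y is exactly twice x.
stats = RunningCovariance()
for x, y in [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]:
    stats.update(x, y)
print(stats.covariance(), stats.correlation())
```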
• We are currently working on:
    • Median
• On the roadmap:
    • Standard deviation
    • Order by
    • Discretization
    • Contains
    • Split
    • Validate range values
    • Set default value to specific output
Apache Flink vs Apache Spark
Apache Flink
• Pure streams for all workloads
• Optimizer
• Low latency, high throughput
• Global, session, time and count based window criteria
• Provides automatic memory management
Apache Spark
• Micro-batches for all workloads
• No job optimizer
• High latency compared to Flink
• Time-based window criteria
• Configurable memory management. Spark 1.6+ has moved towards automating memory management
Incremental algorithms in Flink
• Default behavior in Apache Flink:
• With incremental algorithms:
Apache Kudu
• Provides a combination of fast inserts/updates and efficient columnar scans to enable real-time analytic workloads
• It is a new complement to HDFS and HBase
• Designed for use cases that require fast analytics on fast data
• Low query latency
• V1.0.1 was released on October 11, 2016
PROTEUS: a steel-making scenario
• The steel industry is a key sector for the European community.
• PROTEUS was introduced last year at Big Data Spain by Treelogic *
• Hot strip mills (sometimes) produce steel with defects
• Predict coil parameters (thickness, width, flatness) using real-time and historical data
• Detecting defective coils at an early stage saves money: the production process can be modified or stopped.
• The proposed architecture is being validated in this project
• 7870 variables with a frequency of 500 ms: data-in-motion
• 700,000 records for each variable; 500 GB of time series and flatness maps: data-at-rest
* https://www.youtube.com/watch?v=EIH7HLyqhfE
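To get a feel for the data-in-motion volume, the figures above can be turned into a rate. This is a back-of-the-envelope sketch that assumes every one of the 7870 variables emits exactly one value every 500 ms, which the slide does not state explicitly.

```python
VARIABLES = 7870            # from the slide
SAMPLE_PERIOD_MS = 500      # from the slide; assumed to apply to every variable

samples_per_second = 1000 // SAMPLE_PERIOD_MS      # samples per variable per second
values_per_second = VARIABLES * samples_per_second
values_per_day = values_per_second * 24 * 60 * 60

print(values_per_second)   # 15740
print(values_per_day)      # 1359936000
```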
Websockets
• WebSocket is a computer communication protocol providing full-duplex communication channels over a single TCP connection.
• Much faster than HTTP polling for frequent messages, since the connection stays open
• Its API is standardized by the W3C
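A WebSocket connection starts as an HTTP request that is upgraded, after which both sides can send frames over the same TCP connection. As a small illustration of that opening handshake, the sketch below computes the `Sec-WebSocket-Accept` value a server returns for a client's `Sec-WebSocket-Key`, as defined in RFC 6455. The function name `websocket_accept_key` is an assumption made for the example; the sample key is the one used in the RFC.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def websocket_accept_key(client_key: str) -> str:
    """Return the Sec-WebSocket-Accept value for a Sec-WebSocket-Key."""
    digest = hashlib.sha1((client_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")

accept = websocket_accept_key("dGhlIHNhbXBsZSBub25jZQ==")
print(accept)
```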
Apache Flink & Websockets
• Data sinks consume DataSets and are used to store or return them.
• Flink comes with a variety of built-in output formats that are encapsulated behind operations on the DataSet:
    • writeAsText()
    • writeAsFormattedText()
    • writeAsCsv()
    • print()
    • write()
• We’ve developed a WebsocketSink that enables Flink to send its output to a given websocket endpoint.
    • Based on the javax-websocket-client-api 1.1 spec.
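The idea of such a sink can be sketched as follows. This is not the authors' WebsocketSink nor Flink's API: the class `WebsocketSinkSketch`, its `invoke` method and the injected `send` callable are hypothetical, and an in-memory list stands in for the websocket endpoint so the example runs without a network.

```python
import json
from typing import Callable, List

class WebsocketSinkSketch:
    """Hypothetical stand-in for a sink: serializes each record and sends it."""

    def __init__(self, send: Callable[[str], None]) -> None:
        # `send` abstracts the websocket client; a real sink would open a
        # connection to the endpoint and use the client's send method instead.
        self._send = send

    def invoke(self, record: dict) -> None:
        """Called once per output record; emits it as a JSON text message."""
        self._send(json.dumps(record, sort_keys=True))

# Usage with an in-memory list standing in for the websocket endpoint.
sent: List[str] = []
sink = WebsocketSinkSketch(sent.append)
sink.invoke({"window_start": 0, "mean": 5.0})
print(sent)
```

Injecting the `send` callable keeps the serialization logic testable without a running websocket server.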
Incremental architecture: our approach
ProteicJS
https://github.com/proteus-h2020/proteic/
ProteicJS: Visualizations
ProteicJS: Researching visualization
• Currently researching new ways of visualizing data and ML models
ProteicJS & Apache Flink
How to get it all
https://github.com/proteus-h2020/proteus-docker
 
How to scale your PaaS with OVH infrastructure?
How to scale your PaaS with OVH infrastructure?How to scale your PaaS with OVH infrastructure?
How to scale your PaaS with OVH infrastructure?
 
Peek into Neo4j Product Strategy and Roadmap
Peek into Neo4j Product Strategy and RoadmapPeek into Neo4j Product Strategy and Roadmap
Peek into Neo4j Product Strategy and Roadmap
 
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
 
Big Data LDN 2018: FORTUNE 100 LESSONS ON ARCHITECTING DATA LAKES FOR REAL-TI...
Big Data LDN 2018: FORTUNE 100 LESSONS ON ARCHITECTING DATA LAKES FOR REAL-TI...Big Data LDN 2018: FORTUNE 100 LESSONS ON ARCHITECTING DATA LAKES FOR REAL-TI...
Big Data LDN 2018: FORTUNE 100 LESSONS ON ARCHITECTING DATA LAKES FOR REAL-TI...
 

Mais de Big Data Spain

Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017Big Data Spain
 
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...Big Data Spain
 
AI: The next frontier by Amparo Alonso at Big Data Spain 2017
AI: The next frontier by Amparo Alonso at Big Data Spain 2017AI: The next frontier by Amparo Alonso at Big Data Spain 2017
AI: The next frontier by Amparo Alonso at Big Data Spain 2017Big Data Spain
 
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017Big Data Spain
 
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...Big Data Spain
 
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...Big Data Spain
 
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...Big Data Spain
 
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...Big Data Spain
 
State of the art time-series analysis with deep learning by Javier Ordóñez at...
State of the art time-series analysis with deep learning by Javier Ordóñez at...State of the art time-series analysis with deep learning by Javier Ordóñez at...
State of the art time-series analysis with deep learning by Javier Ordóñez at...Big Data Spain
 
Trading at market speed with the latest Kafka features by Iñigo González at B...
Trading at market speed with the latest Kafka features by Iñigo González at B...Trading at market speed with the latest Kafka features by Iñigo González at B...
Trading at market speed with the latest Kafka features by Iñigo González at B...Big Data Spain
 
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...Big Data Spain
 
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a... The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...Big Data Spain
 
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...Big Data Spain
 
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017Big Data Spain
 
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...Big Data Spain
 
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...Big Data Spain
 
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...Big Data Spain
 
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...Big Data Spain
 
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...Big Data Spain
 
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017Big Data Spain
 

Mais de Big Data Spain (20)

Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
 
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
 
AI: The next frontier by Amparo Alonso at Big Data Spain 2017
AI: The next frontier by Amparo Alonso at Big Data Spain 2017AI: The next frontier by Amparo Alonso at Big Data Spain 2017
AI: The next frontier by Amparo Alonso at Big Data Spain 2017
 
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
 
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
 
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
 
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
 
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
 
State of the art time-series analysis with deep learning by Javier Ordóñez at...
State of the art time-series analysis with deep learning by Javier Ordóñez at...State of the art time-series analysis with deep learning by Javier Ordóñez at...
State of the art time-series analysis with deep learning by Javier Ordóñez at...
 
Trading at market speed with the latest Kafka features by Iñigo González at B...
Trading at market speed with the latest Kafka features by Iñigo González at B...Trading at market speed with the latest Kafka features by Iñigo González at B...
Trading at market speed with the latest Kafka features by Iñigo González at B...
 
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
 
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a... The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
 
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
 
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
 
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
 
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
 
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
 
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
 
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
 

Último

Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...itnewsafrica
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFMichael Gough
 
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Mark Simos
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integrationmarketing932765
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observabilityitnewsafrica
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxAna-Maria Mihalceanu
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Jeffrey Haguewood
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 

Último (20)

Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDF
 
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance Toolbox
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 

Advanced data science algorithms applied to scalable stream processing by David Piris and Ignacio García

  • 1.
  • 2. Advanced data science algorithms applied to scalable stream processing
      • David Piris Valenzuela (david.piris@treelogic.com, @davidpiris)
      • Nacho García Fernández (Ignacio.g.Fernandez@treelogic.com, @0xNacho)
  • 3. About Treelogic
      • R&D-intensive company with the mission of adapting technological knowledge to improve quality standards in our daily life
      • 8 ongoing H2020 projects (coordinating 3 of them)
      • 8 ongoing FP7 projects (coordinating 5 of them)
      • Focused on providing Big Data analytics worldwide
      • Internal organization
          • Research lines: Big Data, Computer vision, Data science, Social Media Analysis, Security
          • ICT solutions: Security & Safety, Justice, Health, Transport, Financial Services, ICT tailored solutions
  • 4. CONTENTS
      1. WHY WE NEED BIG DATA
      2. BIG DATA: SOLUTIONS
      3. BIG DATA: REAL-TIME PROCESSING
      4. INCREMENTAL ALGORITHMS
      5. WHAT WE WANT
      6. WHAT WE NEED
          1. A stream processing engine
          2. Online incremental algorithms
          3. A distributed data storage system
          4. A use case
          5. A visualization layer
  • 6. Why we need Big Data
  • 7. Why we need Big Data
      • Public and private sector companies store a huge amount of data
      • Countries with huge databases store data from:
          • Population
          • Medical records
          • Taxes
          • Online transactions
          • Mobile transactions
          • Social Networks
      • In a single day, tweets generate 12 TB!
  • 8. Why we need Big Data
      • 2.5 Exabytes are produced every day!
          • 530.000.000 million songs
          • 150.000.000 iPhones
          • 5 million laptops
          • 90 years of HD Video
  • 9. Why we need Big Data
      • How can we manage all this data?
  • 11. Big Data: Solutions
      • First, we can manage the whole historical repository and extract value from the stored data
          • Batch architecture
          • MapReduce
          • Hadoop Ecosystem
  • 13. Big Data: Solutions
      • Batch processing with Hadoop takes a long time, and the need to process ingested data and display results as quickly as possible calls for new architectures and tools
          • Lambda architecture
          • Spark (memory vs disk)
  • 16. Big data: real-time processing
      • Faster results
      • Accurate results
      • Lower costs
      • Satisfied consumers
  • 17. Big data: real-time processing
      • As previously said, we need to extract and visualize information in near real time…
  • 18. Big data: real-time processing
      • Flink as the processing engine
      • Stream processing
      • Windowing with event-time semantics
      • Streaming and batch processing
  • 19. Big data: real-time processing
      • Kappa architecture
          • Batch layer removed
          • Only one set of code needs to be maintained
  • 20. Big data: real-time processing
      • No need to use a batch layer
      • Avoid using disk in the processing engine (latency)
  • 23. Incremental algorithms
      • BI & BA people always want to perform some common operations to extract value from data and visualize it
      • We have operational tools for relational or batch environments
      • How can we obtain the average of a data stream that changes every minute, every second, or even every few milliseconds?
      • The usual average operation is designed for a historical repository: input data that does not change once the computation has started
      • Do we have tools to make this possible in a real-time deployment?
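The question raised on this slide, how to average a stream that never stops changing, has a well-known answer: keep only a count and the current mean, and fold each new value into them. The following is a minimal Python sketch of that idea, not the authors' Flink implementation; the names `RunningMean` and `stream_means` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RunningMean:
    """Keeps only a count and the current mean, never the raw events."""
    count: int = 0
    mean: float = 0.0

    def update(self, value: float) -> float:
        # Incremental form of the average:
        #   mean_n = mean_{n-1} + (x_n - mean_{n-1}) / n
        self.count += 1
        self.mean += (value - self.mean) / self.count
        return self.mean


def stream_means(values):
    """Return the mean observed after each arriving value."""
    running = RunningMean()
    return [running.update(v) for v in values]
```

Each update costs constant time and memory, so the result is available after every event instead of after a full pass over a stored dataset.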
  • 25. Incremental algorithms
      • Flink lets us work with a new window-processing concept: we can define and configure small slices of time ("windows") and run operations on, or manipulate, the data that falls within each slice.
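The "small time pieces" idea can be illustrated without any streaming engine. The sketch below is plain Python, not the Flink API: it assigns each event to a fixed-size (tumbling) window based on the event's own timestamp and keeps a running count and sum per window. The function names are hypothetical, and the sketch deliberately ignores watermarks and late-data handling, which a real engine has to deal with.

```python
from collections import defaultdict


def tumbling_window_start(timestamp_ms: int, size_ms: int) -> int:
    """Start of the fixed-size window that contains the timestamp."""
    return timestamp_ms - (timestamp_ms % size_ms)


def aggregate_by_window(events, size_ms: int):
    """Fold (timestamp_ms, value) events into {window_start: (count, sum)}.

    Only one small accumulator per window is kept, not the events themselves.
    """
    state = defaultdict(lambda: [0, 0.0])
    for timestamp_ms, value in events:
        acc = state[tumbling_window_start(timestamp_ms, size_ms)]
        acc[0] += 1
        acc[1] += value
    return {start: (count, total) for start, (count, total) in sorted(state.items())}
```

Because the window is derived from the event timestamp rather than from arrival order, an event that arrives out of order still lands in the window it belongs to.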
  • 27. Incremental algorithms
      • These algorithms consume streams of data and can update their results in parallel without needing to store the data already processed
      • Using checkpoints with windowing allows us to store the result of the previous window's processing
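The two properties described above, parallel updates and no need to keep processed data, can be sketched with a small mergeable summary. The Python sketch below uses Welford's update and the standard pairwise merge formula for count, mean, and sum of squared deviations; it is an illustration under those assumptions, not the authors' code, and the class name `Moments` is hypothetical. The three numbers it holds are also the kind of compact state that could be carried over from one window to the next through a checkpoint.

```python
from dataclasses import dataclass


@dataclass
class Moments:
    """Summary of a stream partition: count, mean, sum of squared deviations."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def add(self, x: float) -> None:
        # Welford's single-pass update; the raw value is not stored.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "Moments") -> "Moments":
        # Combine two partial summaries computed independently (e.g. in parallel).
        if self.n == 0:
            return Moments(other.n, other.mean, other.m2)
        if other.n == 0:
            return Moments(self.n, self.mean, self.m2)
        n = self.n + other.n
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return Moments(n, mean, m2)

    def variance(self) -> float:
        """Population variance of everything summarised so far."""
        return self.m2 / self.n if self.n else 0.0
```

Two workers can each summarise their own share of the stream and the results can be merged afterwards, giving the same mean and variance as a single pass over all the data.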
  • 28. Incremental algorithms
      • Our analytics & visualization solution implemented in a real-time architecture
  • 29. Incremental algorithms
      • If you are a BI or BA professional... we care about you!
  • 30. Incremental algorithms
      • Currently, we have implemented:
          • Average
          • Mode
          • Variance
          • Correlation
          • Covariance
          • Min
          • Max
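Several of the operations listed above share the same running state. As one illustration, covariance and correlation of two fields can be maintained in a single pass with a co-moment update. The Python sketch below shows that textbook technique; it is not taken from the authors' implementation, and the class name `RunningCovariance` is hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningCovariance:
    """Single-pass covariance and Pearson correlation for (x, y) pairs."""
    n: int = 0
    mean_x: float = 0.0
    mean_y: float = 0.0
    m2_x: float = 0.0   # sum of squared deviations of x
    m2_y: float = 0.0   # sum of squared deviations of y
    c_xy: float = 0.0   # co-moment: sum of (x - mean_x) * (y - mean_y)

    def add(self, x: float, y: float) -> None:
        self.n += 1
        dx = x - self.mean_x          # deviation from the old mean of x
        dy = y - self.mean_y          # deviation from the old mean of y
        self.mean_x += dx / self.n
        self.mean_y += dy / self.n
        self.m2_x += dx * (x - self.mean_x)
        self.m2_y += dy * (y - self.mean_y)
        self.c_xy += dx * (y - self.mean_y)

    def covariance(self) -> float:
        """Population covariance of the pairs seen so far."""
        return self.c_xy / self.n if self.n else 0.0

    def correlation(self) -> float:
        """Pearson correlation; NaN when either field has zero variance."""
        denom = math.sqrt(self.m2_x * self.m2_y)
        return self.c_xy / denom if denom > 0 else float("nan")
```

Min and max follow the same pattern with an even simpler state, a single value compared against each arriving event.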
  • 31. Incremental algorithms
      • Currently we are working on:
          • Median
  • 32. Incremental algorithms: on the roadmap
      - Standard deviation
      - Order by
      - Discretization
      - Contains
      - Split
      - Validate range values
      - Set default value to specific output
  • 34. Apache Flink vs Apache Spark
      Apache Flink:
      - Pure streams for all workloads
      - Optimizer
      - Low latency, high throughput
      - Global, session, time-based and count-based window criteria
      - Provides automatic memory management
      Apache Spark:
      - Micro-batches for all workloads
      - No job optimizer
      - High latency compared to Flink
      - Time-based window criteria
      - Configurable memory management; Spark 1.6+ has moved toward automating memory management
  • 38. Incremental algorithms in Flink
      - Default behavior in Apache Flink: [diagram]
      - With incremental algorithms: [diagram]
  • 41. Apache Kudu
      - Provides a combination of fast inserts/updates and efficient columnar scans to enable real-time analytic workloads
      - It is a new complement to HDFS and HBase
      - Designed for use cases that require fast analytics on fast data
      - Low query latency
      - Version 1.0.1 was released on October 11, 2016
  • 43. PROTEUS: a steel-making scenario
      - The steel industry is a key sector for the European community
      - PROTEUS was introduced last year at Big Data Spain by Treelogic *
      - Hot strip mills sometimes produce steel with defects
      - Goal: predict coil parameters (thickness, width, flatness) using real-time and historical data
      - Detecting defective coils at an early stage saves money, because the production process can be modified or stopped
      - The proposed architecture is being validated in this project
      - Data-in-motion: 7,870 variables, each sampled every 500 ms
      - Data-at-rest: 700,000 records for each variable; 500 GB of time series and flatness maps
      * https://www.youtube.com/watch?v=EIH7HLyqhfE
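  • 46. WebSockets
      - WebSocket is a communication protocol that provides full-duplex channels over a single TCP connection
      - Once the connection is open, messages avoid the per-request overhead of HTTP, so latency is typically lower
      - Its API is standardized by the W3C; the protocol itself is specified by the IETF in RFC 6455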
  • 47. Apache Flink & WebSockets
      - Data sinks consume DataSets and are used to store or return them
      - Flink comes with a variety of built-in output formats that are encapsulated behind operations on the DataSet:
          - writeAsText()
          - writeAsFormattedText()
          - writeAsCsv()
          - print()
          - write()
      - We've developed a WebsocketSink that enables Flink to send its output to a given WebSocket endpoint
      - Based on the javax-websocket-client-api 1.1 spec
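The sink idea can be sketched independently of Flink: a sink receives each result record, serializes it, and hands it to a transport. The snippet below is a plain-Python illustration with the transport injected as a callable, so it runs without a network. `WebsocketSinkSketch`, `invoke`, and `send_text` are hypothetical names; this is not the authors' WebsocketSink nor Flink's sink interface.

```python
import json


class WebsocketSinkSketch:
    """Forwards each record as a JSON text message through an injected sender."""

    def __init__(self, send_text):
        # send_text is any callable that takes one string; in a real deployment
        # it would wrap an open WebSocket connection (assumption).
        self._send_text = send_text

    def invoke(self, record):
        # Serialize with sorted keys so the message layout is deterministic.
        self._send_text(json.dumps(record, sort_keys=True))


sent_messages = []  # stands in for the WebSocket endpoint
sink = WebsocketSinkSketch(sent_messages.append)
for result in [{"window_start": 0, "mean": 1.5}, {"window_start": 1000, "mean": 3.5}]:
    sink.invoke(result)
```

Injecting the sender keeps the serialization logic testable on its own; swapping in a real WebSocket client only changes the callable passed to the constructor.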
  • 52. ProteicJS: research on visualization
      - Currently researching new ways of visualizing data and ML models
  • 54. How to get it all: https://github.com/proteus-h2020/proteus-docker