SPARKTA
A real-time analytics platform based on Apache Spark
London, May 2015
FIRST SPARK PLATFORM: APR 2014
20+ INTERNATIONAL PROJECTS WITH SPARK
1. PLATFORM OVERVIEW
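[Figure: Stratio platform architecture. Data sources: CRM, ERP, call center, BI, internal and external data. Components: Stratio Ingestion, Stratio Streaming, Stratio Quantum, Stratio Deep, Stratio Crossdata and Stratio DataVis. Customer lake: HDFS, S3, Elasticsearch, MongoDB, Cassandra, Redis, Oracle, DB2 and other databases. Access through ODBC, JDBC and a REST API for BI, ad hoc and application consumers.]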
Platform components:
• Stratio Ingestion: ingests and transforms data
• Stratio Streaming: analyzes and processes real-time streaming data
• Stratio Crossdata: a unified SQL interface
• Stratio Quantum: machine learning and algorithms
• Stratio Deep: processes and combines data with Spark
• Stratio DataVis: creates and designs dashboards and reports
[Figure: the same architecture, annotated with the underlying technologies: Spark Streaming, Apache Kite, Apache Flume and MLlib. Stratio Deep combines, with Spark, data from any source.]
[Figure: the same architecture, showing data combination through time: real-time data in ephemeral tables, past data in stored tables and future data in Quantum tables. Stratio Crossdata is labeled "Consult & analyze: SQL interface".]
[Figure: the same architecture, labeled "Informational + operational, without the need to replicate data", with Stratio Crossdata ("Query and analyze: SQL interface") and operational databases (Oracle, DB2, MongoDB, Teradata and other databases) alongside the customer lake.]
2. REAL-TIME: Beyond cool dashboards
The time is NOW
We all know this story already
Social media and networking sites are part of the fabric of everyday life, changing the way the world shares and accesses information.
The overwhelming amount of information gathered, not only from messages, updates and images but also from sensor readings, GPS signals and many other sources, was the origin of a (big) technological revolution.
Remember? VOLUME, VARIETY & VELOCITY
Look at these sexy infographics!
We all love data
visualization
Insights from this vast amount of data allow us to learn from users and explore our own world.
We can follow in real-time the evolution
of a topic, an event or even an incident
just by exploring aggregated data.
Delivering real-time business on the Internet
But beyond cool visualizations, there are
some core services delivered in real-time,
using aggregated data to answer common
questions in the fastest way.
These services are the heart of the
business behind their nice logos.
Site traffic, user engagement monitoring,
service health, APIs, internal monitoring
platforms, real-time dashboards…
Aggregated data feeds directly to end
users, publishers, and advertisers, among
others.
Pushing business processes to perform faster
Digital companies, born to deliver their services in real time, have changed the expectations of many other businesses.
Real-time information makes it possible for a company to be much more agile than its competitors, improving its business responses and gaining insight into its own performance…
Listen to your data…
[Figure: data sources around the client: point-of-sale (POS) terminals, accounts, loans and credits, insurance, broker, mortgages, cards, deposits, ATMs, online gateway, application logs, social networks, transactions, geolocation and CRM.]
Whereas business intelligence is data gathered for the purpose of analyzing trends over time, operational intelligence provides a picture of what is currently happening within a process.
And we can listen to almost everything! Orders, transactions, clicks, calls, bookings, internal services...
…and start delivering real-time services
Real-time monitoring could be really nice, but your
company needs to work in the same way as digital
companies:
• Rethinking existing processes to deliver them
faster, better.
• Creating new opportunities for competitive
advantages.
2. REAL-TIME: Challenges at Stratio
Real-time fraud monitoring
[Figure: data receiver, real-time aggregation and consolidation stages, feeding dashboarding, reporting and fraud detection.]
Leveraging the power of Spark Streaming, we have developed fraud detection solutions that aggregate data in real time so that machine learning algorithms can work with it more effectively.
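The per-account, real-time aggregation described above can be sketched in plain Python, standing in for a Spark Streaming job. The class name, the 60-second window and the event fields are hypothetical; the aggregated features (count, total and max per account) are the kind of input a fraud detection model could consume, and no model or detection rule is included here.

```python
from collections import defaultdict, deque


class SlidingWindowAggregator:
    """Per-account sliding-window aggregation (hypothetical helper, not Stratio code)."""

    def __init__(self, window_seconds: int) -> None:
        self.window_seconds = window_seconds
        # account id -> deque of (timestamp, amount) pairs still inside the window
        self._events = defaultdict(deque)

    def add(self, account: str, timestamp: int, amount: float) -> dict:
        """Record one transaction and return the account's current window features."""
        window = self._events[account]
        window.append((timestamp, amount))
        # Evict events that are older than the window; assumes in-order timestamps.
        while window and window[0][0] <= timestamp - self.window_seconds:
            window.popleft()
        values = [value for _, value in window]
        return {
            "account": account,
            "count": len(values),
            "total": sum(values),
            "max": max(values),
        }


agg = SlidingWindowAggregator(window_seconds=60)
for ts, amount in [(0, 20.0), (10, 35.0), (20, 500.0), (100, 5.0)]:
    print(agg.add("acct-1", ts, amount))
```

With these inputs, the fourth event evicts the first three (they are 60 seconds old or more), so its features drop back to a count of 1.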
Extract, Transform and Aggregate
By combining Apache Flume and Spark Streaming, we have deployed complex topologies to deal with data coming from heterogeneous sources.
The full solution allows us to transform and aggregate data on the fly (data cleaning, normalization and enrichment).
[Figure: real-time aggregation feeding dashboarding and reporting.]
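The cleaning, normalization and enrichment steps above can be sketched as a per-event function. This is plain Python, not a Flume interceptor or Spark Streaming code; the field names (`user`, `amount`, `country`, `ts`) and the country-to-region lookup table are hypothetical. In a streaming job, a function like this would typically be applied to every record before aggregation, with `None` meaning "drop this event".

```python
from datetime import datetime, timezone

# Hypothetical enrichment table: country code -> region.
COUNTRY_REGIONS = {"GB": "Europe", "ES": "Europe", "US": "North America"}


def transform(raw: dict):
    """Clean, normalize and enrich one raw event; return None to drop it."""
    # Cleaning: discard events without mandatory fields or with an unparsable amount.
    if not raw.get("user") or raw.get("ts") is None:
        return None
    try:
        amount = float(raw.get("amount"))
    except (TypeError, ValueError):
        return None
    country = str(raw.get("country", "")).strip().upper()
    return {
        # Normalization: consistent casing, numeric types and UTC ISO timestamps.
        "user": str(raw["user"]).strip().lower(),
        "amount": round(amount, 2),
        "country": country,
        "timestamp": datetime.fromtimestamp(int(raw["ts"]), tz=timezone.utc).isoformat(),
        # Enrichment: a field derived from a lookup table.
        "region": COUNTRY_REGIONS.get(country, "Unknown"),
    }


print(transform({"user": " Alice ", "amount": "12.5", "country": "gb", "ts": 86400}))
```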
Custom data sources and storage
Each project requires specific inputs and data storage, dealing with different kinds of events: from clickstream activity to bank transactions...
[Figure: data stream, custom logs, transform and loading stages.]
Towards a generic real-time aggregation platform
At Stratio, we have implemented several real-time analytics projects based on Apache Spark, Kafka, Flume, Cassandra and MongoDB.
These technologies were always a perfect fit, but soon we found ourselves
writing the same pieces of integration code over and over again.
This is how SPARKTA was born.
3. ELSEWHERE
#1 RainBird from Twitter
Some folks from Twitter shared their thoughts about their real-time needs at Strata (2011).
They worked on a “generic” platform in order to
deal with pre-calculated data from a huge number
of events.
It allows them to deal with:
• Data Structures
• Hierarchical Aggregation
• Temporal Aggregation
• Multiple Formulas
CURRENT STATE: Still not open source
http://goo.gl/ykvQa
#2 Countandra
Countandra is a hierarchical distributed counting engine that exploits the excellent write and read performance of Cassandra.
It supports:
• Geographically distributed counting.
• An easy HTTP-based interface to insert counts.
• Hierarchical counting, such as com.mywebsite.music.
• Retrieval of counts, sums and squares in near real time.
• Simple HTTP queries that return the desired output in JSON format.
• Queries sliced by period (such as lasthour, lastyear and so on) for minutely, hourly, daily and monthly values.
https://github.com/milindparikh/Countandra
CURRENT STATE: Rather deprecated
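Hierarchical counting with counts, sums and squares can be sketched in memory as below. This is not Countandra's Cassandra-based implementation; the class and method names are hypothetical. Each value is added to its key and to every ancestor prefix, and keeping count, sum and sum of squares per node is enough to derive the mean and variance later.

```python
from collections import defaultdict


class HierarchicalCounter:
    """In-memory sketch of hierarchical counting with count, sum and sum of squares."""

    def __init__(self) -> None:
        # node key -> [count, sum, sum of squares]
        self._stats = defaultdict(lambda: [0, 0.0, 0.0])

    def add(self, key: str, value: float = 1.0) -> None:
        """Record a value at `key` and at every ancestor of `key`."""
        parts = key.split(".")
        # "com.mywebsite.music" updates "com", "com.mywebsite" and "com.mywebsite.music".
        for depth in range(1, len(parts) + 1):
            node = self._stats[".".join(parts[:depth])]
            node[0] += 1
            node[1] += value
            node[2] += value * value

    def summary(self, key: str) -> dict:
        """Derive mean and population variance from the three running totals."""
        count, total, squares = self._stats.get(key, (0, 0.0, 0.0))
        if count == 0:
            return {"count": 0, "sum": 0.0, "mean": 0.0, "variance": 0.0}
        mean = total / count
        # E[x^2] - E[x]^2, clamped at zero to absorb floating-point noise.
        variance = max(squares / count - mean * mean, 0.0)
        return {"count": count, "sum": total, "mean": mean, "variance": variance}


counter = HierarchicalCounter()
counter.add("com.mywebsite.music", 2.0)
counter.add("com.mywebsite.music", 4.0)
counter.add("com.mywebsite.video", 6.0)
print(counter.summary("com.mywebsite"))
```

The three totals are plain additive counters, which is what makes them cheap to maintain in a distributed store; the trade-off is that the E[x²] − E[x]² formula can lose precision when values are large relative to their spread.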
#3 ThunderRain from Intel
ThunderRain is a Real-Time Analytical Processing (RTAP) example using Spark and Shark, which is best characterized by the following four salient properties:
• Data continuously streamed in and processed in near real time
• Real-time data queried and presented in an online fashion
• Real-time and historical data combined and mined interactively
• Predominantly RAM-based processing
https://github.com/thunderain-project/thunderain
CURRENT STATE: Rather deprecated
#4 TSAR from Twitter
TSAR (the TimeSeries AggregatoR) is a flexible, reusable, end-to-end service architecture on top of Summingbird.
Twitter needs a truly robust real-time aggregation service, given its scale and evolving needs.
They realized that many time-series applications call for essentially the same architecture, with only slight variations in the data model.
https://blog.twitter.com/2014/tsar-a-timeseries-aggregator
CURRENT STATE: Still not open source
Towards a generic real-time aggregation platform
Some initiatives have tried to solve this problem, but until now most of them were complex or obsolete, while others were not open source.
For this reason, Stratio created SPARKTA: an open source and full-featured
platform for real-time analytics, based on Apache Spark.
This is why SPARKTA was conceived
4. THIS IS SPARKTA
Distributed, high-volume & pluggable analytics framework
Our goals:
Since Aryabhatta invented zero, mathematicians such as John von Neumann have been in pursuit of efficient counting, and architects have constantly built systems that compute counts faster. In this age of social media, where hundreds of thousands of events take place every second, we designed an aggregation engine to deliver real-time services.
• Pure Spark!
• No coding needed: only declarative aggregation workflows
• Data continuously streamed in and processed in near real time
• Ready to use out of the box
• Plug & play: flexible workflows (inputs, outputs, parsers, etc.)
• High performance
• Scalable and fault tolerant
Sparkta: A first look
[Figure: Sparkta overview showing the driver-supervisor, aggregation policies, aggregation workflows, query services and others.]
The aggregation policy definition is sent to the engine.
The driver-supervisor allows multiple applications to be defined, each bound to its own context and executing an aggregation workflow.
Sparkta: Deploy any number of real-time aggregation policies
Through the driver-supervisor, you can start several workflows at any time, and also stop or monitor them.
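The start, stop and monitor lifecycle can be illustrated with a small sketch, assuming each workflow is just a Python function called in a loop on its own thread. In Sparkta the workflows are Spark jobs bound to their own contexts; the `Supervisor` class, its methods and the `step`/`interval` parameters are hypothetical and only model the lifecycle, not Sparkta's driver.

```python
import threading


class Supervisor:
    """Starts, stops and monitors named workflows, each on its own thread (sketch)."""

    def __init__(self) -> None:
        # workflow name -> (thread, stop event, stats dict)
        self._workflows = {}

    def start(self, name, step, interval=0.01):
        if name in self._workflows and self._workflows[name][0].is_alive():
            raise ValueError(f"workflow {name!r} is already running")
        stop = threading.Event()
        stats = {"batches": 0}

        def run():
            # Stand-in for a micro-batch loop: call `step` until asked to stop.
            while not stop.is_set():
                step()
                stats["batches"] += 1
                stop.wait(interval)

        thread = threading.Thread(target=run, name=name, daemon=True)
        self._workflows[name] = (thread, stop, stats)
        thread.start()

    def stop(self, name):
        thread, stop, _ = self._workflows[name]
        stop.set()
        thread.join()

    def status(self, name):
        thread, _, stats = self._workflows[name]
        return {"running": thread.is_alive(), "batches": stats["batches"]}


supervisor = Supervisor()
supervisor.start("tweets", step=lambda: None)
print(supervisor.status("tweets"))
supervisor.stop("tweets")
print(supervisor.status("tweets"))
```

Using an `Event` both as the stop flag and as the sleep between batches lets `stop()` interrupt a waiting workflow immediately instead of after a full interval.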
Sparkta: Key Technologies
[Figure: inputs (RabbitMQ, ZeroMQ, Twitter, Flume, Kafka, ...), processing (with Apache Kite SDK) and outputs (...).]
Sparkta: Define your real-time needs
AGGREGATION POLICY
Remember: no need to code anything.
Define your workflow in a JSON document, including:
• INPUT: Where is the data coming from?
• OUTPUT(s): Where should aggregated data be stored?
• DIMENSION(s): Which fields do you need for your real-time analytics?
• ROLLUP(s): How do you want to aggregate the dimensions?
• TRANSFORMATION(s): Which functions should be applied before aggregation?
• SAVE RAW DATA: Do you want to save raw events?
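To make the idea of a declarative policy concrete, here is a sketch of such a JSON document and a loader that checks it. The key names (`input`, `outputs`, `dimensions`, `rollups`, `transformations`, `saveRawData`) mirror the list above but are assumptions for illustration, not Sparkta's actual policy schema; the project repository is the reference for the real format.

```python
import json

# Hypothetical policy: key names are illustrative, not Sparkta's real schema.
POLICY_JSON = """
{
  "name": "strata-tweets",
  "input": {"type": "twitter"},
  "outputs": [{"type": "elasticsearch"}],
  "dimensions": [
    {"field": "hashtag", "type": "default"},
    {"field": "timestamp", "type": "datetime", "granularity": "minute"}
  ],
  "rollups": [{"dimensions": ["hashtag", "timestamp"], "operators": ["count"]}],
  "transformations": [{"type": "lowercase", "field": "hashtag"}],
  "saveRawData": false
}
"""

REQUIRED_KEYS = ("name", "input", "outputs", "dimensions", "rollups")


def load_policy(text: str) -> dict:
    """Parse a policy document and run basic consistency checks."""
    policy = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in policy]
    if missing:
        raise ValueError(f"policy is missing required keys: {missing}")
    declared = {dimension["field"] for dimension in policy["dimensions"]}
    for rollup in policy["rollups"]:
        # A rollup may only aggregate over dimensions declared in the policy.
        unknown = set(rollup["dimensions"]) - declared
        if unknown:
            raise ValueError(f"rollup uses undeclared dimensions: {sorted(unknown)}")
    # Optional sections get explicit defaults.
    policy.setdefault("transformations", [])
    policy.setdefault("saveRawData", False)
    return policy


policy = load_policy(POLICY_JSON)
print(policy["name"], [rollup["operators"] for rollup in policy["rollups"]])
```

Validating the document up front means a typo in a dimension name fails when the policy is submitted rather than when the streaming job is already running.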
Sparkta: Key Technologies
ROLLUPS
• Pass-through
• Time-based
  • Per second, per minute, hourly, daily, monthly, yearly...
• Hierarchical
• GeoRange: areas of different sizes (rectangles)
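Time-based and GeoRange rollups both boil down to mapping a raw value onto a bucket key. The sketch below assumes UTC `datetime` timestamps and rectangular grid cells measured in degrees; the granularity names and function names are hypothetical, not Sparkta's dimension types.

```python
import math
from datetime import datetime, timezone

# Truncation rules for the time-based rollup granularities (names are hypothetical).
GRANULARITIES = {
    "second": lambda t: t.replace(microsecond=0),
    "minute": lambda t: t.replace(second=0, microsecond=0),
    "hour": lambda t: t.replace(minute=0, second=0, microsecond=0),
    "day": lambda t: t.replace(hour=0, minute=0, second=0, microsecond=0),
    "month": lambda t: t.replace(day=1, hour=0, minute=0, second=0, microsecond=0),
    "year": lambda t: t.replace(month=1, day=1, hour=0, minute=0, second=0, microsecond=0),
}


def time_bucket(ts: datetime, granularity: str) -> datetime:
    """Truncate a timestamp to the start of its bucket."""
    return GRANULARITIES[granularity](ts)


def geo_bucket(lat: float, lon: float, cell_degrees: float) -> tuple:
    """Snap a point to the south-west corner of a rectangular grid cell."""
    return (
        math.floor(lat / cell_degrees) * cell_degrees,
        math.floor(lon / cell_degrees) * cell_degrees,
    )


event_time = datetime(2015, 5, 6, 14, 37, 12, tzinfo=timezone.utc)
print(time_bucket(event_time, "hour"))  # 2015-05-06 14:00:00+00:00
# The same point bucketed into areas of two different sizes.
print([geo_bucket(51.5074, -0.1278, size) for size in (0.5, 1.0)])
```

Degree-based cells are simple but not equal-area: a one-degree cell covers less ground near the poles than near the equator.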
OPERATORS
• Max, min, count, sum
• Average, median
• Stdev, variance, count distinct
• Last value
• Full-text search
KiteSDK
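Most of the operators listed above map directly onto Python built-ins or the standard `statistics` module, which makes a batch sketch short. The operator names and the `aggregate` helper are hypothetical, and full-text search is left out. This version recomputes each bucket from its full list of values; a streaming engine would instead keep incremental state per bucket, which is easy for count, sum, min and max but needs more state for median and count distinct.

```python
import statistics

# Operator name -> function reducing the values that fell into one rollup bucket.
# Names are hypothetical; "full-text search" is not covered by this sketch.
OPERATORS = {
    "max": max,
    "min": min,
    "count": len,
    "sum": sum,
    "avg": statistics.mean,
    "median": statistics.median,
    "stdev": statistics.stdev,        # sample standard deviation, needs >= 2 values
    "variance": statistics.variance,  # sample variance, needs >= 2 values
    "count_distinct": lambda values: len(set(values)),
    "last_value": lambda values: values[-1],
}


def aggregate(events, dimension, value_field, operators):
    """Group events by one dimension and apply the requested operators per group."""
    buckets = {}
    for event in events:
        buckets.setdefault(event[dimension], []).append(event[value_field])
    return {
        key: {name: OPERATORS[name](values) for name in operators}
        for key, values in buckets.items()
    }


payments = [
    {"city": "London", "amount": 10},
    {"city": "London", "amount": 30},
    {"city": "Madrid", "amount": 5},
    {"city": "London", "amount": 20},
]
print(aggregate(payments, "city", "amount", ["count", "sum", "avg", "last_value"]))
```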
Sparkta SDK
[Figure: SDK extension points: input, output(s), dimension(s), operators and transformation(s).]
Sparkta has been conceived as an SDK.
You can extend several points of the platform to fulfill your needs, such as adding new inputs, outputs, operators and dimension types.
You can also add new functions to Apache Kite to extend the data cleaning, enrichment and normalization capabilities.
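The extension-point idea can be illustrated with a small plugin registry. Sparkta's real SDK is written in Scala and its interfaces are not reproduced here; the `Operator` base class, the `register` decorator and the example `range` operator are all hypothetical, showing only how a new operator could be plugged in and then referenced by name.

```python
from abc import ABC, abstractmethod

# Registry of operator plugins, keyed by the name a policy would refer to.
OPERATOR_REGISTRY = {}


def register(cls):
    """Class decorator that makes an operator available by name."""
    OPERATOR_REGISTRY[cls.name] = cls
    return cls


class Operator(ABC):
    """Hypothetical extension point: an incremental aggregation operator."""

    name = ""

    @abstractmethod
    def zero(self):
        """State of an empty bucket."""

    @abstractmethod
    def update(self, state, value):
        """Fold one value into the state and return the new state."""

    def result(self, state):
        """Turn the final state into the reported value (identity by default)."""
        return state


@register
class Range(Operator):
    """Example custom operator: difference between the largest and smallest value."""

    name = "range"

    def zero(self):
        return None

    def update(self, state, value):
        if state is None:
            return (value, value)
        low, high = state
        return (min(low, value), max(high, value))

    def result(self, state):
        return 0 if state is None else state[1] - state[0]


def run_operator(name, values):
    """Look up a registered operator and fold the values through it."""
    operator = OPERATOR_REGISTRY[name]()
    state = operator.zero()
    for value in values:
        state = operator.update(state, value)
    return operator.result(state)


print(run_operator("range", [3, 9, 1, 4]))  # 8
```

The zero/update/result split keeps per-bucket state small between batches; a distributed version would also need a way to merge partial states computed on different partitions, which this sketch omits.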
5. NEXT STEPS
Source: mydisguises.com
Next steps in our roadmap (I)
Sparkta is a work in progress, so we still have some nice features to
develop…
QUERY
SERVICES
ALARMS
Creating a REST services layer to query the aggregated data allows us to isolate the final consumer from the specific data storage.
Features:
- Time ranges
- Aggregation on time ranges
- Best rollup selection
For example, I want to know whether we have earned over $3000 in London in the last hour...
Remember operational intelligence!
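Time-range queries with best rollup selection, and the "$3000 in London in the last hour" alarm, can be sketched under the assumption that the same sums are stored at both minute and hour granularity. A plain dict stands in for the real storage and REST layer; timestamps are epoch seconds and every name is hypothetical. The query walks the range and uses an aligned hourly bucket whenever one fits entirely inside it, falling back to minute buckets at the edges.

```python
MINUTE, HOUR = 60, 3600


def ingest(store, city, timestamp, amount):
    """Add an amount to both the minute and the hour rollups (epoch seconds)."""
    for granularity in (MINUTE, HOUR):
        key = (granularity, city, timestamp - timestamp % granularity)
        store[key] = store.get(key, 0.0) + amount


def sum_range(store, city, start, end):
    """Sum amounts in [start, end), preferring hourly buckets where they fit.

    start and end are assumed to be whole minutes (multiples of 60).
    """
    total, t = 0.0, start
    while t < end:
        if t % HOUR == 0 and t + HOUR <= end:
            # Best rollup: one aligned hourly bucket instead of 60 minute buckets.
            total += store.get((HOUR, city, t), 0.0)
            t += HOUR
        else:
            total += store.get((MINUTE, city, t), 0.0)
            t += MINUTE
    return total


def earned_over(store, city, now, threshold):
    """Alarm condition: did `city` earn more than `threshold` in the last hour?"""
    return sum_range(store, city, now - HOUR, now) > threshold


store = {}
ingest(store, "London", 7260, 1500.0)
ingest(store, "London", 7380, 2000.0)
ingest(store, "Madrid", 7300, 9999.0)
print(earned_over(store, "London", now=7440, threshold=3000.0))  # True
```

Because the last hour rarely aligns with an hourly bucket, a sliding "last hour" query mostly reads minute buckets; the coarser rollup pays off for longer or calendar-aligned ranges.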
Next steps in our roadmap (II)
WEB
APPLICATION
DEPLOYING &
MONITORING
How about a nice web interface to create and manage policies?
Forget the JSON file and use your mouse to define the workflow :)
We have been working with Spark Job Server and YARN, but it would be nice to support Mesos, for example.
Hey, did you miss something? Do you have a great idea?
Let us know!
MORE AWESOMENESS
6. OPEN SOURCE & COMMUNITY
OPEN TO YOUR IDEAS
www.stratio.com
@StratioBD
https://github.com/stratio/sparkta
SPARKTA is fully open source, released under the Apache 2 License.
We are open to contributors & ideas
7. DEMO TIME
Do you want to try SPARKTA?
Use a full-featured sandbox to start trying SPARKTA. Just open a shell and type:

    vagrant init "stratio/sparkta"
    vagrant up
Do you want to try SPARKTA?
Getting some real-time stats from #StrataHadoop
Our real-time policy defines some rollups to find chatty users, hot hashtags and heatmaps from StrataConf tweets.
We are using the standard Twitter input from Spark Streaming, the Elasticsearch output and Kibana to display the results.
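The three demo rollups (chatty users, hot hashtags and a heatmap) can be sketched over a small in-memory batch of tweets. This plain-Python version does not use Sparkta, the Twitter input or Elasticsearch; the tweet fields (`user`, `text`, `lat`, `lon`) and the one-degree heatmap cells are assumptions for illustration, and heatmap keys are integer cell indices rather than coordinates.

```python
import math
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")


def tweet_stats(tweets, top=3, cell_degrees=1.0):
    """Chatty users, hot hashtags and a coarse geo heatmap for a batch of tweets."""
    users = Counter(tweet["user"] for tweet in tweets)
    hashtags = Counter(
        tag.lower() for tweet in tweets for tag in HASHTAG.findall(tweet["text"])
    )
    # Heatmap keys are (row, column) cell indices on a degree grid.
    heatmap = Counter(
        (math.floor(tweet["lat"] / cell_degrees), math.floor(tweet["lon"] / cell_degrees))
        for tweet in tweets
        if tweet.get("lat") is not None and tweet.get("lon") is not None
    )
    return {
        "chatty_users": users.most_common(top),
        "hot_hashtags": hashtags.most_common(top),
        "heatmap": dict(heatmap),
    }


tweets = [
    {"user": "ana", "text": "Loving #StrataHadoop and #Spark", "lat": 51.50, "lon": -0.12},
    {"user": "ana", "text": "#spark streaming demo", "lat": 51.51, "lon": -0.10},
    {"user": "bob", "text": "Hello #StrataHadoop"},
    {"user": "ana", "text": "#StrataHadoop again"},
]
print(tweet_stats(tweets, top=2))
```

Hashtags are lower-cased before counting so that #Spark and #spark land in the same bucket.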
BIG DATA
CHILD'S PLAY

Operationalizing Big DataOperationalizing Big Data
Operationalizing Big DataStratio
 
Artificial Intelligence on Data Centric Platform
Artificial Intelligence on Data Centric PlatformArtificial Intelligence on Data Centric Platform
Artificial Intelligence on Data Centric PlatformStratio
 
Introduction to Artificial Neural Networks
Introduction to Artificial Neural NetworksIntroduction to Artificial Neural Networks
Introduction to Artificial Neural NetworksStratio
 
“A Distributed Operational and Informational Technological Stack”
“A Distributed Operational and Informational Technological Stack” “A Distributed Operational and Informational Technological Stack”
“A Distributed Operational and Informational Technological Stack” Stratio
 
Meetup: Cómo monitorizar y optimizar procesos de Spark usando la Spark Web - ...
Meetup: Cómo monitorizar y optimizar procesos de Spark usando la Spark Web - ...Meetup: Cómo monitorizar y optimizar procesos de Spark usando la Spark Web - ...
Meetup: Cómo monitorizar y optimizar procesos de Spark usando la Spark Web - ...Stratio
 
Lunch&Learn: Combinación de modelos
Lunch&Learn: Combinación de modelosLunch&Learn: Combinación de modelos
Lunch&Learn: Combinación de modelosStratio
 
Meetup: Spark + Kerberos
Meetup: Spark + KerberosMeetup: Spark + Kerberos
Meetup: Spark + KerberosStratio
 
Distributed Logistic Model Trees
Distributed Logistic Model TreesDistributed Logistic Model Trees
Distributed Logistic Model TreesStratio
 
Stratio's Cassandra Lucene index: Geospatial use cases - Big Data Spain 2016
Stratio's Cassandra Lucene index: Geospatial use cases - Big Data Spain 2016Stratio's Cassandra Lucene index: Geospatial use cases - Big Data Spain 2016
Stratio's Cassandra Lucene index: Geospatial use cases - Big Data Spain 2016Stratio
 
Introduction to Asynchronous scala
Introduction to Asynchronous scalaIntroduction to Asynchronous scala
Introduction to Asynchronous scalaStratio
 
Functional programming in scala
Functional programming in scalaFunctional programming in scala
Functional programming in scalaStratio
 
Spark Streaming @ Berlin Apache Spark Meetup, March 2015
Spark Streaming @ Berlin Apache Spark Meetup, March 2015Spark Streaming @ Berlin Apache Spark Meetup, March 2015
Spark Streaming @ Berlin Apache Spark Meetup, March 2015Stratio
 

Mais de Stratio (20)

Mesos Meetup - Building an enterprise-ready analytics and operational ecosyst...
Mesos Meetup - Building an enterprise-ready analytics and operational ecosyst...Mesos Meetup - Building an enterprise-ready analytics and operational ecosyst...
Mesos Meetup - Building an enterprise-ready analytics and operational ecosyst...
 
Can an intelligent system exist without awareness? BDS18
Can an intelligent system exist without awareness? BDS18Can an intelligent system exist without awareness? BDS18
Can an intelligent system exist without awareness? BDS18
 
Kafka and KSQL - Apache Kafka Meetup
Kafka and KSQL - Apache Kafka MeetupKafka and KSQL - Apache Kafka Meetup
Kafka and KSQL - Apache Kafka Meetup
 
Wild Data - The Data Science Meetup
Wild Data - The Data Science MeetupWild Data - The Data Science Meetup
Wild Data - The Data Science Meetup
 
Using Kafka on Event-driven Microservices Architectures - Apache Kafka Meetup
Using Kafka on Event-driven Microservices Architectures - Apache Kafka MeetupUsing Kafka on Event-driven Microservices Architectures - Apache Kafka Meetup
Using Kafka on Event-driven Microservices Architectures - Apache Kafka Meetup
 
Ensemble methods in Machine Learning
Ensemble methods in Machine Learning Ensemble methods in Machine Learning
Ensemble methods in Machine Learning
 
Stratio Sparta 2.0
Stratio Sparta 2.0Stratio Sparta 2.0
Stratio Sparta 2.0
 
Big Data Security: Facing the challenge
Big Data Security: Facing the challengeBig Data Security: Facing the challenge
Big Data Security: Facing the challenge
 
Operationalizing Big Data
Operationalizing Big DataOperationalizing Big Data
Operationalizing Big Data
 
Artificial Intelligence on Data Centric Platform
Artificial Intelligence on Data Centric PlatformArtificial Intelligence on Data Centric Platform
Artificial Intelligence on Data Centric Platform
 
Introduction to Artificial Neural Networks
Introduction to Artificial Neural NetworksIntroduction to Artificial Neural Networks
Introduction to Artificial Neural Networks
 
“A Distributed Operational and Informational Technological Stack”
“A Distributed Operational and Informational Technological Stack” “A Distributed Operational and Informational Technological Stack”
“A Distributed Operational and Informational Technological Stack”
 
Meetup: Cómo monitorizar y optimizar procesos de Spark usando la Spark Web - ...
Meetup: Cómo monitorizar y optimizar procesos de Spark usando la Spark Web - ...Meetup: Cómo monitorizar y optimizar procesos de Spark usando la Spark Web - ...
Meetup: Cómo monitorizar y optimizar procesos de Spark usando la Spark Web - ...
 
Lunch&Learn: Combinación de modelos
Lunch&Learn: Combinación de modelosLunch&Learn: Combinación de modelos
Lunch&Learn: Combinación de modelos
 
Meetup: Spark + Kerberos
Meetup: Spark + KerberosMeetup: Spark + Kerberos
Meetup: Spark + Kerberos
 
Distributed Logistic Model Trees
Distributed Logistic Model TreesDistributed Logistic Model Trees
Distributed Logistic Model Trees
 
Stratio's Cassandra Lucene index: Geospatial use cases - Big Data Spain 2016
Stratio's Cassandra Lucene index: Geospatial use cases - Big Data Spain 2016Stratio's Cassandra Lucene index: Geospatial use cases - Big Data Spain 2016
Stratio's Cassandra Lucene index: Geospatial use cases - Big Data Spain 2016
 
Introduction to Asynchronous scala
Introduction to Asynchronous scalaIntroduction to Asynchronous scala
Introduction to Asynchronous scala
 
Functional programming in scala
Functional programming in scalaFunctional programming in scala
Functional programming in scala
 
Spark Streaming @ Berlin Apache Spark Meetup, March 2015
Spark Streaming @ Berlin Apache Spark Meetup, March 2015Spark Streaming @ Berlin Apache Spark Meetup, March 2015
Spark Streaming @ Berlin Apache Spark Meetup, March 2015
 

Último

How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdfChristopherTHyatt
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 

Último (20)

How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 

[Strata] Sparkta

  • 1. SPARKTA: A real-time analytics platform based on Apache Spark. London, May 2015
  • 2. FIRST SPARK PLATFORM: APR 2014. 20+ INTERNATIONAL PROJECTS WITH SPARK
  • 4. STRATIO INGESTION Customer lake STRATIO STREAMING STRATIO QUANTUM STRATIO DEEP STRATIO CROSSDATA ODBC JDBC REST API CRM ERP Call Center BI Internal Data External Data BI AD HOC APP HDFS S3 Elasticsearch MongoDB Cassandra Redis Oracle, DB2 Other Databases STRATIO DATAVIS
  • 5. STRATIO INGESTION Customer lake STRATIO STREAMING STRATIO QUANTUM STRATIO DEEP STRATIO CROSSDATA ODBC JDBC REST API CRM ERP Call Center BI Internal Data External Data BI AD HOC APP Ingests, transforms Analyzes and processes real-time streaming A unified SQL interface Machine Learning and algorithms Processes & combines with Spark STRATIO DATAVIS Creates and designs dashboards and reports HDFS S3 Elasticsearch MongoDB Cassandra Redis Oracle, DB2 Other Databases
  • 6. STRATIO INGESTION Ingests, transforms STRATIO STREAMING STRATIO QUANTUM STRATIO CROSSDATA Analyzes & processes A unified SQL interface Machine Learning and algorithms ODBC JDBC REST API Streaming Apache Kite Apache Flume CRM ERP Call Center BI MLlib Internal Data External Data BI AD HOC APP Combines with Spark data from any source Customer lake STRATIO DEEP Processes & combines with Spark HDFS S3 Elasticsearch MongoDB Cassandra Redis Oracle, DB2 Other Databases STRATIO DATAVIS Creates and designs dashboards and reports
  • 7. STRATIO INGESTION HDFS S3 Elasticsearch MongoDB Cassandra Redis Oracle, DB2 Other Databases Ingests, transforms STRATIO STREAMING STRATIO QUANTUM STRATIO CROSSDATA Analyzes & processes Consult & analyze. SQL interface Machine Learning & algorithms ODBC JDBC REST API Streaming Apache Kite Apache Flume CRM ERP Call Center BI MLlib Internal Data External Data BI AD HOC APP Data combination through time Customer lake STRATIO DEEP Processes & combines with Spark. Real-time: Ephemeral tables. Past: Stored tables. Future: Quantum tables. STRATIO DATAVIS Creates and designs dashboards and reports
  • 8. STRATIO DATAVIS STRATIO INGESTION Ingests, transforms STRATIO STREAMING STRATIO QUANTUM STRATIO CROSSDATA Analyzes & processes Consult & analyze. SQL interface Machine Learning & algorithms ODBC JDBC REST API Streaming Apache Kite Apache Flume CRM ERP Call Center BI MLlib Internal Data External Data Creates and designs dashboards and reports Customer lake STRATIO DEEP Processes & combines with Spark HDFS S3 Elasticsearch MongoDB Cassandra Redis Oracle, DB2 Other Databases INFORMATIONAL + OPERATIONAL WITHOUT THE NEED TO REPLICATE DATA. OPERATIONAL: Oracle, DB2, Other Databases, MongoDB, Teradata
  • 10. The time is NOW. We all know this story already. Social media and networking sites are part of the fabric of everyday life, changing the way the world shares and accesses information. The overwhelming amount of information gathered not only from messages, updates and images but also from sensor readings, GPS signals and many other sources was the origin of a (big) technological revolution. Remember? VOLUME, VARIETY & VELOCITY. CONFERENCE 10
  • 11. Look at these sexy infographics! We all love data visualization. Insights from this vast amount of data allow us to learn from users and explore our own world. We can follow the evolution of a topic, an event or even an incident in real time just by exploring aggregated data.
  • 12. Delivering real-time business on the Internet. Beyond cool visualizations, there are core services delivered in real time that use aggregated data to answer common questions as fast as possible. These services are the heart of the business behind the nice logos: site traffic, user engagement monitoring, service health, APIs, internal monitoring platforms, real-time dashboards… Aggregated data feeds directly to end users, publishers and advertisers, among others.
  • 13. Pushing business processes to perform faster. Digital companies, born to deliver their services in real time, have changed the expectations of many other businesses. Real-time information lets a company be much more agile than its competitors, improving its business responses and giving it insight into its own performance…
  • 14. Listen to your data… CLIENT: POS terminals, Accounts, Loans and credits, Insurance, Broker, Mortgages, Cards, Deposits, ATM, Online gateway, Application logs, Social networks, Transactions, Geolocation, CRM. Whereas business intelligence is data gathered to analyze trends over time, operational intelligence provides a picture of what is currently happening within a process. And we can listen to almost everything: orders, transactions, clicks, calls, bookings, internal services...
  • 15. …and start delivering real-time services. Real-time monitoring is valuable, but your company also needs to work the way digital companies do: • Rethinking existing processes to deliver them faster and better. • Creating new opportunities for competitive advantage.
  • 17. Real-time fraud monitoring. DATA RECEIVER, REAL-TIME AGGREGATION, CONSOLIDATION, Dashboarding, Reporting, FRAUD DETECTION. Leveraging the power of Spark Streaming, we have developed fraud detection solutions that aggregate data in real time so that machine learning algorithms can work with it more effectively.
  • 18. Extract, Transform and Aggregate. By combining Apache Flume and Spark Streaming, we have deployed complex topologies that handle data coming from heterogeneous sources. The full solution allows us to transform and aggregate data on the fly (data cleaning, normalization and enrichment). REAL-TIME AGGREGATION, Dashboarding, Reporting.
  • 19. Custom data sources and storage. Each project requires specific inputs and data stores and deals with different kinds of events, from clickstream activity to bank transactions... DATA STREAM, LOADING, TRANSFORM, CUSTOM LOGS.
  • 20. Towards a generic real-time aggregation platform. At Stratio, we have implemented several real-time analytics projects based on Apache Spark, Kafka, Flume, Cassandra and MongoDB. These technologies were always a perfect fit, but we soon found ourselves writing the same pieces of integration code over and over again. This is how SPARKTA was born.
  • 22. #1 RainBird from Twitter. Engineers from Twitter described their real-time needs at Strata (2011). They worked on a "generic" platform to handle pre-calculated data from a huge number of events. It allows them to deal with: • Data structures • Hierarchical aggregation • Temporal aggregation • Multiple formulas. CURRENT STATE: Still not open source. http://goo.gl/ykvQa
  • 23. #2 Countandra. Countandra is a hierarchical distributed counting engine that exploits the excellent write and read performance of Cassandra. It supports: • Geographically distributed counting. • An easy HTTP-based interface to insert counts. • Hierarchical counting, such as com.mywebsite.music. • Retrieving counts, sums and squares in near real-time. • Simple HTTP queries that return the desired output in JSON format. • Queries sliced by period, such as lasthour, lastyear and so on, for minutely, hourly, daily and monthly values. https://github.com/milindparikh/Countandra CURRENT STATE: Rather deprecated
  • 24. #3 ThunderRain from Intel. ThunderRain is a Real-Time Analytical Processing (RTAP) example using Spark and Shark, best characterized by the following four salient properties: • Data continuously streamed in & processed in near real-time • Real-time data queried and presented in an online fashion • Real-time and historical data combined and mined interactively • Predominantly RAM-based processing. https://github.com/thunderain-project/thunderain CURRENT STATE: Rather deprecated
  • 25. #4 TSAR from Twitter. TSAR (the TimeSeries AggregatoR) is a flexible, reusable, end-to-end service architecture on top of Summingbird. Twitter needs a truly robust real-time aggregation service to keep up with its scaling and evolving needs. They realized that many time-series applications call for essentially the same architecture, with only slight variations in the data model. https://blog.twitter.com/2014/tsar-a-timeseries-aggregator CURRENT STATE: Still not open source
  • 26. Towards a generic real-time aggregation platform. Some initiatives have tried to solve this problem, but so far most of them are either complex or obsolete, and others are not open source. For this reason, Stratio created SPARKTA: an open source, full-featured platform for real-time analytics based on Apache Spark. This is why SPARKTA was conceived.
  • 28. Distributed, high-volume & pluggable analytics framework. Our goals: Since Aryabhatta invented zero, mathematicians such as John von Neumann have pursued efficient counting, and architects have constantly built systems that compute counts more quickly. In this age of social media, where hundreds of thousands of events take place every second, we designed an aggregation engine to deliver real-time services. • Pure Spark! • No coding needed, only declarative aggregation workflows • Data continuously streamed in & processed in near real-time • Ready to use out of the box • Plug & play: flexible workflows (inputs, outputs, parsers, etc.) • High performance • Scalable and fault tolerant
  • 29. Sparkta: A first look. DRIVER - SUPERVISOR, AGGREGATION POLICY, QUERY SERVICES. The aggregation policy definition is sent to the engine. The supervisor allows multiple applications to be defined, each bound to a context that executes its aggregation workflow. OTHERS, AGGREGATION WORKFLOW.
  • 30. Sparkta: Deploy any number of real-time aggregation policies. DRIVER - SUPERVISOR. You can start several workflows at any time, and also stop or monitor them.
  • 31. Sparkta: Key Technologies. + Apache Kite SDK. INPUTS: RabbitMQ, ZeroMQ, Twitter, Flume, Kafka .... PROCESSING. OUTPUTS .. ..
  • 32. Sparkta: Define your real-time needs. AGGREGATION POLICY. Remember: there is no need to code anything. Define your workflow in a JSON document, including: INPUT: Where is the data coming from? OUTPUT(s): Where should the aggregated data be stored? DIMENSION(s): Which fields do you need for your real-time analysis? ROLLUP(s): How do you want to aggregate the dimensions? TRANSFORMATION(s): Which functions should be applied before aggregation? SAVE RAW DATA: Do you want to save the raw events?
  • 33. Sparkta: Key Technologies. ROLLUPS • Pass-through • Time-based • Secondly, minutely, hourly, daily, monthly, yearly... • Hierarchical • GeoRange: areas of different sizes (rectangles). OPERATORS • Max, min, count, sum • Average, median • Stdev, variance, count distinct • Last value • Full-text search. KiteSDK
  • 34. Sparkta SDK. INPUT, OUTPUT(s), DIMENSION(s), OPERATORS, TRANSFORMATION(s). Sparkta has been conceived as an SDK. You can extend several points of the platform to fulfill your needs, such as adding new inputs, outputs, operators and dimension types. You can also add new functions to Apache Kite to extend its data cleaning, enrichment and normalization capabilities.
  • 36. Next steps in our roadmap (I). Sparkta is a work in progress, so we still have some nice features to develop… QUERY SERVICES, ALARMS. A REST services layer for querying the aggregated data will isolate the final consumer from the specific data store. Features: - Time ranges - Aggregation over time ranges - Best rollup selection. For example, I want to know whether we have earned over $3000 in London in the last hour... Remember operational intelligence!
  • 37. Next steps in our roadmap (II). WEB APPLICATION: How about a nice web interface to create and manage policies? Forget the JSON file and use your mouse to define the workflow :) DEPLOYING & MONITORING: We have been working with Spark JobServer & YARN, but it would be nice to support Mesos, for example. MORE AWESOMENESS: Hey, did we miss something? Do you have a great idea? Let us know!
  • 39. OPEN TO YOUR IDEAS. www.stratio.com @StratioBD https://github.com/stratio/sparkta SPARKTA is fully open source under the Apache 2 License. We are open to contributors and ideas.
  • 41. Do you want to try SPARKTA? Use a full-featured sandbox to start trying SPARKTA. Just open a shell and type: vagrant init stratio/sparkta; vagrant up
  • 42. Do you want to try SPARKTA? Getting some real-time stats from #StrataHadoop. Our real-time policy defines several rollups to identify chatty users, hot hashtags and heatmaps from StrataConf tweets. We use the standard Twitter input from Spark Streaming, an Elasticsearch output and Kibana to display the results.
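The cleaning, normalization and enrichment steps named on the "Extract, Transform and Aggregate" slide can be sketched in plain Python as a chain of per-event functions. This is a minimal sketch under assumptions: the event fields (`amount`, `city`, `ts`) and the `COUNTRY_BY_CITY` lookup table are hypothetical. Because `transform_event` returns a list of zero or one events, a Spark Streaming job could apply it per record with something like `dstream.flatMap(transform_event)` instead of a Python loop.

```python
from datetime import datetime, timezone

# Hypothetical reference data used for enrichment.
COUNTRY_BY_CITY = {"london": "UK", "madrid": "ES"}
REQUIRED_FIELDS = ("amount", "city", "ts")


def clean(event):
    """Data cleaning: discard events with missing or empty required fields."""
    if any(event.get(field) in (None, "") for field in REQUIRED_FIELDS):
        return None
    return event


def normalize(event):
    """Normalization: numeric amount, lowercase city, epoch seconds -> UTC datetime."""
    return {
        **event,
        "amount": float(event["amount"]),
        "city": event["city"].strip().lower(),
        "ts": datetime.fromtimestamp(int(event["ts"]), tz=timezone.utc),
    }


def enrich(event):
    """Enrichment: add a country code looked up from the (normalized) city."""
    return {**event, "country": COUNTRY_BY_CITY.get(event["city"], "UNKNOWN")}


def transform_event(raw):
    """Return zero or one output events, so the function fits a flatMap."""
    event = clean(raw)
    return [] if event is None else [enrich(normalize(event))]


if __name__ == "__main__":
    raw_events = [
        {"amount": "12.50", "city": " London ", "ts": "1430900000"},
        {"amount": "", "city": "Madrid", "ts": "1430900001"},  # dropped by clean()
    ]
    for raw in raw_events:
        print(transform_event(raw))
```

Keeping each step a pure function makes the order explicit (clean before normalize, so `float("")` is never attempted) and lets each step be tested on its own.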
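The Countandra slide mentions hierarchical keys such as com.mywebsite.music and retrieving counts, sums and squares. One way to read that combination: storing count, sum and sum of squares for every prefix of a key is enough to answer mean and variance queries at any level of the hierarchy without keeping raw events. The in-memory class below is a hypothetical sketch of that idea, not Countandra's Cassandra-backed implementation or its HTTP API.

```python
from collections import defaultdict


class HierarchicalCounter:
    """Keeps [count, sum, sum of squares] for every prefix of a dotted key,
    so adding to 'com.mywebsite.music' also updates 'com.mywebsite' and 'com'."""

    def __init__(self):
        self.stats = defaultdict(lambda: [0, 0.0, 0.0])

    def add(self, key, value=1.0):
        parts = key.split(".")
        for depth in range(1, len(parts) + 1):
            entry = self.stats[".".join(parts[:depth])]
            entry[0] += 1
            entry[1] += value
            entry[2] += value * value

    def count(self, key):
        return self.stats[key][0] if key in self.stats else 0

    def mean_and_variance(self, key):
        if key not in self.stats:
            raise KeyError(key)
        n, total, squares = self.stats[key]
        mean = total / n
        # Population variance from running sums: E[x^2] - (E[x])^2.
        return mean, squares / n - mean * mean


if __name__ == "__main__":
    counter = HierarchicalCounter()
    counter.add("com.mywebsite.music", 2.0)
    counter.add("com.mywebsite.video", 4.0)
    print(counter.count("com"), counter.mean_and_variance("com.mywebsite"))
```

The E[x^2] - (E[x])^2 formula can lose precision when values are large relative to their spread; Welford-style incremental updates are more robust if that matters.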
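The "Define your real-time needs" slide lists the sections a JSON aggregation policy contains. The snippet below sketches loading and validating such a document. The key names and nesting (`name`, `input`, `outputs`, `dimensions`, `rollups`, `granularity`, `operators`, `transformations`, `saveRawData`) are hypothetical, chosen to mirror the slide; they are not Sparkta's actual policy schema.

```python
import json

# Hypothetical policy document mirroring the sections named on the slide.
POLICY = """
{
  "name": "tweets-by-hashtag",
  "input": {"type": "Twitter"},
  "outputs": [{"type": "Elasticsearch"}],
  "dimensions": [{"field": "hashtag"}, {"field": "timestamp"}],
  "rollups": [{"dimensions": ["hashtag"], "granularity": "minute", "operators": ["count"]}],
  "transformations": [{"type": "lowercase", "field": "hashtag"}],
  "saveRawData": false
}
"""

REQUIRED_SECTIONS = ("input", "outputs", "dimensions", "rollups")


def load_policy(text):
    """Parse a policy and check that every rollup only uses declared dimensions."""
    policy = json.loads(text)
    missing = [section for section in REQUIRED_SECTIONS if section not in policy]
    if missing:
        raise ValueError(f"policy is missing sections: {missing}")
    declared = {dimension["field"] for dimension in policy["dimensions"]}
    for rollup in policy["rollups"]:
        unknown = set(rollup["dimensions"]) - declared
        if unknown:
            raise ValueError(f"rollup uses undeclared dimensions: {sorted(unknown)}")
    # Optional sections get defaults.
    policy.setdefault("transformations", [])
    policy.setdefault("saveRawData", False)
    return policy


if __name__ == "__main__":
    policy = load_policy(POLICY)
    print(policy["name"], [r["granularity"] for r in policy["rollups"]])
```

Validating declared dimensions against rollups up front is one reason a declarative policy can replace hand-written integration code: mistakes surface when the policy is submitted rather than while the stream is running.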
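The rollups slide combines time-based granularities with a set of operators. A minimal in-memory sketch of a time-based rollup: each event's timestamp is truncated to the chosen granularity, events are grouped by (dimension value, time bucket), and the operators are computed per bucket. Assumptions: events are dicts with hypothetical `city`, `ts` and `value` fields; variance and standard deviation are the population versions, since the slide does not say which one Sparkta uses; hierarchical and GeoRange rollups and full-text search are not covered.

```python
import statistics
from collections import defaultdict
from datetime import datetime

# Truncation functions for the time granularities named on the slide.
TRUNCATE = {
    "second": lambda t: t.replace(microsecond=0),
    "minute": lambda t: t.replace(second=0, microsecond=0),
    "hour": lambda t: t.replace(minute=0, second=0, microsecond=0),
    "day": lambda t: t.replace(hour=0, minute=0, second=0, microsecond=0),
    "month": lambda t: t.replace(day=1, hour=0, minute=0, second=0, microsecond=0),
    "year": lambda t: t.replace(month=1, day=1, hour=0, minute=0, second=0, microsecond=0),
}


def apply_operators(values):
    """Compute the slide's numeric operators over one bucket (values in arrival order)."""
    return {
        "count": len(values),
        "sum": sum(values),
        "max": max(values),
        "min": min(values),
        "avg": statistics.mean(values),
        "median": statistics.median(values),
        "variance": statistics.pvariance(values),
        "stdev": statistics.pstdev(values),
        "count_distinct": len(set(values)),
        "last": values[-1],
    }


def time_rollup(events, dimension, granularity):
    """Group events by (dimension value, truncated timestamp) and aggregate each group."""
    truncate = TRUNCATE[granularity]
    buckets = defaultdict(list)
    for event in events:
        buckets[(event[dimension], truncate(event["ts"]))].append(event["value"])
    return {key: apply_operators(values) for key, values in buckets.items()}


if __name__ == "__main__":
    events = [
        {"city": "london", "ts": datetime(2015, 5, 6, 10, 0, 5), "value": 1.0},
        {"city": "london", "ts": datetime(2015, 5, 6, 10, 0, 50), "value": 3.0},
        {"city": "london", "ts": datetime(2015, 5, 6, 10, 1, 10), "value": 5.0},
    ]
    for key, stats in sorted(time_rollup(events, "city", "minute").items()):
        print(key, stats["count"], stats["avg"])
```

Operators such as median and count distinct need all values in a bucket, while count, sum, min and max can be merged incrementally; that distinction matters once buckets are updated batch by batch in a stream.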
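The Sparkta SDK slide describes extending the platform with new operators, among other extension points. The slides do not show Sparkta's real extension mechanism, so the decorator-based registry below is only an illustrative stand-in for the plug-in pattern: operators are looked up by the name a policy refers to, and a user-defined operator (`range`, hypothetical and not in the slide's list) is added without touching the aggregation code.

```python
# Registry mapping operator names (as a policy would reference them) to functions.
OPERATORS = {}


def operator(name):
    """Decorator that registers an aggregation function under the given name."""
    def register(fn):
        if name in OPERATORS:
            raise ValueError(f"operator {name!r} is already registered")
        OPERATORS[name] = fn
        return fn
    return register


@operator("count")
def count_values(values):
    return len(values)


@operator("sum")
def sum_values(values):
    return sum(values)


# A hypothetical user-defined extension, not one of the slide's built-in operators.
@operator("range")
def value_range(values):
    return max(values) - min(values)


def aggregate(values, operator_names):
    """Apply the operators a policy asks for; unknown names raise KeyError."""
    return {name: OPERATORS[name](values) for name in operator_names}


if __name__ == "__main__":
    print(aggregate([2, 5, 3], ["count", "sum", "range"]))
```

Rejecting duplicate names keeps a plug-in from silently replacing a built-in operator.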
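The roadmap slide's example query ("have we earned over $3000 in London in the last hour?") relies on best rollup selection over time ranges. The sketch below reads "best" as the coarsest stored granularity whose buckets exactly tile the requested range, which is one plausible interpretation rather than Sparkta's design. Assumptions: rollups are stored at day, hour and minute granularity in a dict keyed by the hypothetical tuple (granularity, city, bucket start), with naive datetimes.

```python
from datetime import datetime, timedelta

# Stored granularities, coarsest first (hypothetical set of rollups).
GRANULARITIES = [
    ("day", timedelta(days=1)),
    ("hour", timedelta(hours=1)),
    ("minute", timedelta(minutes=1)),
]
EPOCH = datetime(1970, 1, 1)


def is_aligned(t, step):
    """True if t falls exactly on a bucket boundary of the given step."""
    return (t - EPOCH) % step == timedelta(0)


def best_rollup(start, end):
    """Pick the coarsest granularity whose buckets exactly tile [start, end)."""
    for name, step in GRANULARITIES:
        if is_aligned(start, step) and is_aligned(end, step):
            return name, step
    raise ValueError("range is not aligned to any stored granularity")


def sum_over_range(store, city, start, end):
    """Sum pre-aggregated 'sum' values for a city over [start, end)."""
    name, step = best_rollup(start, end)
    total, bucket = 0.0, start
    while bucket < end:
        total += store.get((name, city, bucket), 0.0)
        bucket += step
    return total


if __name__ == "__main__":
    start = datetime(2015, 5, 6, 10)
    end = start + timedelta(hours=1)
    store = {("hour", "london", start): 3200.0}
    print(best_rollup(start, end)[0], sum_over_range(store, "london", start, end) > 3000)
```

A sliding "last hour" such as 10:37 to 11:37 is only minute-aligned, so this sketch would read 60 minute buckets instead of one hourly bucket; a range ending at an arbitrary second would first need rounding to a stored bucket boundary.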
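The demo policy on the last slide counts chatty users and hot hashtags from conference tweets. Outside Spark, the core of that aggregation is a pair of counters. The sketch assumes tweets arrive as dicts with hypothetical `user` and `text` fields and extracts hashtags with a simple regex, which only approximates Twitter's own hashtag rules; heatmaps are not covered.

```python
import re
from collections import Counter

# Approximate hashtag pattern: '#' followed by word characters.
HASHTAG = re.compile(r"#(\w+)")


def top_stats(tweets, n=3):
    """Return the n most common hashtags (case-insensitive) and the n chattiest users."""
    hashtags, users = Counter(), Counter()
    for tweet in tweets:
        users[tweet["user"]] += 1
        hashtags.update(tag.lower() for tag in HASHTAG.findall(tweet["text"]))
    return hashtags.most_common(n), users.most_common(n)


if __name__ == "__main__":
    tweets = [
        {"user": "alice", "text": "Great talk #StrataHadoop #Spark"},
        {"user": "alice", "text": "#spark everywhere"},
        {"user": "bob", "text": "Hello from #StrataHadoop"},
    ]
    print(top_stats(tweets))
```

In a Spark Streaming job the same counts would typically be computed per batch or per sliding window (for example with `reduceByKeyAndWindow` on (hashtag, 1) pairs) and then written to the output store.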

Editor's Notes

  1. Find a clock to replace the O.