Delivering Operational Analytics Using
Spark and NoSQL Data Stores
Mike Ferguson
Managing Director
Intelligent Business Strategies
Basho Webinar
January, 2016
2
Copyright © Intelligent Business Strategies 1992-2016
About Mike Ferguson
Mike Ferguson is Managing Director of Intelligent
Business Strategies Limited. As an analyst and
consultant he specializes in business
intelligence, data management and enterprise
business integration. With over 34 years of IT
experience, Mike has consulted for dozens of
companies, spoken at events all over the world
and written numerous articles. Formerly he was
a principal and co-founder of Codd and Date
Europe Limited – the inventors of the Relational
Model, a Chief Architect at Teradata on the
Teradata DBMS and European Managing
Director of DataBase Associates.
www.intelligentbusiness.biz
mferguson@intelligentbusiness.biz
Twitter: @mikeferguson1
Tel/Fax (+44)1625 520700
3
Topics
▪ The changing landscape of operational and analytical systems
▪ Scalable operational applications and NoSQL data stores
▪ Big data analytics – The era of Hadoop and Spark
▪ The value of operational analytics
▪ Operational analytics using The Basho Data Platform and Apache Spark
▪ Conclusions
5
The Application Processing Spectrum
Source: BI-Research Copyright © BI-Research, 2013-Present
6
Big Data Processing – There Is A Growing Number of Data Stores Optimized for Operational or Analytical Workloads
[Diagram: data stores positioned by workload, including OLTP RDBMS, NoSQL DBMS and Analytical RDBMS]
• ACID support missing in many NoSQL DBMSs
• Can you live with losing a transaction?
• OK for sensor data, for example
7
A Closed Loop Is Still Needed – It Just Now Also Includes NoSQL Technologies
[Diagram: operational applications and analytical systems, and their scalable counterparts, each running on relational & NoSQL systems and joined in a closed loop labelled "new data" and "new insights"]
8
Topics – Where Are We?
▪ The changing landscape of operational and analytical systems
▪ Scalable operational applications and NoSQL data stores
▪ Big data analytics – The era of Hadoop and Spark
▪ The value of operational analytics
▪ Operational analytics using The Basho Data Platform and Apache Spark
▪ Conclusions
9
Demand For Scalable Operational Systems With High Write Processing Is Driving Demand for NoSQL DBMS
[Diagram: the closed loop between operational applications and analytical systems and their scalable counterparts, labelled "new data" and "new insights"]
10
Success of Big Data Analytics Depends On Being Able To Scale To Capture High Velocity, High Volume Data
▪ Successful big data analytics requires
1. Ability to scale operational systems to capture, stream and store the required transactional and non-transactional data
   – Support peak transaction rates
   – Support peak capture of non-transactional data, e.g. shopping cart data
   – Support peak data arrival rates, e.g. sensor data
   – Support peak ingestion rates
2. Scalable Big Data analytics
3. Closed loop integration of analytical systems back into core operational transaction processing systems
   – Make prescriptive insights available to all that need them to continuously optimise operations and maximise effectiveness
11
E-Business And Mobile Means Operational Systems Are Having To Scale To Support Masses Of Concurrent Users
[Diagram: many more users reach operational and transactional applications from the WWW and mobile devices; the applications run on a cluster over partitioned data, and web logs are also captured]
12
Example Operational Applications Requiring Scalability That Are Fuelling Demand For NoSQL DBMSs
▪ Web and mobile commerce
• Shopping cart data, session storage
▪ Internet of Things (IoT) and other time series applications
• Need to scale as the number of devices / things increases
▪ Mobile gaming
• Player profile data, session storage, game performance stats
▪ Healthcare
• Store unstructured healthcare digital imaging and video data
▪ Social network applications
13
Types Of NoSQL Database And Product Examples
NoSQL Database Type | NoSQL Product Examples
Key Value store | Aerospike, Amazon DynamoDB, Basho Riak KV, Redis, MemcacheDB, Voldemort
Document database | CouchDB, IBM DB2 (XML & JSON), MongoDB, IBM Cloudant, MarkLogic, Terrastore, JackRabbit, RaptorDB
Column Family database | Cassandra, DataStax, Google BigTable, Hadoop HBase, Hypertable, HPCC, Amazon SimpleDB
Graph database | AllegroGraph, GraphBase, Horton, InfiniteGraph, IBM DB2, Neo4j, Oracle Spatial and Graph, Titan, Cray Research, Teradata Aster
Multi-model database | ArangoDB, CortexDB, MarkLogic, MongoDB, FoundationDB
▪ Some NoSQL databases are aimed at write processing (data collection)
▪ Others are aimed at specific big data analytical workloads
▪ Issues include lack of standard APIs, weak or no optimizer and non-immediate consistency
14
Global NoSQL Market Size And Forecast 2013 - 2020
Source: https://www.alliedmarketresearch.com/NoSQL-market
15
Key Value Stores Can Store Any Data - Examples
Key Value
10034 John Smith
82771
93441
{
  "firstName": "Wayne",
  "lastName": "Rooney",
  "age": 25,
  "address": {
    "streetAddress": "21 Sir Matt Busby Way",
    "city": "Manchester",
    "country": "England",
    "postalCode": "M1 6DY"
  },
  "phoneNumbers": [
    { "type": "home", "number": "0161-123-1234" },
    { "type": "mobile", "number": "07779-123234" }
  ]
}
Key value store features:
• Very simple to understand
• Very scalable - hash partitioning
• Data access is via the key
• The application controls what's stored in the value
• Very fast performance
• Acceleration via in-memory processing
• Eventual consistency
• Often no support for data types
• No built-in referential integrity
• No understanding of data relationships
• The application must understand any relationships in data
• Programmer is in complete control
• Application must navigate complex data
Use for specific operational applications
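The division of labour described above can be sketched in a few lines of Python. This is a minimal illustration, not any product's API: the `KeyValueStore` class and its `put`/`get` methods are hypothetical, a plain dictionary stands in for the store, and pairing key 93441 with the JSON document is an assumption made for the example. The store only sees opaque bytes; the application decides that a value is JSON and navigates it.

```python
import json


class KeyValueStore:
    """Toy in-memory key value store (hypothetical, not a real product API)."""

    def __init__(self):
        self._data = {}

    def put(self, key: str, value: bytes) -> None:
        # The store does not inspect the value: no data types, no integrity checks.
        self._data[key] = value

    def get(self, key: str) -> bytes:
        # Data access is via the key only.
        return self._data[key]


store = KeyValueStore()
store.put("10034", b"John Smith")

player = {
    "firstName": "Wayne",
    "lastName": "Rooney",
    "phoneNumbers": [
        {"type": "home", "number": "0161-123-1234"},
        {"type": "mobile", "number": "07779-123234"},
    ],
}
# Assumption for this sketch: key 93441 holds the JSON document.
store.put("93441", json.dumps(player).encode("utf-8"))

# The application, not the store, knows the value is JSON and how to navigate it.
record = json.loads(store.get("93441").decode("utf-8"))
mobile = next(p["number"] for p in record["phoneNumbers"] if p["type"] == "mobile")
print(mobile)
```

Because the store never interprets the value, any change to the document structure has to be handled in application code.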
16
Key Value Stores – The Key Is Hashed To Partition The Data
Source: Microsoft
The value can be anything
• A single data field
• A JSON document
• An XML document
• Text
• Image...
Easy to partition (hash the key)
Very fast to retrieve and store data
The application needs to know
• What is stored in the value
• How the value is structured
• How to process the value
Key needs to be unique
Can use HTTP to read and write data, e.g. curl -XPUT, curl -XGET
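"Hash the key" can be made concrete with a short Python sketch. It assumes SHA-1 as the hash function and four partitions; both are illustrative choices, and `partition_for` is a hypothetical helper rather than part of any product.

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition by hashing it and taking the remainder."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_partitions


# The same key always lands on the same partition, so reads know where to look.
keys = ["10034", "82771", "93441"]
placement = {key: partition_for(key, 4) for key in keys}
print(placement)
```

A plain remainder is the simplest scheme; it reshuffles most keys when the partition count changes, which is one reason production stores tend to use a ring of fixed partitions instead.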
17
Key Value Stores – A Basho Riak KV Cluster Has Virtual Nodes Running on Physical Nodes
Source: Basho
SHA-1 is the hashing function that hashes a key to determine the node
Riak hash partitions and replicates data (3 copies of the data are the default)
[Diagram: a request (e.g. PUT, POST, GET) carries the key and the value; the key is hashed to locate the node]
Nodes can be added to and removed from a Riak cluster while it is running
18
Key Value Stores - A Basho Riak KV Ring
Source: Basho
Riak uses partitions (64 partitions are the default) and also replicates the partitions for high availability
[Diagram: the Riak ring, with replicas being written]
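A simplified Python sketch of the ring idea follows. It uses the defaults quoted on the slides (SHA-1, 64 partitions, 3 copies); everything else is an assumption made for illustration, including the round-robin assignment of partitions to physical nodes and the helper names `preference_list` and `owner`. It is not Riak's actual implementation.

```python
import hashlib
from typing import List

RING_SIZE = 64        # default number of partitions, per the slide
N_VAL = 3             # default number of copies, per the slide
KEY_SPACE = 2 ** 160  # SHA-1 produces 160-bit hashes


def preference_list(key: str, ring_size: int = RING_SIZE, n_val: int = N_VAL) -> List[int]:
    """Return the partitions that hold the copies of a key (simplified)."""
    position = int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest(), "big")
    first = position // (KEY_SPACE // ring_size)
    # Copies go to consecutive partitions, wrapping around the ring.
    return [(first + i) % ring_size for i in range(n_val)]


def owner(partition: int, physical_nodes: List[str]) -> str:
    """Assign partitions (virtual nodes) to physical nodes round-robin (an assumption)."""
    return physical_nodes[partition % len(physical_nodes)]


nodes = ["node1", "node2", "node3", "node4", "node5"]
parts = preference_list("93441")
owners = [owner(p, nodes) for p in parts]
print(parts, owners)
```

With five physical nodes and round-robin ownership, three consecutive partitions always belong to three different machines, which is what makes the copies useful for availability.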
19
Topics – Where Are We?
▪ The changing landscape of operational and analytical systems
▪ Scalable operational applications and NoSQL data stores
▪ Big data analytics – The era of Hadoop and Spark
▪ The value of operational analytics
▪ Operational analytics using The Basho Data Platform and Apache Spark
▪ Conclusions
20
Demand For Scalable Analytical Systems Is Also Exploding
[Diagram: the closed loop between operational applications and analytical systems and their scalable counterparts, labelled "new data" and "new insights"]
21
A Hadoop System
[Diagram: components of a Hadoop system. Storage: files in HDFS, HBase, index partitions. Access: APIs to HBase, APIs to HDFS, webHDFS (an HTTP interface to HDFS with REST APIs). Execution: YARN, MapReduce, Tez, Spark, Storm. On top: analytic applications in Java, Python or Scala, Pig Latin scripts, SQL (executes on MR, Tez & Spark), 3rd party SQL on Hadoop, BI tools.]
22
Faster Execution Engines For Analytic Applications – Apache Spark
[Diagram: components of a Hadoop system. Storage: files in HDFS, HBase, index partitions. Access: APIs to HBase, APIs to HDFS, webHDFS (an HTTP interface to HDFS with REST APIs). Execution: YARN, MapReduce, Tez, Spark, Storm. On top: analytic applications in Java, Python or Scala, Pig Latin scripts, SQL, 3rd party SQL on Hadoop, BI tools.]
23
Spark Is A General Purpose In-Memory Execution Framework That Can Run With Or Without Hadoop
[Diagram: the Spark stack (Spark Core; Spark Streaming; Spark SQL + DataFrames; MLlib (Machine Learning); GraphX (Graph Computation); R) serving applications and BI tools through SQL, Python, Scala and Java, over Tachyon and storage such as HDFS and S3, shown next to the Hadoop components (HDFS, webHDFS, HBase, YARN, MapReduce, Tez, Storm)]
Spark also includes an HDFS compatible in-memory file system (Tachyon)
You can use Spark with or without Tachyon
The Spark stack is integrated – e.g. you can use Spark Streaming, Spark SQL and MLlib together in the same application
24
Apache Spark
Applications and BI tools use the stack through SQL, Python, Scala, Java and R
• Spark Core: provides distributed task dispatching, scheduling, and basic I/O
• Spark Streaming: for analysis of real-time streaming data
• MLlib (Machine Learning): a library of pre-built analytic algorithms that can run in parallel across a Spark cluster
• GraphX (Graph Computation): a graph analysis engine running on Spark
• Spark SQL + DataFrames: query structured data in Spark apps using SQL or a DataFrames API
25
Spark In-Memory Analytic Applications Can Do A Lot More Than Map Reduce Processing
Source: Amplab
▪ Keep only one copy in memory in a JVM
▪ Track lineage of job operators used to derive the data
▪ Use the lineage to re-compute the data if there is a failure
▪ No MapReduce execution needed
• Just Spark APIs
[Diagram: a Spark application reads files from HDFS and applies map, join, filter and reduce operators]
26
Spark Applications Operate On RDDs (Data) – You Can Do A Lot More Than Map and Reduce
▪ RDD = Resilient Distributed Dataset
▪ An RDD is a read-only, partitioned collection of records
▪ RDDs can only be created through operators on either
1. A dataset in stable storage or
2. Other existing RDDs
Spark operators available to Spark applications:
map             reduce        sample
filter          count         take
groupBy         fold          first
sort            reduceByKey   partitionBy
union           groupByKey    mapWith
join            cogroup       pipe
leftOuterJoin   cross         save
rightOuterJoin  zip
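The RDD properties above (read-only, derived only from stable storage or other RDDs, recomputable from lineage) can be imitated in plain Python. The `MiniRDD` class below is a hypothetical single-process toy, not the Spark API, and the sensor readings are invented for the example.

```python
class MiniRDD:
    """Toy, single-process imitation of an RDD; not the Spark API."""

    def __init__(self, source=None, parent=None, transform=None):
        self._source = source        # data in "stable storage" (a list here)
        self._parent = parent        # lineage: the RDD this one was derived from
        self._transform = transform  # lineage: operator applied to the parent
        self._cache = None           # in-memory copy, which may be lost

    # Operators never modify an RDD; each returns a new one that records its lineage.
    def map(self, fn):
        return MiniRDD(parent=self, transform=lambda rows: [fn(r) for r in rows])

    def filter(self, pred):
        return MiniRDD(parent=self, transform=lambda rows: [r for r in rows if pred(r)])

    def reduce_by_key(self, fn):
        def combine(rows):
            out = {}
            for key, value in rows:
                out[key] = fn(out[key], value) if key in out else value
            return sorted(out.items())
        return MiniRDD(parent=self, transform=combine)

    def collect(self):
        if self._cache is None:
            # Recompute from lineage: stable storage or the parent RDD.
            if self._parent is None:
                self._cache = list(self._source)
            else:
                self._cache = self._transform(self._parent.collect())
        return list(self._cache)

    def lose_cache(self):
        """Simulate a failure that drops the in-memory copy."""
        self._cache = None


readings = MiniRDD(source=[("s1", 3), ("s2", 10), ("s1", 5), ("s2", -1)])
totals = readings.filter(lambda kv: kv[1] >= 0).reduce_by_key(lambda a, b: a + b)
first = totals.collect()
totals.lose_cache()            # the in-memory copy is gone
recovered = totals.collect()   # rebuilt by replaying the lineage
print(first, recovered)
```

Only the recipe (parent plus operator) has to survive a failure, which is why a single in-memory copy is enough.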
27
Simplifying Access To Data Using Spark SQL and Spark DataFrames
Image source: Databricks.com
▪ A DataFrame is a distributed collection of data organized into named columns
▪ Conceptually equivalent to a relational DBMS table or a data frame in R/Python
▪ DataFrames can be constructed from a wide array of sources:
• Structured data files
• Hive tables
• External databases
• Existing RDDs
▪ Uses schema on read
Note that Spark data sources can be relational & NoSQL DBMSs
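"Schema on read" means the named columns are worked out when the data is read, not declared when it is written. The sketch below shows the idea in plain Python rather than with the Spark DataFrame API; the `read_with_schema` helper and the sample device records are assumptions made for illustration.

```python
import json

raw_lines = [
    '{"device": "d1", "temp": 21.5}',
    '{"device": "d2", "temp": 19.0, "site": "Leeds"}',
]


def read_with_schema(lines):
    """Parse JSON lines and infer the named columns while reading."""
    records = [json.loads(line) for line in lines]
    columns = sorted({name for rec in records for name in rec})
    # Fields missing from a record become None, much like nulls in a table.
    rows = [tuple(rec.get(col) for col in columns) for rec in records]
    return columns, rows


columns, rows = read_with_schema(raw_lines)
print(columns)
print(rows)
```

The second record introduces a `site` field that nobody declared in advance; the column simply appears at read time.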
28
Spark Is Going Over The Top of Multiple Data Stores For Scalable In-Memory Analytics Across The Entire Ecosystem
[Diagram: the Spark stack (Spark Core; Spark Streaming; Spark SQL + DataFrames; MLlib (Machine Learning); GraphX (Graph Computation); R; SQL, Python, Scala and Java for applications and BI tools) sitting over streaming data (streaming analytics), a Hadoop data store (advanced analytics on multi-structured data), data warehouse RDBMSs (EDW, DW & marts), a NoSQL DBMS mart, and operational NoSQL data stores, e.g. Cassandra, Basho Riak]
29
Topics – Where Are We?
▪ The changing landscape of operational and analytical systems
▪ Scalable operational applications and NoSQL data stores
▪ Big data analytics – The era of Hadoop and Spark
▪ The value of operational analytics
▪ Operational analytics using The Basho Data Platform and Apache Spark
▪ Conclusions
30
Key Business Drivers And Objectives For Operational Analytics
▪ Combine operational and analytical processing at scale to:
• Improve customer engagement
• Reduce risk
• Avoid unplanned operational cost
• Optimise operational effectiveness
▪ Use BI/Analytics to drive and guide business operations to help achieve specific business goals and KPI targets
• Automated analysis of operational events as they happen
• Automated alerts
• On-demand recommendations
▪ Integrate BI/Analytics into every business process to:
• Create an 'insight-driven' employee base
• Enable mass execution of business strategy by facilitating mass contribution towards achieving specific business goals
31
Five Types Of Operational BI/Analytics
1. Simple operational reporting of current position/state, e.g. session state
2. Situational awareness via visualisation of live operational data, typically on dashboards
3. On-demand analytics of live operational and/or historical data to improve operational decisions and effectiveness
4. On-demand recommendations for guidance
5. Event stream processing to monitor, automatically analyse and act on events in real-time to prevent problems arising and to optimise business operations
32
Operational Analytics – What's The Difference Between On-Demand Vs Event-Driven Analysis?
[Diagram: two ways of invoking BI/analytics apps and services]
On-Demand: an application calls an analytical service (query, report, model, recommendation)
Event-Driven: a message, file arrival, pattern or trigger in streaming data invokes an analytical service (query, report, model, recommendation)
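The two invocation styles can be contrasted with the same analytical service called in two ways. This is a minimal Python sketch under stated assumptions: `recommend` is a hypothetical service with a trivial rule standing in for a model, and `EventBus` is an illustrative publish/subscribe dispatcher, not a real messaging product.

```python
from typing import Callable, Dict, List


def recommend(customer: Dict) -> str:
    """Hypothetical analytical service: a trivial rule standing in for a model."""
    return "offer-upgrade" if customer["basket_value"] > 100 else "no-offer"


# On-demand: the application asks for the analysis when it needs it.
on_demand_result = recommend({"id": "c1", "basket_value": 150})


class EventBus:
    """Minimal publish/subscribe dispatcher (an illustrative assumption)."""

    def __init__(self) -> None:
        self._handlers: List[Callable[[Dict], None]] = []

    def subscribe(self, handler: Callable[[Dict], None]) -> None:
        self._handlers.append(handler)

    def publish(self, event: Dict) -> None:
        # Event-driven: the arrival of the event triggers the analysis.
        for handler in self._handlers:
            handler(event)


event_results: List[str] = []
bus = EventBus()
bus.subscribe(lambda event: event_results.append(recommend(event)))
bus.publish({"id": "c2", "basket_value": 40})
print(on_demand_result, event_results)
```

The analytical logic is identical in both cases; what differs is who initiates the call.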
33
Analytics Need To Be Integrated Into Business Processes To Optimize Business Operations
[Diagram: Integrated Intelligent Business Operations on top of Integrated On-Demand Business Intelligence. Customers, employees, and partners & suppliers are linked through customer relationship management, operations management and supply chain management to the business functions: marketing, sales, service/support, operations, finance/accounting, procurement, inventory control, shipping/distribution and human resources]
34
High Value Application Use Cases for Streaming Analytics
[Diagram: use cases arranged around streaming analytics]
Source: Adapted from a slide by IBM
35
Responding To Events And Event Patterns Means Reducing Action Time
Source: Dr Richard Hackathorn
Action distance or action time: the time between an event occurring and action being taken should be as close to zero as possible
[Diagram: action time is reduced by event-driven data integration, automated analysis, and automated decision and action taking]
36
With Event Stream Processing The Architecture Has To Change
[Diagram: contrasts two pipelines. Classic use of analytics, with stages: data cleansing & integration, store data, query/analyze (human). Event / stream processing, with stages: data cleansing & integration, query/analyze (automated), store data. Both lead to "act" (automated or human).]
37
Time Series Analysis – Query Processing Uses a Time Window to Look at Continuously Streaming Data
[Diagram: high frequency data flows through a stream processing server; continuous queries (CQs) look for patterns/correlations inside a time window from T1 to T2, e.g. 5 seconds, 30 seconds or 5 minutes]
Continuous time series queries (CQs) operate on the data as it flows by
A set of queries (continuous queries) resides in the data stream server to process incoming data
Data is pushed into the queries
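A continuous query over a time window can be sketched in Python as follows. The `WindowedAverage` class is a hypothetical example, the 5-second window is one of the sizes mentioned on the slide, and the sketch assumes events arrive in timestamp order.

```python
from collections import deque


class WindowedAverage:
    """Continuous query: average of the values seen in the last `window_seconds`."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._events = deque()  # (timestamp, value) pairs, oldest first

    def push(self, timestamp: float, value: float) -> float:
        """Data is pushed into the query; the current result is returned."""
        self._events.append((timestamp, value))
        # Drop events that have fallen out of the time window.
        while self._events[0][0] <= timestamp - self.window_seconds:
            self._events.popleft()
        return sum(v for _, v in self._events) / len(self._events)


cq = WindowedAverage(window_seconds=5)
stream = [(0, 10.0), (2, 20.0), (4, 30.0), (6, 40.0)]
results = [cq.push(t, v) for t, v in stream]
print(results)  # [10.0, 15.0, 20.0, 30.0]
```

The query holds only the events inside the window, so memory use depends on the arrival rate and window size rather than on the total length of the stream.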
38
Key Requirements For Operational Analytics
▪ On-demand, event-driven and scheduled invocation of analytics
▪ Monitor streaming events as they happen via automatic analysis
▪ Automatic analysis via predictive and statistical models
▪ Automatic interpretation of predictive/statistical model outcomes
▪ Rule-driven automatic actions to automate decision making
• E.g. Alerts, recommendations, transaction and process invocation
▪ Integrate operational analytics into operational applications
▪ Operational reporting
▪ Scale to support large numbers of events and concurrent users
▪ Store relevant data together to speed up analytics execution
▪ Run predictive and statistical models close to the data
▪ Run analytics on a 24x365 basis
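The chain from model outcome to rule to automatic action listed above can be sketched in Python. Everything here is an illustrative assumption: `failure_risk` is a made-up formula standing in for a predictive model, and the thresholds and action names in `RULES` are invented.

```python
def failure_risk(reading: dict) -> float:
    """Hypothetical stand-in for a predictive model: returns a score in [0, 1]."""
    return min(1.0, max(0.0, (reading["temperature"] - 60) / 40))


# (minimum score, action), checked from most to least severe; values are assumptions.
RULES = [
    (0.8, "shut-down-and-alert"),
    (0.5, "alert"),
]


def decide(reading: dict) -> str:
    """Interpret the model outcome and pick an action without human involvement."""
    score = failure_risk(reading)
    for threshold, action in RULES:
        if score >= threshold:
            return action
    return "none"


actions = [decide({"temperature": t}) for t in (55, 85, 95)]
print(actions)  # ['none', 'alert', 'shut-down-and-alert']
```

Keeping the rules in data rather than in code lets the business change thresholds without redeploying the model.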
39
The Importance of In-Memory Processing
▪ Massively parallel in-memory processing is mission critical for scalable operational systems and operational analytics
▪ Why?
• Performance is critical
• Large number of concurrent user requests for on-demand analytics
• Large number of concurrent application requests for on-demand analytics
• Event-driven operational analytics on very high velocity data needs memory
40
Topics – Where Are We?
▪ The changing landscape of operational and analytical systems
▪ Scalable operational applications and NoSQL data stores
▪ Big data analytics – The era of Hadoop and Spark
▪ The value of operational analytics
▪ Operational analytics using The Basho Data Platform and Apache Spark
▪ Conclusions
41
The Basho Data Platform
Source: Basho
[Diagram: the Basho Data Platform, with a legend distinguishing Basho developed from Basho integrated components]
Service instances: Spark, Redis (caching), Solr, Elastic Search, Web Services, 3rd Party Web Services & Integrations
Storage instances: Riak KV (Key/Value), Riak S2 (Object Storage), Riak TS (Time Series), Document Store, Columnar, Graph
Core services: Replicate & Synchronize, Message Routing, Cluster Management & Monitoring, Logging & Analytics, Internal Data Store
Callouts on the diagram:
• Hash partitioning, cluster scalability, triple replication, multi-datacentre replication
• Co-locates time-series data, high availability, scalability
• Replicates and synchronises data within and across Riak KV, Redis and Spark clusters
• Automated cluster management simplifies administration
• Integrated in-memory caching for faster application performance
• Search based query processing on Riak data using Solr indexes
• Integrated in-memory analytics for Riak KV and Riak TS data
42
Riak TS Is A New Basho Storage Instance Optimised for Time Series Data And Analytics
▪ A distributed NoSQL database optimised for time series sequenced, unstructured data capture, aggregation and analysis from the Internet of Things (IoT)
▪ High availability
▪ Scalability - add nodes to a cluster without sharding
▪ Automated and uniform data distribution across the cluster
• Time or geohash based data co-location to ensure time series data is located on the same node
▪ Data validation on input
▪ APIs and client libraries for Java, Ruby, Python, Go, Erlang, Node.js or .NET
▪ Spark integration for operational analysis of time series data
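Time-based co-location can be illustrated with a short Python sketch: instead of hashing the exact timestamp, the series name is hashed together with a coarse time bucket, so neighbouring readings land together. The 15-minute bucket, the partition count and the `ts_partition_for` helper are assumptions for illustration, not Riak TS internals.

```python
import hashlib

QUANTUM_SECONDS = 15 * 60  # hypothetical time bucket: 15 minutes
NUM_PARTITIONS = 64        # hypothetical partition count


def ts_partition_for(series: str, timestamp: int) -> int:
    """Hash the series name plus the time bucket, not the exact timestamp."""
    bucket = timestamp // QUANTUM_SECONDS
    digest = hashlib.sha1(f"{series}:{bucket}".encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % NUM_PARTITIONS


# Readings from the same sensor inside one 15-minute bucket share a partition,
# so a range query over that period touches a single location.
same_bucket = {ts_partition_for("sensor-7", ts) for ts in (900, 1000, 1799)}
print(same_bucket)
```

A larger bucket keeps more of a series together at the cost of less even spreading across the cluster.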
43
Operational Analytics Using The Basho Data Platform And Apache Spark
[Diagram: a scalable operational application stores hash partitioned data. Spark (Spark Core, Spark Streaming, BlinkDB, Spark SQL, GraphX, SparkR, MLlib) sits over that data and serves an operational analytics web service, an operational analytic application and a BI tool. Arrows are labelled "write back" and "recent data".]
44
Operational Analytics Using The Basho Data Platform And Apache Spark - 2
• Can develop Spark operational analytic applications on low latency data stored in Basho Riak KV
• Spark-based analytical web services can be invoked on demand to analyse data in Riak KV
  • Use on-demand Spark jobs for historical analysis and predictions
• Insights produced from analysing Riak KV data can be written back to Riak KV for use by other applications
  • A form of closed-loop processing
• Spark Streaming can be used to calculate rollups and detect abnormalities on streaming sensor data
• Recent data can be kept in Redis for dashboard visualization
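The rollup, abnormality detection and write-back steps can be sketched in plain Python. This is not Spark Streaming or Riak client code: a dictionary named `kv_store` stands in for Riak KV, `process_batch` is a hypothetical function handling one micro-batch, and the fixed threshold is an invented stand-in for a real abnormality test.

```python
from collections import defaultdict
from statistics import mean

kv_store = {}  # stand-in for Riak KV: a plain dict, not the Riak client API


def process_batch(readings, threshold=50.0):
    """Roll up one micro-batch of (sensor, value) readings and flag abnormal sensors."""
    by_sensor = defaultdict(list)
    for sensor, value in readings:
        by_sensor[sensor].append(value)
    for sensor, values in by_sensor.items():
        rollup = {"count": len(values), "avg": mean(values), "max": max(values)}
        rollup["abnormal"] = rollup["max"] > threshold
        # Write the insight back so other applications can read it by key.
        kv_store[f"rollup:{sensor}"] = rollup


process_batch([("s1", 20.0), ("s1", 22.0), ("s2", 18.0), ("s2", 75.0)])
print(kv_store)
```

Writing the rollups back under predictable keys is what closes the loop: an operational application can fetch `rollup:s2` with a single key lookup.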
46
Topics – Where Are We?
▪ The changing landscape of operational and analytical systems
▪ Scalable operational applications and NoSQL data stores
▪ Big data analytics – The era of Hadoop and Spark
▪ The value of operational analytics
▪ Operational analytics using The Basho Data Platform and Apache Spark
▪ Conclusions
47
Conclusions
▪ As operational application processing scales, so too does the need to scale operational analytics
▪ Basho is using in-memory processing to accelerate operational applications (via Redis) and to introduce scalable operational analytics (via Spark) into these applications
▪ New scalable 'smart' operational applications are therefore becoming possible with careful design in a NoSQL environment
48
Thank You!

Mais conteúdo relacionado

Mais procurados

IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...Mark Rittman
 
Designing the Next Generation Data Lake
Designing the Next Generation Data LakeDesigning the Next Generation Data Lake
Designing the Next Generation Data LakeRobert Chong
 
Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...
Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...
Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...Seeling Cheung
 
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...Data Con LA
 
Modern Data Architecture
Modern Data Architecture Modern Data Architecture
Modern Data Architecture Mark Hewitt
 
Data lake benefits
Data lake benefitsData lake benefits
Data lake benefitsRicky Barron
 
Big Data: Setting Up the Big Data Lake
Big Data: Setting Up the Big Data LakeBig Data: Setting Up the Big Data Lake
Big Data: Setting Up the Big Data LakeCaserta
 
Data Lake Architecture
Data Lake ArchitectureData Lake Architecture
Data Lake ArchitectureDATAVERSITY
 
Big Data & Data Lakes Building Blocks
Big Data & Data Lakes Building BlocksBig Data & Data Lakes Building Blocks
Big Data & Data Lakes Building BlocksAmazon Web Services
 
It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...
It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...
It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...DataWorks Summit
 
Use dependency injection to get Hadoop *out* of your application code
Use dependency injection to get Hadoop *out* of your application codeUse dependency injection to get Hadoop *out* of your application code
Use dependency injection to get Hadoop *out* of your application codeDataWorks Summit
 
Why Data Lake should be the foundation of Enterprise Data Architecture
Why Data Lake should be the foundation of Enterprise Data ArchitectureWhy Data Lake should be the foundation of Enterprise Data Architecture
Why Data Lake should be the foundation of Enterprise Data ArchitectureAgilisium Consulting
 
Extending Data Lake using the Lambda Architecture June 2015
Extending Data Lake using the Lambda Architecture June 2015Extending Data Lake using the Lambda Architecture June 2015
Extending Data Lake using the Lambda Architecture June 2015DataWorks Summit
 
Making Bank Predictive and Real-Time
Making Bank Predictive and Real-TimeMaking Bank Predictive and Real-Time
Making Bank Predictive and Real-TimeDataWorks Summit
 
Analytics in a Day Virtual Workshop
Analytics in a Day Virtual WorkshopAnalytics in a Day Virtual Workshop
Analytics in a Day Virtual WorkshopCCG
 
Exploring the Wider World of Big Data- Vasalis Kapsalis
Exploring the Wider World of Big Data- Vasalis KapsalisExploring the Wider World of Big Data- Vasalis Kapsalis
Exploring the Wider World of Big Data- Vasalis KapsalisNetAppUK
 
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011Cloudera, Inc.
 
Sören Eickhoff, Informatica GmbH, "Informatica Intelligent Data Lake – Self S...
Sören Eickhoff, Informatica GmbH, "Informatica Intelligent Data Lake – Self S...Sören Eickhoff, Informatica GmbH, "Informatica Intelligent Data Lake – Self S...
Sören Eickhoff, Informatica GmbH, "Informatica Intelligent Data Lake – Self S...Dataconomy Media
 

Mais procurados (20)

Destroying Data Silos
Destroying Data SilosDestroying Data Silos
Destroying Data Silos
 
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
 
Designing the Next Generation Data Lake
Designing the Next Generation Data LakeDesigning the Next Generation Data Lake
Designing the Next Generation Data Lake
 
Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...
Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...
Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...
 
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
 
Modern Data Architecture
Modern Data Architecture Modern Data Architecture
Modern Data Architecture
 
Data lake benefits
Data lake benefitsData lake benefits
Data lake benefits
 
Big Data: Setting Up the Big Data Lake
Big Data: Setting Up the Big Data LakeBig Data: Setting Up the Big Data Lake
Big Data: Setting Up the Big Data Lake
 
Data Lake Architecture
Data Lake ArchitectureData Lake Architecture
Data Lake Architecture
 
Big Data & Data Lakes Building Blocks
Big Data & Data Lakes Building BlocksBig Data & Data Lakes Building Blocks
Big Data & Data Lakes Building Blocks
 
It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...
It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...
It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...
 
Use dependency injection to get Hadoop *out* of your application code
Use dependency injection to get Hadoop *out* of your application codeUse dependency injection to get Hadoop *out* of your application code
Use dependency injection to get Hadoop *out* of your application code
 
Why Data Lake should be the foundation of Enterprise Data Architecture
Why Data Lake should be the foundation of Enterprise Data ArchitectureWhy Data Lake should be the foundation of Enterprise Data Architecture
Why Data Lake should be the foundation of Enterprise Data Architecture
 
Extending Data Lake using the Lambda Architecture June 2015
Extending Data Lake using the Lambda Architecture June 2015Extending Data Lake using the Lambda Architecture June 2015
Extending Data Lake using the Lambda Architecture June 2015
 
Making Bank Predictive and Real-Time
Making Bank Predictive and Real-TimeMaking Bank Predictive and Real-Time
Making Bank Predictive and Real-Time
 
Rob Bearden Keynote Hadoop Summit San Jose
Rob Bearden Keynote Hadoop Summit San JoseRob Bearden Keynote Hadoop Summit San Jose
Rob Bearden Keynote Hadoop Summit San Jose
 
Analytics in a Day Virtual Workshop
Analytics in a Day Virtual WorkshopAnalytics in a Day Virtual Workshop
Analytics in a Day Virtual Workshop
 
Exploring the Wider World of Big Data- Vasalis Kapsalis
Exploring the Wider World of Big Data- Vasalis KapsalisExploring the Wider World of Big Data- Vasalis Kapsalis
Exploring the Wider World of Big Data- Vasalis Kapsalis
 
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
 
Sören Eickhoff, Informatica GmbH, "Informatica Intelligent Data Lake – Self S...
Sören Eickhoff, Informatica GmbH, "Informatica Intelligent Data Lake – Self S...Sören Eickhoff, Informatica GmbH, "Informatica Intelligent Data Lake – Self S...
Sören Eickhoff, Informatica GmbH, "Informatica Intelligent Data Lake – Self S...
 

 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 

Operational Analytics Using Spark and NoSQL Data Stores

  • 1. Delivering Operational Analytics Using Spark and NoSQL Data Stores. Mike Ferguson, Managing Director, Intelligent Business Strategies. Basho Webinar, January 2016.
  • 2. About Mike Ferguson. Copyright © Intelligent Business Strategies 1992-2016. Mike Ferguson is Managing Director of Intelligent Business Strategies Limited. As an analyst and consultant he specializes in business intelligence, data management and enterprise business integration. With over 34 years of IT experience, Mike has consulted for dozens of companies, spoken at events all over the world and written numerous articles. Formerly he was a principal and co-founder of Codd and Date Europe Limited (the inventors of the Relational Model), a Chief Architect at Teradata on the Teradata DBMS, and European Managing Director of DataBase Associates. Web: www.intelligentbusiness.biz. Email: mferguson@intelligentbusiness.biz. Twitter: @mikeferguson1. Tel/Fax: (+44)1625 520700.
  • 3. Topics
      - The changing landscape of operational and analytical systems
      - Scalable operational applications and NoSQL data stores
      - Big data analytics: the era of Hadoop and Spark
      - The value of operational analytics
      - Operational analytics using the Basho Data Platform and Apache Spark
      - Conclusions
  • 4. Topics. Current section: The changing landscape of operational and analytical systems.
  • 5. The Application Processing Spectrum [figure]. Source: BI-Research, Copyright © BI-Research, 2013-Present.
  • 6. Big Data Processing - There Is A Growing Number of Data Stores Optimized for Operational or Analytical Workloads
      [Diagram labels: OLTP RDBMS; NoSQL DBMS; Analytical RDBMS]
      NoSQL:
      - ACID support is missing in many NoSQL DBMSs
      - Can you live with losing a transaction?
      - OK for sensor data, for example
  • 7. A Closed Loop Is Still Needed - It Just Now Also Includes NoSQL Technologies
      [Diagram labels: operational applications; scalable operational applications (relational & NoSQL systems); scalable analytical systems (relational & NoSQL systems); flows labelled data, new data and new insights]
  • 8. Topics - Where Are We? Current section: Scalable operational applications and NoSQL data stores.
  • 9. Demand For Scalable Operational Systems With High Write Processing Is Driving Demand for NoSQL DBMS
      [Diagram labels: operational applications; scalable operational applications; scalable analytical systems; flows labelled data, new data and new insights]
  • 10. Success of Big Data Analytics Depends On Being Able To Scale To Capture High Velocity, High Volume Data
      Successful big data analytics requires:
      1. The ability to scale operational systems to capture, stream and store the required transactional and non-transactional data
          - Support peak transaction rates
          - Support peak capture of non-transactional data, e.g. shopping cart data
          - Support peak data arrival rates, e.g. sensor data
          - Support peak ingestion rates
      2. Scalable big data analytics
      3. Closed loop integration of analytical systems back into core operational transaction processing systems
          - Make prescriptive insights available to all that need them to continuously optimise operations and maximise effectiveness
  • 11. E-Business And Mobile Means Operational Systems Are Having To Scale To Support Masses Of Concurrent Users
      [Diagram labels: many more users; mobile devices; WWW; operational applications; transactional applications; web logs; cluster; data; partitioned data]
  • 12. Example Operational Applications Requiring Scalability That Are Fuelling Demand For NoSQL DBMSs
      - Web and mobile commerce
          - Shopping cart data, session storage
      - Internet of Things (IoT) and other time series applications
          - Need to scale as the number of devices / things increases
      - Mobile gaming
          - Player profile data, session storage, game performance stats
      - Healthcare
          - Store unstructured healthcare digital imaging and video data
      - Social network applications
  • 13. Types Of NoSQL Database And Product Examples
      NoSQL database type: NoSQL product examples
      - Key Value store: Aerospike, Amazon DynamoDB, Basho Riak KV, Redis, MemcacheDB, Voldemort
      - Document database: CouchDB, IBM DB2 (XML & JSON), MongoDB, IBM Cloudant, MarkLogic, Terrastore, JackRabbit, RaptorDB
      - Column Family database: Cassandra, DataStax, Google BigTable, Hadoop HBase, Hypertable, HPCC, Amazon SimpleDB
      - Graph database: AllegroGraph, GraphBase, Horton, InfiniteGraph, IBM DB2, Neo4j, Oracle Spatial and Graph, Titan, Cray Research, Teradata Aster
      - Multi-modal database: ArangoDB, CortexDB, MarkLogic, MongoDB, FoundationDB
      Notes:
      - Some NoSQL databases are aimed at write processing (data collection)
      - Others are aimed at specific big data analytical workloads
      - Issues include lack of standard APIs, a weak or missing optimizer, and non-immediate consistency
  • 14. Global NoSQL Market Size And Forecast 2013 - 2020 [chart]. Source: https://www.alliedmarketresearch.com/NoSQL-market
  • 15. Key Value Stores Can Store Any Data - Examples
      Example keys shown: 10034, 82771, 93441
      Example values shown: the text "John Smith" and the following JSON document:
          { "firstName": "Wayne", "lastName": "Rooney", "age": 25,
            "address": { "streetAddress": "21 Sir Matt Busby Way", "city": "Manchester", "country": "England", "postalCode": "M1 6DY" },
            "phoneNumbers": [ { "type": "home", "number": "0161-123-1234" }, { "type": "mobile", "number": "07779-123234" } ] }
      Key value store features:
      - Very simple to understand
      - Very scalable (hash partitioning)
      - Data access is via the key
      - The application controls what is stored in the value
      - Very fast performance
      - Acceleration via in-memory processing
      - Eventual consistency
      - Often no support for data types
      - No built-in referential integrity
      - No understanding of data relationships
      - The application must understand any relationships in the data
      - The programmer is in complete control
      - The application must navigate complex data
      Use for specific operational applications.
  • 16. Key Value Stores - The Key Is Hashed To Partition The Data
      [Diagram: key and value. Source: Microsoft]
      The value can be anything:
      - A single data field
      - A JSON document
      - An XML document
      - Text
      - An image, and so on
      The key needs to be unique. It is easy to partition the data (hash the key) and very fast to retrieve and store data.
      The application needs to know:
      - What is stored in the value
      - How the value is structured
      - How to process the value
      HTTP can be used to read and write data, e.g. curl -XPUT, curl -XGET
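The idea of hashing the key to pick a partition, while leaving interpretation of the value to the application, can be sketched in a few lines of Python. This is only an illustration: `TinyKVStore`, `partition_for` and the partition count of 4 are hypothetical names and values, not part of any product mentioned in the slides.

```python
import hashlib
import json

NUM_PARTITIONS = 4  # hypothetical partition count, chosen only for illustration


def partition_for(key: str) -> int:
    """Hash the key and map the hash onto a partition number."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % NUM_PARTITIONS


class TinyKVStore:
    """In-memory stand-in for a hash-partitioned key value store."""

    def __init__(self) -> None:
        self.partitions = [dict() for _ in range(NUM_PARTITIONS)]

    def put(self, key: str, value: bytes) -> None:
        # The store treats the value as opaque bytes.
        self.partitions[partition_for(key)][key] = value

    def get(self, key: str) -> bytes:
        return self.partitions[partition_for(key)][key]


store = TinyKVStore()
store.put("10034", b"John Smith")
store.put("93441", json.dumps({"firstName": "Wayne", "lastName": "Rooney"}).encode("utf-8"))

# The application, not the store, knows that one value is plain text
# and the other is a JSON document.
name = store.get("10034").decode("utf-8")
profile = json.loads(store.get("93441"))
```

Because the same key always hashes to the same partition, reads go straight to the partition that holds the value without any lookup table.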
  • 17. Key Value Stores - A Basho Riak KV Cluster Has Virtual Nodes Running on Physical Nodes
      [Diagram: the key is hashed to locate the value; operations such as PUT, POST, GET. Source: Basho]
      - SHA1 is a hashing function that hashes a key to determine the node
      - Riak hash partitions and replicates data (3 copies of the data is the default)
      - Nodes can be added to and removed from a Riak cluster while it is running
  • 18. Key Value Stores - A Basho Riak KV Ring
      [Diagram: writing replicas around the ring. Source: Basho]
      Riak uses partitions (64 partitions are the default) and also replicates the partitions for high availability.
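A rough way to picture the ring is to divide the 160-bit SHA-1 hash space into 64 equal partitions and write each value to three consecutive partitions. The sketch below makes that concrete under loud assumptions: the round-robin assignment of partitions to physical nodes and the function names are hypothetical, and the real Riak placement logic is more involved than this.

```python
import hashlib
from typing import Dict, List

RING_SIZE = 2 ** 160   # SHA-1 produces 160-bit hashes
NUM_PARTITIONS = 64    # default partition count quoted on the slide
N_VAL = 3              # default number of copies quoted on the slide


def primary_partition(key: str) -> int:
    """Map a key onto one of the equal-sized ring partitions."""
    h = int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest(), "big")
    return h // (RING_SIZE // NUM_PARTITIONS)


def preference_list(key: str, n_val: int = N_VAL) -> List[int]:
    """Partitions that hold the copies: the primary plus the next ones on the ring."""
    first = primary_partition(key)
    return [(first + i) % NUM_PARTITIONS for i in range(n_val)]


def assign_partitions(nodes: List[str]) -> Dict[int, str]:
    """Hypothetical round-robin mapping of partitions (virtual nodes) to physical nodes."""
    return {p: nodes[p % len(nodes)] for p in range(NUM_PARTITIONS)}


owners = assign_partitions(["node-a", "node-b", "node-c", "node-d"])
replica_partitions = preference_list("93441")
replica_nodes = [owners[p] for p in replica_partitions]
```

With four physical nodes and a round-robin assignment, three consecutive partitions always land on three different machines, which is what makes the copies useful for availability.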
  • 19. Topics - Where Are We? Current section: Big data analytics: the era of Hadoop and Spark.
  • 20. Demand For Scalable Analytical Systems Is Also Exploding
      [Diagram labels: operational applications; scalable operational applications; scalable analytical systems; flows labelled data, new data and new insights]
  • 21. A Hadoop System
      [Diagram labels: Java, Python, Scala; Pig Latin scripts; 3rd party SQL on Hadoop; analytic application; SQL BI tools; Storm; YARN; MapReduce; Tez; Spark; SQL (executes on MR, Tez & Spark); HBase with index partitions; HDFS files; WebHDFS (an HTTP interface to HDFS with REST APIs); APIs to HBase; APIs to HDFS]
  • 22. Faster Execution Engines For Analytic Applications - Apache Spark
      [Diagram: the Hadoop system from the previous slide, with the same labels: Java, Python, Scala; Pig Latin scripts; 3rd party SQL on Hadoop; analytic application; SQL BI tools; Storm; YARN; MapReduce; Tez; Spark; SQL; HBase with index partitions; HDFS files; WebHDFS; APIs to HBase; APIs to HDFS]
  • 23. Spark Is A General Purpose In-Memory Execution Framework That Can Run With Or Without Hadoop
      [Diagram labels: Storm; YARN; MapReduce; Tez; Spark; HBase; WebHDFS; HDFS files; HDFS, S3 and other storage; Tachyon]
      [Spark stack labels: Applications / BI Tools; SQL, Python, Scala, Java, R; Spark Streaming; Spark SQL + DataFrames; GraphX (Graph Computation); MLlib (Machine Learning); Spark Core]
      - Spark also includes an HDFS-compatible in-memory file system
      - You can use Spark with or without Tachyon
      - The Spark stack is integrated, e.g. you can use Spark Streaming, SparkSQL and MLBase together in the same application
  • 24. Apache Spark
      [Spark stack labels: Applications / BI Tools; SQL, Python, Scala, Java, R]
      - Spark Core: provides distributed task dispatching, scheduling, and basic I/O
      - Spark Streaming: for analysis of real-time streaming data
      - MLlib (Machine Learning): a library of pre-built analytic algorithms that can run in parallel across a Spark cluster
      - GraphX (Graph Computation): a graph analysis engine running on Spark
      - Spark SQL + DataFrames: query structured data in Spark apps using SQL or a DataFrames API
  • 25. Spark In-Memory Analytic Applications Can Do A Lot More Than Map Reduce Processing
      [Diagram: a Spark application reading HDFS files and chaining map, map, join, filter and reduce operators. Source: Amplab]
      - Keep only one copy in memory in a JVM
      - Track lineage of job operators used to derive the data
      - Use the lineage to re-compute the data if there is a failure
      - No MapReduce execution needed
          - Just Spark APIs
  • 26. Spark Applications Operate On RDDs (Data) - You Can Do A Lot More Than Map and Reduce
      - RDD = Resilient Distributed Dataset
      - An RDD is a read-only, partitioned collection of records
      - RDDs can only be created through operators on either
          1. A dataset in stable storage, or
          2. Other existing RDDs
      Spark operators: map, reduce, sample, filter, count, take, groupBy, fold, first, sort, reduceByKey, partitionBy, union, groupByKey, mapWith, join, cogroup, leftOuterJoin, cross, pipe, rightOuterJoin, zip, save
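To make the lineage idea concrete, here is a toy Python class that behaves a little like an RDD: it is read-only, it is created either from a source or from another dataset through an operator, and it can rebuild its contents from its lineage if a cached copy is lost. `MiniRDD` is a hypothetical teaching aid, not the Spark API, and it ignores partitioning and distribution entirely.

```python
from typing import Callable, List, Optional


class MiniRDD:
    """Toy read-only dataset that remembers how it was derived (its lineage)."""

    def __init__(self,
                 source: Optional[Callable[[], List]] = None,
                 parent: Optional["MiniRDD"] = None,
                 op: Optional[Callable[[List], List]] = None,
                 name: str = "source") -> None:
        self._source = source      # loader for data in stable storage
        self._parent = parent      # dataset this one was derived from
        self._op = op              # operator applied to the parent's records
        self.name = name
        self._cache: Optional[List] = None

    def map(self, f: Callable) -> "MiniRDD":
        return MiniRDD(parent=self, op=lambda rows: [f(r) for r in rows], name="map")

    def filter(self, pred: Callable) -> "MiniRDD":
        return MiniRDD(parent=self, op=lambda rows: [r for r in rows if pred(r)], name="filter")

    def lineage(self) -> List[str]:
        """Operator chain from the original source down to this dataset."""
        chain, node = [], self
        while node is not None:
            chain.append(node.name)
            node = node._parent
        return list(reversed(chain))

    def collect(self) -> List:
        """Compute (or recompute) the records by replaying the lineage."""
        if self._cache is None:
            if self._parent is None:
                self._cache = list(self._source())
            else:
                self._cache = self._op(self._parent.collect())
        return list(self._cache)

    def lose_cache(self) -> None:
        """Simulate losing the in-memory copy, for example after a failure."""
        self._cache = None


readings = MiniRDD(source=lambda: [3, 18, 7, 42, 25])
high = readings.filter(lambda x: x > 10).map(lambda x: x * 2)

first_result = high.collect()
high.lose_cache()                 # the in-memory copy disappears
second_result = high.collect()    # rebuilt by replaying filter and map
```

Nothing is copied for safety here; the recorded chain of operators is enough to rebuild the result, which is the point the slide makes about keeping only one copy in memory.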
  • 27. Simplifying Access To Data Via SparkSQL and Spark DataFrames
      [Image source: Databricks.com]
      - A DataFrame is a distributed collection of data organized into named columns
      - Conceptually equivalent to a relational DBMS table or a data frame in R/Python
      - DataFrames can be constructed from a wide array of sources:
          - Structured data files
          - Hive tables
          - External databases
          - Existing RDDs
      - Uses schema on read
      Note that Spark data sources can be relational and NoSQL DBMSs.
  • 28. Spark Is Going Over The Top of Multiple Data Stores For Scalable In-Memory Analytics Across The Entire Ecosystem
      [Diagram labels: streaming data; streaming analytics; Hadoop data store; data warehouse RDBMS (EDW, DW & marts); NoSQL DBMS; advanced analytic (multi-structured data) mart; operational NoSQL data stores, e.g. Cassandra, Basho Riak]
      [Spark stack labels: Applications / BI Tools; SQL, Python, Scala, Java, R; Spark Streaming; Spark SQL + DataFrames; GraphX (Graph Computation); MLlib (Machine Learning); Spark Core]
  • 29. Topics - Where Are We? Current section: The value of operational analytics.
  • 30. Key Business Drivers And Objectives For Operational Analytics
      - Combine operational and analytical processing at scale to:
          - Improve customer engagement
          - Reduce risk
          - Avoid unplanned operational cost
          - Optimise operational effectiveness
      - Use BI/Analytics to drive and guide business operations to help achieve specific business goals and KPI targets
          - Automated analysis of operational events as they happen
          - Automated alerts
          - On-demand recommendations
      - Integrate BI/Analytics into every business process to:
          - Create an 'insight driven' employee base
          - Enable mass execution of business strategy by facilitating mass contribution towards achieving specific business goals
  • 31. Five Types Of Operational BI/Analytics
      1. Simple operational reporting of current position/state, e.g. session state
      2. Situational awareness via visualisation of live operational data, typically on dashboards
      3. On-demand analytics of live operational and/or historical data to improve operational decisions and effectiveness
      4. On-demand recommendations for guidance
      5. Event stream processing to monitor, automatically analyse and act on events in real time to prevent problems arising and to optimise business operations
  • 32. Operational Analytics - What's The Difference Between On-Demand And Event-Driven Analysis?
      [Diagram labels: application; BI/analytics apps / services; streaming data]
      - On-demand: an analytical service (query, report, model, recommendation) is called when it is needed
      - Event-driven: an analytical service (query, report, model, recommendation) is invoked by a message, file arrival, pattern or trigger
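The contrast between the two invocation styles can be illustrated with one analytical service called in two ways. In this sketch the `recommend` rule, the `EventBus` class and the event name `basket_updated` are all hypothetical; a trivial threshold stands in for a real model.

```python
from typing import Callable, Dict, List


def recommend(customer: Dict) -> str:
    """Hypothetical analytical service: a simple rule stands in for a model."""
    return "offer-discount" if customer["basket_value"] > 100 else "no-action"


# On-demand: the application asks for an answer when it needs one.
def handle_request(customer: Dict) -> str:
    return recommend(customer)


# Event-driven: the same service runs whenever a matching event arrives.
class EventBus:
    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[Dict], str]]] = {}

    def subscribe(self, event_type: str, handler: Callable[[Dict], str]) -> None:
        self._subscribers.setdefault(event_type, []).append(handler)

    def publish(self, event_type: str, payload: Dict) -> List[str]:
        # Push the event into every handler registered for this event type.
        return [handler(payload) for handler in self._subscribers.get(event_type, [])]


bus = EventBus()
bus.subscribe("basket_updated", recommend)

on_demand_result = handle_request({"basket_value": 40})
event_results = bus.publish("basket_updated", {"basket_value": 150})
unhandled_results = bus.publish("login", {"basket_value": 150})
```

The analytical logic is identical in both cases; only the trigger differs, which is why the same service can be exposed both ways.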
  • 33. Analytics Need To Be Integrated Into Business Processes To Optimize Business Operations
      [Diagram labels: customers; partners & suppliers; employees; customer relationship management; operations management; supply chain management; marketing; sales; service/support; operations; finance/accounting; procurement; inventory control; shipping/distribution; human resources; integrated intelligent business operations; integrated on-demand business intelligence]
  • 34. High Value Application Use Cases for Streaming Analytics [figure]. Source: Adapted from a slide by IBM.
  • 35. Responding To Events And Event Patterns Means Reducing Action Time
      [Diagram labels: action distance or action time; event-driven data integration; automated analysis; automated decision and action taking. Source: Dr Richard Hackathorn]
      The aim is for the time between an event occurring and action being taken to be as close to zero as possible.
  • 36. With Event Stream Processing The Architecture Has To Change
      - Classic use of analytics (stages shown): data cleansing & integration; store data; query/analyze (human)
      - Event / stream processing (stages shown): data cleansing & integration; query/analyze (automated); act (automated or human); store data
  • 37. Time Series Analysis - Query Processing Uses a Time Window to Look at Continuously Streaming Data
      [Diagram labels: high frequency data; stream processing server; CQs; time window from T1 to T2, e.g. 5 seconds, 30 seconds or 5 minutes; pattern/correlation]
      - Continuous time series queries (CQs) operate on the data as it flows by
      - A set of queries (continuous queries) resides in the data stream server to process incoming data
      - Data is pushed into the queries
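A continuous query over a time window can be sketched as an object that receives each event as it arrives and keeps only the events inside the window. The class below is a minimal, hypothetical example in plain Python (it is not a stream processing server API) and it assumes events arrive in timestamp order.

```python
from collections import deque
from typing import Deque, Tuple


class WindowedAverage:
    """Continuous query: average of the values seen in the last `window_seconds`."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._events: Deque[Tuple[float, float]] = deque()
        self._total = 0.0

    def push(self, timestamp: float, value: float) -> float:
        """Data is pushed into the query; the current window average is returned."""
        self._events.append((timestamp, value))
        self._total += value
        cutoff = timestamp - self.window_seconds
        # Evict events that have slid out of the window (timestamp <= cutoff).
        while self._events and self._events[0][0] <= cutoff:
            _, old_value = self._events.popleft()
            self._total -= old_value
        return self._total / len(self._events)


cq = WindowedAverage(window_seconds=5.0)
averages = [
    cq.push(0.0, 10.0),
    cq.push(2.0, 20.0),
    cq.push(6.0, 30.0),   # the event at t=0 has left the 5 second window
    cq.push(7.5, 40.0),   # the event at t=2 has left the window
]
```

The query holds state only for the current window, so memory use depends on the window length and event rate rather than on the total volume of the stream.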
  • 38. Key Requirements For Operational Analytics
      - On-demand, event-driven and scheduled invocation of analytics
      - Monitor streaming events as they happen via automatic analysis
      - Automatic analysis via predictive and statistical models
      - Automatic interpretation of predictive/statistical model outcomes
      - Rule-driven automatic actions to automate decision making
          - E.g. alerts, recommendations, transaction and process invocation
      - Integrate operational analytics into operational applications
      - Operational reporting
      - Scale to support large numbers of events and concurrent users
      - Store relevant data together to speed up analytics execution
      - Run predictive and statistical models close to the data
      - Run analytics on a 24x365 basis
  • 39. The Importance of In-Memory Processing
      - Massively parallel in-memory processing is mission critical for scalable operational systems and operational analytics
      - Why?
          - Performance is critical
          - Large number of concurrent user requests for on-demand analytics
          - Large number of concurrent application requests for on-demand analytics
          - Event-driven operational analytics on very high velocity data needs memory
  • 40. Topics - Where Are We? Current section: Operational analytics using the Basho Data Platform and Apache Spark.
  • 41. The Basho Data Platform
      [Diagram. Source: Basho. Legend: Basho developed; Basho integrated]
      - Service instances: Spark; Redis (caching); Solr; Elastic Search; web services; 3rd party web services & integrations
      - Storage instances: Riak KV (key/value); Riak S2 (object storage); Riak TS (time series); document store; columnar; graph
      - Core services: replicate & synchronize; message routing; cluster management & monitoring; logging & analytics; internal data store
      Callouts:
      - Hash partitioning, cluster scalability, triple replication, multi-datacentre replication
      - Co-locates time-series data, high availability, scalability
      - Replicates and synchronises data within and across Riak KV, Redis and Spark clusters
      - Automated cluster management simplifies administration
      - Integrated in-memory caching for faster application performance
      - Search-based query processing on Riak data using Solr indexes
      - Integrated in-memory analytics for Riak KV and Riak TS data
  • 42. Riak TS Is A New Basho Storage Instance Optimised for Time Series Data And Analytics
      - A distributed NoSQL database optimised for time series sequenced, unstructured data capture, aggregation and analysis from the Internet of Things (IoT)
      - High availability
      - Scalability: add nodes to a cluster without sharding
      - Automated and uniform data distribution across the cluster
          - Time or geohash based data co-location to ensure time series data is located on the same node
      - Data validation on input
      - APIs and client libraries for Java, Ruby, Python, Go, Erlang, Node.js and .NET
      - Spark integration for operational analysis of time series data
  • 43. Operational Analytics Using The Basho Data Platform And Apache Spark
      [Diagram: Operational Analytics Using The Basho Data Platform. Labels: scalable operational application; hash partitioned data; data; recent data; write back; operational analytics web service; operational analytic application; BI tool; Spark Core; Spark Streaming; BlinkDB; Spark SQL; GraphX; SparkR; MLlib]
  • 44. Operational Analytics Using The Basho Data Platform And Apache Spark - 2
      - Can develop Spark operational analytic applications on low latency data stored in Basho Riak KV
      - Spark-based analytical web services can be invoked on-demand to analyse data in Riak KV
      - Use on-demand Spark jobs for historical analysis and predictions
      - Insights produced from analysing Riak KV data can be written back to Riak KV for use by other applications
          - A form of closed-loop processing
      - Spark Streaming can be used to calculate rollups and detect abnormalities on streaming sensor data
      - Recent data can be kept in Redis for dashboard visualization
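The closed loop described above (roll up a batch of sensor readings, flag abnormal values, write the insight back to the key value store and keep recent results for a dashboard) can be sketched without any cluster. Everything here is an assumption for illustration: a Python dict stands in for Riak KV, a bounded deque stands in for Redis, `process_batch` is a hypothetical function, and a simple z-score test stands in for whatever abnormality detection a real Spark Streaming job would use. No Riak, Redis or Spark client calls are shown.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Deque, Dict, List

kv_store: Dict[str, dict] = {}                 # stand-in for Riak KV
recent_cache: Deque[dict] = deque(maxlen=3)    # stand-in for recent data kept in Redis


def process_batch(sensor_id: str, batch_id: int, readings: List[float],
                  history: List[float], threshold: float = 3.0) -> dict:
    """Roll up one batch and flag readings far from the historical mean."""
    mu = mean(history)
    sigma = pstdev(history)
    abnormal = [r for r in readings
                if sigma > 0 and abs(r - mu) / sigma > threshold]
    rollup = {
        "sensor": sensor_id,
        "batch": batch_id,
        "count": len(readings),
        "avg": mean(readings),
        "max": max(readings),
        "abnormal": abnormal,
    }
    # Closed loop: write the insight back so other applications can read it.
    kv_store[f"rollup/{sensor_id}/{batch_id}"] = rollup
    # Keep only the most recent rollups for dashboard visualisation.
    recent_cache.append(rollup)
    return rollup


history = [20.0, 21.0, 19.0, 20.0, 20.0]
result = process_batch("sensor-1", 1, [20.5, 19.5, 35.0], history)
```

Writing the rollup under a predictable key means an operational application can fetch the latest insight with a single key lookup, which fits the access pattern of a key value store.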
  • 45. Topics - Where Are We? Current section: Conclusions.
  • 46. Conclusions
      - As operational application processing scales, so too does the need to scale operational analytics
      - Basho is using in-memory processing to accelerate operational applications (via Redis) and to introduce scalable operational analytics (via Spark) into these applications
      - New scalable 'smart' operational applications are therefore becoming possible with careful design in a NoSQL environment
  • 47. Thank You! Web: www.intelligentbusiness.biz. Email: mferguson@intelligentbusiness.biz. Twitter: @mikeferguson1. Tel/Fax: (+44)1625 520700.