SlideShare uma empresa Scribd logo
1 de 26
Baixar para ler offline
@Kognitio @mphnyc #MPP_R@Kognitio @mphnyc #OANYC
Hadoop meets Mature BI:
Where the rubber meets the road for
Data Scientists
Michael Hiskey
Futurist, + Product Evangelist
VP, Marketing & Business Development
Kognitio
@Kognitio @mphnyc #MPP_R@Kognitio @mphnyc #OANYC
The Data Scientist
Sexiest job of the 21st Century?
@Kognitio @mphnyc #MPP_R@Kognitio @mphnyc #OANYC
Key Concept: Graduation
Projects will need
to Graduate
from the
Data Science Lab
and become part
of
Business as Usual
@Kognitio @mphnyc #MPP_R@Kognitio @mphnyc #OANYC
Demand for the Data Scientist
Organizational appetite for tens, not hundreds
@Kognitio @mphnyc #MPP_R@Kognitio @mphnyc #OANYC
Don’t be a Railroad Stoker!
Highly skilled engineering required …
but the world innovated around them.
@Kognitio @mphnyc #MPP_R@Kognitio @mphnyc #OANYC
Business Intelligence
Numbers
Tables
Charts
Indicators
Time
- History
- Lag
Access
- to view (portal)
- to data
- to depth
- Control/Secure
Consumption
- digestion
…with ease and simplicity
Straddle IT and Business
Faster
Lower latency
More granularity
Richer data model
Self service
@Kognitio @mphnyc #MPP_R@Kognitio @mphnyc #OANYC
What has changed?
More
connected-users?
More-connected
users?
According to one
estimate, mankind
created 150 exabytes
of data in 2005
(billion gigabytes)
In 2010 this was
1,200 exabytes
Data flow
@Kognitio @mphnyc #OANYC
Data Variety
@Kognitio @mphnyc #OANYC
Respondents were asked to choose up to two descriptions about how their organizations view big data from the choices above. Choices have been
abbreviated, and selections have been normalized to equal 100%. n=1144
Source: IBM Institute for Business Value/Said Business School Survey
What?
New value comes from your existing data
@Kognitio @mphnyc #MPP_R@Kognitio @mphnyc #OANYC
© 20th Century Fox
@Kognitio @mphnyc #MPP_R@Kognitio @mphnyc #OANYC
Hadoop ticks many but not all the boxes
a
aaaaaaa
aa a aa
aa aa a
a aa a
aa aaa
a aa aa
@Kognitio @mphnyc #MPP_R@Kognitio @mphnyc #OANYC
 No need to pre-process
 No need to align to schema
 No need to triage
Null storage concerns
@Kognitio @mphnyc #OANYC
Machine learning
algorithms Dynamic
Simulation
Statistical
Analysis
Clustering
Behaviour
modelling
The drive for deeper understanding
Reporting & BPM
Fraud detection
Dynamic
Interaction
Technology/Automation
AnalyticalComplexity
Campaign
Management
#MPP_R
@Kognitio @mphnyc #MPP_R@Kognitio @mphnyc #OANYC
Hadoop just too
slow for interactive
BI!
…loss of train-
of-thought
“while hadoop shines as a proc
platform, it is painfully slow as
@Kognitio @mphnyc #MPP_R@Kognitio @mphnyc #OANYC
Analytics needs
low latency, no I/O wait
High speed in-memory processing
Analytical Platform: Reference Architecture
Analytical
Platform
Layer
Near-line
Storage
(optional)
Application &
Client Layer
All BI Tools All OLAP Clients Excel
Persistence
Layer Hadoop
Clusters
Enterprise Data
Warehouses
Legacy
Systems
…
Reporting
Cloud
Storage
@Kognitio @mphnyc #MPP_R@Kognitio @mphnyc #OANYC
The Future
Big DataAdvanced Analytics
In-memory
Logical Data Warehouse
Predictive Analytics
Data Scientists
connect
www.kognitio.com
twitter.com/kognitiolinkedin.com/companies/kognitio
tinyurl.com/kognitio youtube.com/kognitio
NA: +1 855 KOGNITIO
EMEA: +44 1344 300 770
@Kognitio @mphnyc #MPP_R@Kognitio @mphnyc #MPP_R
Hadoop meets Mature BI:
Where the rubber meets the road for
Data Scientists
• The key challenge for Data Scientists is not the proliferation of their
roles, but the ability to ‘graduate’ key Big Data projects from the
‘Data Science Lab’ and production-ize them into their broader
organizations.
• Over the next 18 months, "Big Data' will become just "Data"; this
means everyone (even business users) will need to have a way to
use it - without reinventing the way they interact with their current
reporting and analysis.
• To do this requires interactive analysis with existing tools and
massively parallel code execution, tightly integrated with Hadoop.
Your Data Warehouse is dying; Hadoop will elicit a material shift
away from price per TB in persistent data storage.
The new bounty hunters:
Drill
Impala
Pivotal
Stinger
The No SQL Posse
Wanted
Dead or Alive
SQL
It’s all about getting work done
Used to be simple fetch of value
Tasks evolving:
Then was calc dynamic aggregate
Now complex algorithms!
@Kognitio @mphnyc #MPP_R
create external script LM_PRODUCT_FORECAST environment rsint
receives ( SALEDATE DATE, DOW INTEGER, ROW_ID INTEGER, PRODNO INTEGER, DAILYSALES
partition by PRODNO order by PRODNO, ROW_ID
sends ( R_OUTPUT varchar )
isolate partitions
script S'endofr( # Simple R script to run a linear fit on daily sales
prod1<-read.csv(file=file("stdin"), header=FALSE,row.names
colnames(prod1)<-c("DOW","ID","PRODNO","DAILYSALES")
dim1<-dim(prod1)
daily1<-aggregate(prod1$DAILYSALES, list(DOW = prod1$DOW),
daily1[,2]<-daily1[,2]/sum(daily1[,2])
basesales<-array(0,c(dim1[1],2))
basesales[,1]<-prod1$ID
basesales[,2]<-(prod1$DAILYSALES/daily1[prod1$DOW+1,2])
colnames(basesales)<-c("ID","BASESALES")
fit1=lm(BASESALES ~ ID,as.data.frame(basesales))
select Trans_Year, Num_Trans,
count(distinct Account_ID) Num_Accts,
sum(count( distinct Account_ID)) over (partition by Trans_Year
cast(sum(total_spend)/1000 as int) Total_Spend,
cast(sum(total_spend)/1000 as int) / count(distinct Account_ID
rank() over (partition by Trans_Year order by count(distinct A
rank() over (partition by Trans_Year order by sum(total_spend)
from( select Account_ID,
Extract(Year from Effective_Date) Trans_Year,
count(Transaction_ID) Num_Trans,
select dept, sum(sales)
from sales_fact
Where period between date ‘01-05-2006’ and date ‘31-05-2006’
group by dept
having sum(sales) > 50000;
select sum(sales)
from sales_history
where year = 2006 and month = 5 and region=1;
select total_sales
from summary
where year = 2006 and month = 5 and region=1;
Behind the
numbers
@Kognitio @mphnyc #MPP_R
For once technology is on our side
First time we have full triumvirate of
– Excellent Computing power
– Unlimited storage
– Fast Networks
…now that RAM is cheap!
@Kognitio @mphnyc #MPP_R
Lots of these
Not so many of these
Hadoop is…
Hadoop inherently disk oriented
Typically low ratio of CPU to Disk

Mais conteúdo relacionado

Mais procurados

Building data "Py-pelines"
Building data "Py-pelines"Building data "Py-pelines"
Building data "Py-pelines"Rob Winters
 
Using Spark-Solr at Scale: Productionizing Spark for Search with Apache Solr...
 Using Spark-Solr at Scale: Productionizing Spark for Search with Apache Solr... Using Spark-Solr at Scale: Productionizing Spark for Search with Apache Solr...
Using Spark-Solr at Scale: Productionizing Spark for Search with Apache Solr...Databricks
 
How Apache Spark Changed the Way We Hire People with Tomasz Magdanski
How Apache Spark Changed the Way We Hire People with Tomasz MagdanskiHow Apache Spark Changed the Way We Hire People with Tomasz Magdanski
How Apache Spark Changed the Way We Hire People with Tomasz MagdanskiDatabricks
 
Building Real-Time Data Pipeline for Diabetes Medication Recommender System U...
Building Real-Time Data Pipeline for Diabetes Medication Recommender System U...Building Real-Time Data Pipeline for Diabetes Medication Recommender System U...
Building Real-Time Data Pipeline for Diabetes Medication Recommender System U...Databricks
 
Operationalizing analytics to scale
Operationalizing analytics to scaleOperationalizing analytics to scale
Operationalizing analytics to scaleLooker
 
Initiate Edinburgh 2019 - Big Data Meets AI
Initiate Edinburgh 2019 - Big Data Meets AIInitiate Edinburgh 2019 - Big Data Meets AI
Initiate Edinburgh 2019 - Big Data Meets AIAmazon Web Services
 
LinkedInSaxoBankDataWorkbench
LinkedInSaxoBankDataWorkbenchLinkedInSaxoBankDataWorkbench
LinkedInSaxoBankDataWorkbenchSheetal Pratik
 
Building the Artificially Intelligent Enterprise
Building the Artificially Intelligent EnterpriseBuilding the Artificially Intelligent Enterprise
Building the Artificially Intelligent EnterpriseDatabricks
 
Lambda architecture for real time big data
Lambda architecture for real time big dataLambda architecture for real time big data
Lambda architecture for real time big dataTrieu Nguyen
 
SplunkSummit 2015 - Real World Big Data Architecture
SplunkSummit 2015 -  Real World Big Data ArchitectureSplunkSummit 2015 -  Real World Big Data Architecture
SplunkSummit 2015 - Real World Big Data ArchitectureSplunk
 
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...Big Data Spain
 
H2O World - Solving Customer Churn with Machine Learning - Julian Bharadwaj
H2O World - Solving Customer Churn with Machine Learning - Julian BharadwajH2O World - Solving Customer Churn with Machine Learning - Julian Bharadwaj
H2O World - Solving Customer Churn with Machine Learning - Julian BharadwajSri Ambati
 
Operationalizing Edge Machine Learning with Apache Spark with Nisha Talagala ...
Operationalizing Edge Machine Learning with Apache Spark with Nisha Talagala ...Operationalizing Edge Machine Learning with Apache Spark with Nisha Talagala ...
Operationalizing Edge Machine Learning with Apache Spark with Nisha Talagala ...Databricks
 
Spark Summit Keynote by Seshu Adunuthula
Spark Summit Keynote by Seshu AdunuthulaSpark Summit Keynote by Seshu Adunuthula
Spark Summit Keynote by Seshu AdunuthulaSpark Summit
 
Graph representation learning to prevent payment collusion fraud
Graph representation learning to prevent payment collusion fraudGraph representation learning to prevent payment collusion fraud
Graph representation learning to prevent payment collusion fraudDataWorks Summit
 
The Rise of Engineering-Driven Analytics by Loren Shure
The Rise of Engineering-Driven Analytics by Loren ShureThe Rise of Engineering-Driven Analytics by Loren Shure
The Rise of Engineering-Driven Analytics by Loren ShureBig Data Spain
 
Predicting Patient Outcomes in Real-Time at HCA
Predicting Patient Outcomes in Real-Time at HCAPredicting Patient Outcomes in Real-Time at HCA
Predicting Patient Outcomes in Real-Time at HCASri Ambati
 
Deep Learning for Large-Scale Online Fraud Detection—Fighting Fraudsters Amon...
Deep Learning for Large-Scale Online Fraud Detection—Fighting Fraudsters Amon...Deep Learning for Large-Scale Online Fraud Detection—Fighting Fraudsters Amon...
Deep Learning for Large-Scale Online Fraud Detection—Fighting Fraudsters Amon...Databricks
 
Agile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric ApproachAgile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric ApproachSoftServe
 

Mais procurados (20)

Building data "Py-pelines"
Building data "Py-pelines"Building data "Py-pelines"
Building data "Py-pelines"
 
Using Spark-Solr at Scale: Productionizing Spark for Search with Apache Solr...
 Using Spark-Solr at Scale: Productionizing Spark for Search with Apache Solr... Using Spark-Solr at Scale: Productionizing Spark for Search with Apache Solr...
Using Spark-Solr at Scale: Productionizing Spark for Search with Apache Solr...
 
How Apache Spark Changed the Way We Hire People with Tomasz Magdanski
How Apache Spark Changed the Way We Hire People with Tomasz MagdanskiHow Apache Spark Changed the Way We Hire People with Tomasz Magdanski
How Apache Spark Changed the Way We Hire People with Tomasz Magdanski
 
Building Real-Time Data Pipeline for Diabetes Medication Recommender System U...
Building Real-Time Data Pipeline for Diabetes Medication Recommender System U...Building Real-Time Data Pipeline for Diabetes Medication Recommender System U...
Building Real-Time Data Pipeline for Diabetes Medication Recommender System U...
 
Operationalizing analytics to scale
Operationalizing analytics to scaleOperationalizing analytics to scale
Operationalizing analytics to scale
 
Initiate Edinburgh 2019 - Big Data Meets AI
Initiate Edinburgh 2019 - Big Data Meets AIInitiate Edinburgh 2019 - Big Data Meets AI
Initiate Edinburgh 2019 - Big Data Meets AI
 
LinkedInSaxoBankDataWorkbench
LinkedInSaxoBankDataWorkbenchLinkedInSaxoBankDataWorkbench
LinkedInSaxoBankDataWorkbench
 
Building the Artificially Intelligent Enterprise
Building the Artificially Intelligent EnterpriseBuilding the Artificially Intelligent Enterprise
Building the Artificially Intelligent Enterprise
 
Lambda architecture for real time big data
Lambda architecture for real time big dataLambda architecture for real time big data
Lambda architecture for real time big data
 
SplunkSummit 2015 - Real World Big Data Architecture
SplunkSummit 2015 -  Real World Big Data ArchitectureSplunkSummit 2015 -  Real World Big Data Architecture
SplunkSummit 2015 - Real World Big Data Architecture
 
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
 
H2O World - Solving Customer Churn with Machine Learning - Julian Bharadwaj
H2O World - Solving Customer Churn with Machine Learning - Julian BharadwajH2O World - Solving Customer Churn with Machine Learning - Julian Bharadwaj
H2O World - Solving Customer Churn with Machine Learning - Julian Bharadwaj
 
Operationalizing Edge Machine Learning with Apache Spark with Nisha Talagala ...
Operationalizing Edge Machine Learning with Apache Spark with Nisha Talagala ...Operationalizing Edge Machine Learning with Apache Spark with Nisha Talagala ...
Operationalizing Edge Machine Learning with Apache Spark with Nisha Talagala ...
 
Spark Summit Keynote by Seshu Adunuthula
Spark Summit Keynote by Seshu AdunuthulaSpark Summit Keynote by Seshu Adunuthula
Spark Summit Keynote by Seshu Adunuthula
 
Graph representation learning to prevent payment collusion fraud
Graph representation learning to prevent payment collusion fraudGraph representation learning to prevent payment collusion fraud
Graph representation learning to prevent payment collusion fraud
 
The Rise of Engineering-Driven Analytics by Loren Shure
The Rise of Engineering-Driven Analytics by Loren ShureThe Rise of Engineering-Driven Analytics by Loren Shure
The Rise of Engineering-Driven Analytics by Loren Shure
 
Data Science At Zillow
Data Science At ZillowData Science At Zillow
Data Science At Zillow
 
Predicting Patient Outcomes in Real-Time at HCA
Predicting Patient Outcomes in Real-Time at HCAPredicting Patient Outcomes in Real-Time at HCA
Predicting Patient Outcomes in Real-Time at HCA
 
Deep Learning for Large-Scale Online Fraud Detection—Fighting Fraudsters Amon...
Deep Learning for Large-Scale Online Fraud Detection—Fighting Fraudsters Amon...Deep Learning for Large-Scale Online Fraud Detection—Fighting Fraudsters Amon...
Deep Learning for Large-Scale Online Fraud Detection—Fighting Fraudsters Amon...
 
Agile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric ApproachAgile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric Approach
 

Destaque

DECK36 - Log everything! and Realtime Datastream Analytics with Storm
DECK36 - Log everything! and Realtime Datastream Analytics with StormDECK36 - Log everything! and Realtime Datastream Analytics with Storm
DECK36 - Log everything! and Realtime Datastream Analytics with StormMike Lohmann
 
Getting Maximum Performance from Amazon Redshift (DAT305) | AWS re:Invent 2013
Getting Maximum Performance from Amazon Redshift (DAT305) | AWS re:Invent 2013Getting Maximum Performance from Amazon Redshift (DAT305) | AWS re:Invent 2013
Getting Maximum Performance from Amazon Redshift (DAT305) | AWS re:Invent 2013Amazon Web Services
 
Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon Redshift Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon Redshift Amazon Web Services
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftAmazon Web Services
 
Streaming Data Analytics with Amazon Redshift and Kinesis Firehose
Streaming Data Analytics with Amazon Redshift and Kinesis FirehoseStreaming Data Analytics with Amazon Redshift and Kinesis Firehose
Streaming Data Analytics with Amazon Redshift and Kinesis FirehoseAmazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 

Destaque (11)

angular2-learn
angular2-learnangular2-learn
angular2-learn
 
Oas schwartz 16
Oas schwartz 16Oas schwartz 16
Oas schwartz 16
 
AmazonRedshift
AmazonRedshiftAmazonRedshift
AmazonRedshift
 
DECK36 - Log everything! and Realtime Datastream Analytics with Storm
DECK36 - Log everything! and Realtime Datastream Analytics with StormDECK36 - Log everything! and Realtime Datastream Analytics with Storm
DECK36 - Log everything! and Realtime Datastream Analytics with Storm
 
Getting Maximum Performance from Amazon Redshift (DAT305) | AWS re:Invent 2013
Getting Maximum Performance from Amazon Redshift (DAT305) | AWS re:Invent 2013Getting Maximum Performance from Amazon Redshift (DAT305) | AWS re:Invent 2013
Getting Maximum Performance from Amazon Redshift (DAT305) | AWS re:Invent 2013
 
Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon Redshift Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon Redshift
 
Real-Time Streaming Data on AWS
Real-Time Streaming Data on AWSReal-Time Streaming Data on AWS
Real-Time Streaming Data on AWS
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon Redshift
 
Streaming Data Analytics with Amazon Redshift and Kinesis Firehose
Streaming Data Analytics with Amazon Redshift and Kinesis FirehoseStreaming Data Analytics with Amazon Redshift and Kinesis Firehose
Streaming Data Analytics with Amazon Redshift and Kinesis Firehose
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 

Semelhante a Big data bi-mature-oanyc summit

Road to Enterprise Architecture for Big Data Applications: Mixing Apache Spar...
Road to Enterprise Architecture for Big Data Applications: Mixing Apache Spar...Road to Enterprise Architecture for Big Data Applications: Mixing Apache Spar...
Road to Enterprise Architecture for Big Data Applications: Mixing Apache Spar...Databricks
 
Artificial Intelligence in practice - Gerbert Kaandorp - Codemotion Amsterdam...
Artificial Intelligence in practice - Gerbert Kaandorp - Codemotion Amsterdam...Artificial Intelligence in practice - Gerbert Kaandorp - Codemotion Amsterdam...
Artificial Intelligence in practice - Gerbert Kaandorp - Codemotion Amsterdam...Codemotion
 
Neo4j GraphDay Seattle- Sept19- graphs are ai
Neo4j GraphDay Seattle- Sept19-  graphs are aiNeo4j GraphDay Seattle- Sept19-  graphs are ai
Neo4j GraphDay Seattle- Sept19- graphs are aiNeo4j
 
From Developer to Data Scientist - Gaines Kergosien
From Developer to Data Scientist - Gaines KergosienFrom Developer to Data Scientist - Gaines Kergosien
From Developer to Data Scientist - Gaines KergosienITCamp
 
Planning for the unplannable - VUCA
Planning for the unplannable - VUCAPlanning for the unplannable - VUCA
Planning for the unplannable - VUCAJoakim Lindbom
 
Technology and AI sharing - From 2016 to Y2017 and Beyond
Technology and AI sharing - From 2016 to Y2017 and BeyondTechnology and AI sharing - From 2016 to Y2017 and Beyond
Technology and AI sharing - From 2016 to Y2017 and BeyondJames Huang
 
Ruben Diaz, Vision Banco + Rafael Coss, H2O ai + Luis Armenta, IBM - AI journ...
Ruben Diaz, Vision Banco + Rafael Coss, H2O ai + Luis Armenta, IBM - AI journ...Ruben Diaz, Vision Banco + Rafael Coss, H2O ai + Luis Armenta, IBM - AI journ...
Ruben Diaz, Vision Banco + Rafael Coss, H2O ai + Luis Armenta, IBM - AI journ...Sri Ambati
 
World Domination with Pentaho EE?
World Domination with Pentaho EE?World Domination with Pentaho EE?
World Domination with Pentaho EE?Jos van Dongen
 
The Sky’s the Limit – The Rise of Machine Learnin
The Sky’s the Limit – The Rise of Machine LearninThe Sky’s the Limit – The Rise of Machine Learnin
The Sky’s the Limit – The Rise of Machine LearninInside Analysis
 
From R&D to production-ready predictive apps - Christophe Bourguignat & Yann ...
From R&D to production-ready predictive apps - Christophe Bourguignat & Yann ...From R&D to production-ready predictive apps - Christophe Bourguignat & Yann ...
From R&D to production-ready predictive apps - Christophe Bourguignat & Yann ...PAPIs.io
 
Intel 20180608 v2
Intel 20180608 v2Intel 20180608 v2
Intel 20180608 v2ISSIP
 
Technology trends, disruptions and Opportunities
Technology trends, disruptions and OpportunitiesTechnology trends, disruptions and Opportunities
Technology trends, disruptions and OpportunitiesGanesh Raju
 
Learning Rust the Hard Way for a Production Kafka + ScyllaDB Pipeline
Learning Rust the Hard Way for a Production Kafka + ScyllaDB PipelineLearning Rust the Hard Way for a Production Kafka + ScyllaDB Pipeline
Learning Rust the Hard Way for a Production Kafka + ScyllaDB PipelineScyllaDB
 
ISC 2016 Day 3 Recap
ISC 2016 Day 3 RecapISC 2016 Day 3 Recap
ISC 2016 Day 3 RecapOpenPOWERorg
 
RightScale Roadtrip Boston: Accelerate to Cloud
RightScale Roadtrip Boston: Accelerate to CloudRightScale Roadtrip Boston: Accelerate to Cloud
RightScale Roadtrip Boston: Accelerate to CloudRightScale
 
Seldon: Deploying Models at Scale
Seldon: Deploying Models at ScaleSeldon: Deploying Models at Scale
Seldon: Deploying Models at ScaleSeldon
 
Predictive modelling with azure ml
Predictive modelling with azure mlPredictive modelling with azure ml
Predictive modelling with azure mlKoray Kocabas
 
Big Data Expo 2015 - Clusterpoint The Future of Big Data
Big Data Expo 2015 - Clusterpoint The Future of Big DataBig Data Expo 2015 - Clusterpoint The Future of Big Data
Big Data Expo 2015 - Clusterpoint The Future of Big DataBigDataExpo
 
End-to-End Big Data AI with Analytics Zoo
End-to-End Big Data AI with Analytics ZooEnd-to-End Big Data AI with Analytics Zoo
End-to-End Big Data AI with Analytics ZooJason Dai
 

Semelhante a Big data bi-mature-oanyc summit (20)

Road to Enterprise Architecture for Big Data Applications: Mixing Apache Spar...
Road to Enterprise Architecture for Big Data Applications: Mixing Apache Spar...Road to Enterprise Architecture for Big Data Applications: Mixing Apache Spar...
Road to Enterprise Architecture for Big Data Applications: Mixing Apache Spar...
 
Artificial Intelligence in practice - Gerbert Kaandorp - Codemotion Amsterdam...
Artificial Intelligence in practice - Gerbert Kaandorp - Codemotion Amsterdam...Artificial Intelligence in practice - Gerbert Kaandorp - Codemotion Amsterdam...
Artificial Intelligence in practice - Gerbert Kaandorp - Codemotion Amsterdam...
 
Neo4j GraphDay Seattle- Sept19- graphs are ai
Neo4j GraphDay Seattle- Sept19-  graphs are aiNeo4j GraphDay Seattle- Sept19-  graphs are ai
Neo4j GraphDay Seattle- Sept19- graphs are ai
 
From Developer to Data Scientist - Gaines Kergosien
From Developer to Data Scientist - Gaines KergosienFrom Developer to Data Scientist - Gaines Kergosien
From Developer to Data Scientist - Gaines Kergosien
 
Planning for the unplannable - VUCA
Planning for the unplannable - VUCAPlanning for the unplannable - VUCA
Planning for the unplannable - VUCA
 
Technology and AI sharing - From 2016 to Y2017 and Beyond
Technology and AI sharing - From 2016 to Y2017 and BeyondTechnology and AI sharing - From 2016 to Y2017 and Beyond
Technology and AI sharing - From 2016 to Y2017 and Beyond
 
“Full Stack” Data Science with R for Startups: Production-ready with Open-Sou...
“Full Stack” Data Science with R for Startups: Production-ready with Open-Sou...“Full Stack” Data Science with R for Startups: Production-ready with Open-Sou...
“Full Stack” Data Science with R for Startups: Production-ready with Open-Sou...
 
Ruben Diaz, Vision Banco + Rafael Coss, H2O ai + Luis Armenta, IBM - AI journ...
Ruben Diaz, Vision Banco + Rafael Coss, H2O ai + Luis Armenta, IBM - AI journ...Ruben Diaz, Vision Banco + Rafael Coss, H2O ai + Luis Armenta, IBM - AI journ...
Ruben Diaz, Vision Banco + Rafael Coss, H2O ai + Luis Armenta, IBM - AI journ...
 
World Domination with Pentaho EE?
World Domination with Pentaho EE?World Domination with Pentaho EE?
World Domination with Pentaho EE?
 
The Sky’s the Limit – The Rise of Machine Learnin
The Sky’s the Limit – The Rise of Machine LearninThe Sky’s the Limit – The Rise of Machine Learnin
The Sky’s the Limit – The Rise of Machine Learnin
 
From R&D to production-ready predictive apps - Christophe Bourguignat & Yann ...
From R&D to production-ready predictive apps - Christophe Bourguignat & Yann ...From R&D to production-ready predictive apps - Christophe Bourguignat & Yann ...
From R&D to production-ready predictive apps - Christophe Bourguignat & Yann ...
 
Intel 20180608 v2
Intel 20180608 v2Intel 20180608 v2
Intel 20180608 v2
 
Technology trends, disruptions and Opportunities
Technology trends, disruptions and OpportunitiesTechnology trends, disruptions and Opportunities
Technology trends, disruptions and Opportunities
 
Learning Rust the Hard Way for a Production Kafka + ScyllaDB Pipeline
Learning Rust the Hard Way for a Production Kafka + ScyllaDB PipelineLearning Rust the Hard Way for a Production Kafka + ScyllaDB Pipeline
Learning Rust the Hard Way for a Production Kafka + ScyllaDB Pipeline
 
ISC 2016 Day 3 Recap
ISC 2016 Day 3 RecapISC 2016 Day 3 Recap
ISC 2016 Day 3 Recap
 
RightScale Roadtrip Boston: Accelerate to Cloud
RightScale Roadtrip Boston: Accelerate to CloudRightScale Roadtrip Boston: Accelerate to Cloud
RightScale Roadtrip Boston: Accelerate to Cloud
 
Seldon: Deploying Models at Scale
Seldon: Deploying Models at ScaleSeldon: Deploying Models at Scale
Seldon: Deploying Models at Scale
 
Predictive modelling with azure ml
Predictive modelling with azure mlPredictive modelling with azure ml
Predictive modelling with azure ml
 
Big Data Expo 2015 - Clusterpoint The Future of Big Data
Big Data Expo 2015 - Clusterpoint The Future of Big DataBig Data Expo 2015 - Clusterpoint The Future of Big Data
Big Data Expo 2015 - Clusterpoint The Future of Big Data
 
End-to-End Big Data AI with Analytics Zoo
End-to-End Big Data AI with Analytics ZooEnd-to-End Big Data AI with Analytics Zoo
End-to-End Big Data AI with Analytics Zoo
 

Mais de Open Analytics

Cyber after Snowden (OA Cyber Summit)
Cyber after Snowden (OA Cyber Summit)Cyber after Snowden (OA Cyber Summit)
Cyber after Snowden (OA Cyber Summit)Open Analytics
 
Utilizing cyber intelligence to combat cyber adversaries (OA Cyber Summit)
Utilizing cyber intelligence to combat cyber adversaries (OA Cyber Summit)Utilizing cyber intelligence to combat cyber adversaries (OA Cyber Summit)
Utilizing cyber intelligence to combat cyber adversaries (OA Cyber Summit)Open Analytics
 
CDM….Where do you start? (OA Cyber Summit)
CDM….Where do you start? (OA Cyber Summit)CDM….Where do you start? (OA Cyber Summit)
CDM….Where do you start? (OA Cyber Summit)Open Analytics
 
An Immigrant’s view of Cyberspace (OA Cyber Summit)
An Immigrant’s view of Cyberspace (OA Cyber Summit)An Immigrant’s view of Cyberspace (OA Cyber Summit)
An Immigrant’s view of Cyberspace (OA Cyber Summit)Open Analytics
 
MOLOCH: Search for Full Packet Capture (OA Cyber Summit)
MOLOCH: Search for Full Packet Capture (OA Cyber Summit)MOLOCH: Search for Full Packet Capture (OA Cyber Summit)
MOLOCH: Search for Full Packet Capture (OA Cyber Summit)Open Analytics
 
Observations on CFR.org Website Traffic Surge Due to Chechnya Terrorism Scare...
Observations on CFR.org Website Traffic Surge Due to Chechnya Terrorism Scare...Observations on CFR.org Website Traffic Surge Due to Chechnya Terrorism Scare...
Observations on CFR.org Website Traffic Surge Due to Chechnya Terrorism Scare...Open Analytics
 
Using Real-Time Data to Drive Optimization & Personalization
Using Real-Time Data to Drive Optimization & PersonalizationUsing Real-Time Data to Drive Optimization & Personalization
Using Real-Time Data to Drive Optimization & PersonalizationOpen Analytics
 
M&A Trends in Telco Analytics
M&A Trends in Telco AnalyticsM&A Trends in Telco Analytics
M&A Trends in Telco AnalyticsOpen Analytics
 
Competing in the Digital Economy
Competing in the Digital EconomyCompeting in the Digital Economy
Competing in the Digital EconomyOpen Analytics
 
Piwik: An Analytics Alternative (Chicago Summit)
Piwik: An Analytics Alternative (Chicago Summit)Piwik: An Analytics Alternative (Chicago Summit)
Piwik: An Analytics Alternative (Chicago Summit)Open Analytics
 
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...Open Analytics
 
Crossing the Chasm (Ikanow - Chicago Summit)
Crossing the Chasm (Ikanow - Chicago Summit)Crossing the Chasm (Ikanow - Chicago Summit)
Crossing the Chasm (Ikanow - Chicago Summit)Open Analytics
 
On the “Moneyball” – Building the Team, Product, and Service to Rival (Pegged...
On the “Moneyball” – Building the Team, Product, and Service to Rival (Pegged...On the “Moneyball” – Building the Team, Product, and Service to Rival (Pegged...
On the “Moneyball” – Building the Team, Product, and Service to Rival (Pegged...Open Analytics
 
Data evolutions in media, marketing, and retail (Business Adv Group - Chicago...
Data evolutions in media, marketing, and retail (Business Adv Group - Chicago...Data evolutions in media, marketing, and retail (Business Adv Group - Chicago...
Data evolutions in media, marketing, and retail (Business Adv Group - Chicago...Open Analytics
 
Characterizing Risk in your Supply Chain (nContext - Chicago Summit)
Characterizing Risk in your Supply Chain (nContext - Chicago Summit)Characterizing Risk in your Supply Chain (nContext - Chicago Summit)
Characterizing Risk in your Supply Chain (nContext - Chicago Summit)Open Analytics
 
From Insight to Impact (Chicago Summit - Keynote)
From Insight to Impact (Chicago Summit - Keynote)From Insight to Impact (Chicago Summit - Keynote)
From Insight to Impact (Chicago Summit - Keynote)Open Analytics
 
Easybib Open Analytics NYC
Easybib Open Analytics NYCEasybib Open Analytics NYC
Easybib Open Analytics NYCOpen Analytics
 
MarkLogic - Open Analytics Meetup
MarkLogic - Open Analytics MeetupMarkLogic - Open Analytics Meetup
MarkLogic - Open Analytics MeetupOpen Analytics
 
The caprate presentation_july2013_open analytics dc meetup
The caprate presentation_july2013_open analytics dc meetupThe caprate presentation_july2013_open analytics dc meetup
The caprate presentation_july2013_open analytics dc meetupOpen Analytics
 
Verifeed open analytics_3min deck_071713_final
Verifeed open analytics_3min deck_071713_finalVerifeed open analytics_3min deck_071713_final
Verifeed open analytics_3min deck_071713_finalOpen Analytics
 

Mais de Open Analytics (20)

Cyber after Snowden (OA Cyber Summit)
Cyber after Snowden (OA Cyber Summit)Cyber after Snowden (OA Cyber Summit)
Cyber after Snowden (OA Cyber Summit)
 
Utilizing cyber intelligence to combat cyber adversaries (OA Cyber Summit)
Utilizing cyber intelligence to combat cyber adversaries (OA Cyber Summit)Utilizing cyber intelligence to combat cyber adversaries (OA Cyber Summit)
Utilizing cyber intelligence to combat cyber adversaries (OA Cyber Summit)
 
CDM….Where do you start? (OA Cyber Summit)
CDM….Where do you start? (OA Cyber Summit)CDM….Where do you start? (OA Cyber Summit)
CDM….Where do you start? (OA Cyber Summit)
 
An Immigrant’s view of Cyberspace (OA Cyber Summit)
An Immigrant’s view of Cyberspace (OA Cyber Summit)An Immigrant’s view of Cyberspace (OA Cyber Summit)
An Immigrant’s view of Cyberspace (OA Cyber Summit)
 
MOLOCH: Search for Full Packet Capture (OA Cyber Summit)
MOLOCH: Search for Full Packet Capture (OA Cyber Summit)MOLOCH: Search for Full Packet Capture (OA Cyber Summit)
MOLOCH: Search for Full Packet Capture (OA Cyber Summit)
 
Observations on CFR.org Website Traffic Surge Due to Chechnya Terrorism Scare...
Observations on CFR.org Website Traffic Surge Due to Chechnya Terrorism Scare...Observations on CFR.org Website Traffic Surge Due to Chechnya Terrorism Scare...
Observations on CFR.org Website Traffic Surge Due to Chechnya Terrorism Scare...
 
Using Real-Time Data to Drive Optimization & Personalization
Using Real-Time Data to Drive Optimization & PersonalizationUsing Real-Time Data to Drive Optimization & Personalization
Using Real-Time Data to Drive Optimization & Personalization
 
M&A Trends in Telco Analytics
M&A Trends in Telco AnalyticsM&A Trends in Telco Analytics
M&A Trends in Telco Analytics
 
Competing in the Digital Economy
Competing in the Digital EconomyCompeting in the Digital Economy
Competing in the Digital Economy
 
Piwik: An Analytics Alternative (Chicago Summit)
Piwik: An Analytics Alternative (Chicago Summit)Piwik: An Analytics Alternative (Chicago Summit)
Piwik: An Analytics Alternative (Chicago Summit)
 
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
 
Crossing the Chasm (Ikanow - Chicago Summit)
Crossing the Chasm (Ikanow - Chicago Summit)Crossing the Chasm (Ikanow - Chicago Summit)
Crossing the Chasm (Ikanow - Chicago Summit)
 
On the “Moneyball” – Building the Team, Product, and Service to Rival (Pegged...
On the “Moneyball” – Building the Team, Product, and Service to Rival (Pegged...On the “Moneyball” – Building the Team, Product, and Service to Rival (Pegged...
On the “Moneyball” – Building the Team, Product, and Service to Rival (Pegged...
 
Data evolutions in media, marketing, and retail (Business Adv Group - Chicago...
Data evolutions in media, marketing, and retail (Business Adv Group - Chicago...Data evolutions in media, marketing, and retail (Business Adv Group - Chicago...
Data evolutions in media, marketing, and retail (Business Adv Group - Chicago...
 
Characterizing Risk in your Supply Chain (nContext - Chicago Summit)
Characterizing Risk in your Supply Chain (nContext - Chicago Summit)Characterizing Risk in your Supply Chain (nContext - Chicago Summit)
Characterizing Risk in your Supply Chain (nContext - Chicago Summit)
 
From Insight to Impact (Chicago Summit - Keynote)
From Insight to Impact (Chicago Summit - Keynote)From Insight to Impact (Chicago Summit - Keynote)
From Insight to Impact (Chicago Summit - Keynote)
 
Easybib Open Analytics NYC
Easybib Open Analytics NYCEasybib Open Analytics NYC
Easybib Open Analytics NYC
 
MarkLogic - Open Analytics Meetup
MarkLogic - Open Analytics MeetupMarkLogic - Open Analytics Meetup
MarkLogic - Open Analytics Meetup
 
The caprate presentation_july2013_open analytics dc meetup
The caprate presentation_july2013_open analytics dc meetupThe caprate presentation_july2013_open analytics dc meetup
The caprate presentation_july2013_open analytics dc meetup
 
Verifeed open analytics_3min deck_071713_final
Verifeed open analytics_3min deck_071713_finalVerifeed open analytics_3min deck_071713_final
Verifeed open analytics_3min deck_071713_final
 

Último

Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Mark Simos
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...Karmanjay Verma
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxfnnc6jmgwh
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...itnewsafrica
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFMichael Gough
 
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Jeffrey Haguewood
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructureitnewsafrica
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observabilityitnewsafrica
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxAna-Maria Mihalceanu
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 
Accelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessAccelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessWSO2
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Nikki Chapple
 

Último (20)

Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDF
 
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance Toolbox
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
Accelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessAccelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with Platformless
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 

Big data bi-mature-oanyc summit

  • 1. @Kognitio @mphnyc #MPP_R@Kognitio @mphnyc #OANYC Hadoop meets Mature BI: Where the rubber meets the road for Data Scientists Michael Hiskey Futurist, + Product Evangelist VP, Marketing & Business Development Kognitio
  • 2. @Kognitio @mphnyc #MPP_R@Kognitio @mphnyc #OANYC The Data Scientist Sexiest job of the 21st Century?
  • 3. @Kognitio @mphnyc #MPP_R@Kognitio @mphnyc #OANYC Key Concept: Graduation Projects will need to Graduate from the Data Science Lab and become part of Business as Usual
  • 4. @Kognitio @mphnyc #MPP_R@Kognitio @mphnyc #OANYC Demand for the Data Scientist Organizational appetite for tens, not hundreds
  • 5. @Kognitio @mphnyc #MPP_R@Kognitio @mphnyc #OANYC Don’t be a Railroad Stoker! Highly skilled engineering required … but the world innovated around them.
  • 6. @Kognitio @mphnyc #MPP_R@Kognitio @mphnyc #OANYC Business Intelligence Numbers Tables Charts Indicators Time - History - Lag Access - to view (portal) - to data - to depth - Control/Secure Consumption - digestion …with ease and simplicity Straddle IT and Business Faster Lower latency More granularity Richer data model Self service
  • 7. @Kognitio @mphnyc #MPP_R@Kognitio @mphnyc #OANYC What has changed? More connected-users? More-connected users?
  • 8. According to one estimate, mankind created 150 exabytes of data in 2005 (billion gigabytes) In 2010 this was 1,200 exabytes
  • 11. @Kognitio @mphnyc #OANYC Respondents were asked to choose up to two descriptions about how their organizations view big data from the choices above. Choices have been abbreviated, and selections have been normalized to equal 100%. n=1144 Source: IBM Institute for Business Value/Said Business School Survey What? New value comes from your existing data
  • 12. @Kognitio @mphnyc #MPP_R@Kognitio @mphnyc #OANYC © 20th Century Fox
  • 13. @Kognitio @mphnyc #MPP_R@Kognitio @mphnyc #OANYC Hadoop ticks many but not all the boxes a aaaaaaa aa a aa aa aa a a aa a aa aaa a aa aa
  • 14. @Kognitio @mphnyc #MPP_R@Kognitio @mphnyc #OANYC  No need to pre-process  No need to align to schema  No need to triage Null storage concerns
  • 15. @Kognitio @mphnyc #OANYC Machine learning algorithms Dynamic Simulation Statistical Analysis Clustering Behaviour modelling The drive for deeper understanding Reporting & BPM Fraud detection Dynamic Interaction Technology/Automation AnalyticalComplexity Campaign Management #MPP_R
  • 16. @Kognitio @mphnyc #MPP_R@Kognitio @mphnyc #OANYC Hadoop just too slow for interactive BI! …loss of train- of-thought “while hadoop shines as a proc platform, it is painfully slow as
  • 17. @Kognitio @mphnyc #MPP_R@Kognitio @mphnyc #OANYC Analytics needs low latency, no I/O wait High speed in-memory processing
  • 18. Analytical Platform: Reference Architecture Analytical Platform Layer Near-line Storage (optional) Application & Client Layer All BI Tools All OLAP Clients Excel Persistence Layer Hadoop Clusters Enterprise Data Warehouses Legacy Systems … Reporting Cloud Storage
  • 19. @Kognitio @mphnyc #MPP_R@Kognitio @mphnyc #OANYC The Future Big DataAdvanced Analytics In-memory Logical Data Warehouse Predictive Analytics Data Scientists
  • 21. @Kognitio @mphnyc #MPP_R@Kognitio @mphnyc #MPP_R Hadoop meets Mature BI: Where the rubber meets the road for Data Scientists • The key challenge for Data Scientists is not the proliferation of their roles, but the ability to ‘graduate’ key Big Data projects from the ‘Data Science Lab’ and production-ize them into their broader organizations. • Over the next 18 months, "Big Data' will become just "Data"; this means everyone (even business users) will need to have a way to use it - without reinventing the way they interact with their current reporting and analysis. • To do this requires interactive analysis with existing tools and massively parallel code execution, tightly integrated with Hadoop. Your Data Warehouse is dying; Hadoop will elicit a material shift away from price per TB in persistent data storage.
  • 22. The new bounty hunters: Drill Impala Pivotal Stinger The No SQL Posse Wanted Dead or Alive SQL
  • 23. It’s all about getting work done Used to be simple fetch of value Tasks evolving: Then was calc dynamic aggregate Now complex algorithms!
  • 24. @Kognitio @mphnyc #MPP_R create external script LM_PRODUCT_FORECAST environment rsint receives ( SALEDATE DATE, DOW INTEGER, ROW_ID INTEGER, PRODNO INTEGER, DAILYSALES partition by PRODNO order by PRODNO, ROW_ID sends ( R_OUTPUT varchar ) isolate partitions script S'endofr( # Simple R script to run a linear fit on daily sales prod1<-read.csv(file=file("stdin"), header=FALSE,row.names colnames(prod1)<-c("DOW","ID","PRODNO","DAILYSALES") dim1<-dim(prod1) daily1<-aggregate(prod1$DAILYSALES, list(DOW = prod1$DOW), daily1[,2]<-daily1[,2]/sum(daily1[,2]) basesales<-array(0,c(dim1[1],2)) basesales[,1]<-prod1$ID basesales[,2]<-(prod1$DAILYSALES/daily1[prod1$DOW+1,2]) colnames(basesales)<-c("ID","BASESALES") fit1=lm(BASESALES ~ ID,as.data.frame(basesales)) select Trans_Year, Num_Trans, count(distinct Account_ID) Num_Accts, sum(count( distinct Account_ID)) over (partition by Trans_Year cast(sum(total_spend)/1000 as int) Total_Spend, cast(sum(total_spend)/1000 as int) / count(distinct Account_ID rank() over (partition by Trans_Year order by count(distinct A rank() over (partition by Trans_Year order by sum(total_spend) from( select Account_ID, Extract(Year from Effective_Date) Trans_Year, count(Transaction_ID) Num_Trans, select dept, sum(sales) from sales_fact Where period between date ‘01-05-2006’ and date ‘31-05-2006’ group by dept having sum(sales) > 50000; select sum(sales) from sales_history where year = 2006 and month = 5 and region=1; select total_sales from summary where year = 2006 and month = 5 and region=1; Behind the numbers
  • 25. @Kognitio @mphnyc #MPP_R For once technology is on our side First time we have full triumvirate of – Excellent Computing power – Unlimited storage – Fast Networks …now that RAM is cheap!
  • 26. @Kognitio @mphnyc #MPP_R Lots of these Not so many of these Hadoop is… Hadoop inherently disk oriented Typically low ratio of CPU to Disk

Notas do Editor

  1. Language – one word changes and the whole meaning shifts one word like Hadoop is creating seismic shifts in the world of data and its useCore themes: History – cycle of discover-construct-discover-construct Humans building EDW has had its time but will represent learning and museum for future Kognitio is new bridge between business need and the information stores of the EDW and Hadoop
  2. http://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/the “data scientist.” It’s a high-ranking professional with the training and curiosity to make discoveries in the world of big data. The title has been around for only a few years. (It was coined in 2008 by one of us, D.J. Patil, and Jeff Hammerbacher, then the respective leads of data and analytics efforts at LinkedIn and Facebook.) But thousands of data scientists are already working at both start-ups and well-established companies. Their sudden appearance on the business scene reflects the fact that companies are now wrestling with information that comes in varieties and volumes never encountered before. If your organization stores multiple petabytes of data, if the information most critical to your business resides in forms other than rows and columns of numbers, or if answering your biggest question would involve a “mashup” of several analytical efforts, you’ve got a big data opportunity.http://www.guardian.co.uk/news/datablog/2012/mar/02/data-scientist#zoomed-picture
  3. http://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/the “data scientist.” It’s a high-ranking professional with the training and curiosity to make discoveries in the world of big data. The title has been around for only a few years. (It was coined in 2008 by one of us, D.J. Patil, and Jeff Hammerbacher, then the respective leads of data and analytics efforts at LinkedIn and Facebook.) But thousands of data scientists are already working at both start-ups and well-established companies. Their sudden appearance on the business scene reflects the fact that companies are now wrestling with information that comes in varieties and volumes never encountered before. If your organization stores multiple petabytes of data, if the information most critical to your business resides in forms other than rows and columns of numbers, or if answering your biggest question would involve a “mashup” of several analytical efforts, you’ve got a big data opportunity.http://www.guardian.co.uk/news/datablog/2012/mar/02/data-scientist#zoomed-picture
  4. http://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/the “data scientist.” It’s a high-ranking professional with the training and curiosity to make discoveries in the world of big data. The title has been around for only a few years. (It was coined in 2008 by one of us, D.J. Patil, and Jeff Hammerbacher, then the respective leads of data and analytics efforts at LinkedIn and Facebook.) But thousands of data scientists are already working at both start-ups and well-established companies. Their sudden appearance on the business scene reflects the fact that companies are now wrestling with information that comes in varieties and volumes never encountered before. If your organization stores multiple petabytes of data, if the information most critical to your business resides in forms other than rows and columns of numbers, or if answering your biggest question would involve a “mashup” of several analytical efforts, you’ve got a big data opportunity.http://www.guardian.co.uk/news/datablog/2012/mar/02/data-scientist#zoomed-picture
  5. Very crudely
  6. Can play the hyphen game more connected-users more-connected usersInformation consumers – your users or you the user?Creates more pressure on the infrastructure – more queries
  7. Mobile access is coming alongApplication space broadening BYODCan supply access to BIBut also furiously generate data for BIAccess to dynamic information but every access generates data and possible inferencesSelf-service access
  8. IBM Institute for Business Value/Said Business School Survey Defining big dataMuch of the confusion about big data begins with thedefinition itself. To understand our study respondents’definition of the term, we asked each to select up to twocharacteristics of big data. Rather than any single characteristicclearly dominating among the choices, respondents weredivided in their views on whether big data is best describedby today’s greater volume of data, the new types of data andanalysis, or the emerging requirements for more real-timeinformation analysis (see Figure 1).
  9. EXPERIMENTING? So is this you, not quite sureOne CTO told me he invented a Hadoop project just to keep developers happy – it’s important for their own techie development that they have the experience! That may sound DAFT… hot think of Hadoop in 2013 like the Internet in 1996… if you are not far down that road you’re behind and need to catch-up. The community is growing and building, they will address many of the limitations we see today… SOMEDAY
  10. Hadoop is not &quot;universal solution“!Way too much hype and hyperbole - great for innovators and start-ups not so good for plain old business
  11. DW demanded ETL to map data into model and ensure logical consistency - upfront prerequisiteHadoop making people lazy – it cuts out thought but leaves future decisions wide open – no lock in, cuts risks of bad decisionsSimplified decisions of what to keep – keep it allBUT hey BI needs structure and discipline!!!!
  12. Bottlenecks caused by platforms and tools unable to cope with demands of complexity, disparity and volumeComplex analyticsMachine learning – fraud detection/gamingWeb Analytics – Dynamic content/bid managementModelling – traditional clustering/behavioural for marketing/product development/resource optimisationInvestigative Reporting (Dashboards and reports with granular data access)Data Model
  13. Hadoop – the false dawnThere, I have dared to say it!Does not accelerate BI in quite the same way asthe EDWUsers have had a decade of being sold train-of-thought - icubes and Visual Insight Hadoop - Not hands on, not desktop, not agile
  14. Lots of access to data - iterationsAnalytics is about work done – more work needs to be doneSo don’t hold CPUs back!In-memory is not cache!Memory is underplayed in Hadoop - its cheap use it!Processors and Ram are true measure of work that can be done – disks just fetchKeep data in memory!!! Don’t swap, don’t wait on disk don’t pick through indexes then data, just access what is needed.Economics of RAM have changed, much lower cost, large volumes readily available
  15. Ah yes plugging into Hadoop So much for noSQL revolutionUniversal integration needed – protect the BI investmentLost the gun fight like all revolutions the upstarts died down and got absorbed (subsumed)Business and BI investment demands SQL!Hive now we have drill, impala, Pivotal,Tough game – yes its SQL access but not low latency
  16. What the business cares about is getting work doneThey really don’t care about how it is stored or where it is stored!Its not about raw individual speed its about throughputAddress the bottlenecksToo many vendors play games that just shift the bottleneck
  17. BI mostly focuses (sells) on presentation – Graphics, pictures, VisualisationBUT behind the scenes a lot of heavy lifting has to be doneThis workload has changed over time from the simple to complex
  18. No need for single platforms like the traditional DW – stores and analysesThis is why data sciences risesWe did not get this in rise of data mining in the 90’sWe’ll come onto RAM shortly
  19. BATCH Hadoop disk centric – Storage - just like the EDW more parallelism yes, lots more but still batch disk I/O centricSchedulers not designed for rapid responseEssentially a batch queue – BI applications and business users have significantly evolved from batch reporting