© 2014 Silicon Valley Data Science LLC
All Rights Reserved.
@SVDataScience
Beyond a Big Data Pilot:
Building a Production Data Infrastructure
StampedeCon
29 May 2014, St. Louis
Stephen O’Sullivan (@steveos)
strata.svds.com @SVDataScience
Stephen O’Sullivan
Distinguished Architect
Beyond a Big Data Pilot:
Building a Production Data Infrastructure
Creating a data architecture involves many moving parts. By
examining the data value chain, from ingestion through to analytics,
we will explain how the various parts of the Hadoop and big data
ecosystem fit together to support batch, interactive, and real-time
analytical workloads.
By tracing the flow of data from source to output, we’ll explore the
options and considerations for components, including data
acquisition, ingestion, storage, data services, analytics and data
management. Most importantly, we’ll leave you with a framework for
understanding these options and making choices.
Key-Value
Columnar
Graph
Document
GENERAL
UP OR OUT?
Different use cases put different demands on the data infrastructure:
• UC1
• UC2
• UC3
• UC4
• UCn
Increasing cost per unit of capability from scale-up architectures causes rationing of resources. Only the most valuable use cases are pursued.
Chart labels: Data Resource Usage; Value; scale-out cost; UC1, UC2, UC3, UC4, UCn.
THE DATA VALUE CHAIN
Acquire Ingest Process Persist Integrate Analyze Expose
BUILDING A DATA PLATFORM
Diagram components: External Systems, Internal Data Sources, External Data Sources, Data Acquisition, Data Ingestion, Data Repository, Persistence, Offline Processing, Real Time Processing, Batch Processing, Data Services, Analytics.
Data Management: Security, Operations, Data Quality, Meta Data Management and Data Lineage
Acquisition:
from internal and external data sources
Ingestion
offline and real-time Processing
Persistence
Data Services
Exposing data to applications
Analytics
batch and real-time processing
Data
Management
Data security, operations,
lineage, quality, and metadata
management
Use Case
• Collect in-store sales transactions in near real-time
• Provide near real-time dashboards of sales transactions (rolled up by store, region, etc.)
• Provide ad-hoc access to this data as soon as it's collected (i.e., low latency and fine-grained)
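The use case above can be sketched as a running rollup that also keeps the fine-grained records. This is a minimal in-memory illustration, not the system described in the talk: the `Sale` fields, the `SalesRollup` class, and the sample store and region names are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sale:
    # Hypothetical record shape; the slides do not define one.
    store_id: str
    region: str
    amount_cents: int


class SalesRollup:
    """Keeps running totals so a dashboard can read them at any time."""

    def __init__(self):
        self.by_store = defaultdict(int)
        self.by_region = defaultdict(int)
        self.raw = []  # fine-grained records kept for ad-hoc queries

    def add(self, sale):
        # Update the raw log and both rollups as each transaction arrives.
        self.raw.append(sale)
        self.by_store[sale.store_id] += sale.amount_cents
        self.by_region[sale.region] += sale.amount_cents


rollup = SalesRollup()
for sale in [
    Sale("store-1", "midwest", 1250),
    Sale("store-2", "midwest", 800),
    Sale("store-3", "west", 4000),
]:
    rollup.add(sale)
```

Updating the totals on arrival, rather than recomputing them on a schedule, is what keeps the dashboard view close to real time in this sketch.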
FORTUNE 500 RETAIL COMPANY
Enabling Near Real-time Sales Transactions
Diagram labels: Application Servers; Data Center A; Data Center B; BI Server (http), shown twice.
FORTUNE 500 RETAIL COMPANY
Enabling Near Real-time Sales Transactions
Diagram labels: Application Servers; Data Center A; Data Center B; CFS and BI Server (http), each shown twice.
Ready to go into Production?
• Data Acquisition
– Can you see the “collectors” (internal or external)? Make sure you have the correct network access in place.
– Do you need to encrypt the data (internally or externally)? That will depend on the data and your policies.
• Data Ingestion
– Can you handle the traffic to the “collectors”? Make sure the solution you choose can scale out. Apache Flume is a good example of this.
– Redundant / self-healing paths into the cluster? Make sure you're not point-to-point. In Flume, Storm, and Kafka you can configure forks etc., but you may need to handle duplicate data.
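Forked or redundant ingestion paths can deliver the same event more than once. One way to handle that is to drop repeats by event ID. The sketch below assumes every event carries a unique `id` field, which the slides do not specify. `Deduplicator` and `merge_paths` are hypothetical helpers, not part of Flume, Storm, or Kafka.

```python
from collections import OrderedDict


class Deduplicator:
    """Drops events whose ID was seen recently (bounded memory)."""

    def __init__(self, max_ids=100_000):
        self.max_ids = max_ids
        self._seen = OrderedDict()

    def accept(self, event_id):
        """Return True the first time an ID is seen, False for repeats."""
        if event_id in self._seen:
            self._seen.move_to_end(event_id)
            return False
        self._seen[event_id] = True
        if len(self._seen) > self.max_ids:
            self._seen.popitem(last=False)  # evict the oldest ID
        return True


def merge_paths(paths, dedup):
    """Yield each event once even if several paths delivered it."""
    for path in paths:
        for event in path:
            if dedup.accept(event["id"]):
                yield event


# Two redundant paths that both delivered transaction "t2".
path_a = [{"id": "t1", "amount": 10}, {"id": "t2", "amount": 20}]
path_b = [{"id": "t2", "amount": 20}, {"id": "t3", "amount": 30}]
unique = list(merge_paths([path_a, path_b], Deduplicator()))
```

The bounded ID window trades memory for safety: a duplicate that arrives after its ID has been evicted would be accepted again, so the window should cover the longest expected redelivery delay.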
Ready to go into Production?
• Data Repository
– Can you handle out-of-order data? Make sure you have a way to address it, as it will happen.
– Can you scale the cluster for data volume spikes and/or processing spikes? Hadoop and Cassandra make it very easy to add nodes. If you cannot add nodes, be prepared to drop data or stop processes or both.
– Should I just store plain text (compressed)? If it's very wide data and you query a subset of the columns, Parquet would be a good choice. If you would like to be able to version your data schema, Avro is a good choice.
• Data Services
– Do applications need to access this data? Build a RESTful service to access the data.
– Do you have data resiliency? What is data resiliency, I hear you ask...
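One common way to address out-of-order data is to hold events briefly and release them in timestamp order once they fall behind a watermark. The sketch below illustrates that idea only and is not taken from the talk: `ReorderBuffer`, the `allowed_lateness` parameter, and the integer timestamps are assumptions made for the example.

```python
import heapq


class ReorderBuffer:
    """Releases events in timestamp order once they are older than the watermark."""

    def __init__(self, allowed_lateness):
        self.allowed_lateness = allowed_lateness
        self.late = []          # events that arrived too late to be ordered
        self._heap = []
        self._max_seen = None
        self._watermark = None
        self._counter = 0       # tie-breaker so payloads are never compared

    def push(self, timestamp, payload):
        """Add one event and return the events that are now safe to emit."""
        if self._watermark is not None and timestamp < self._watermark:
            self.late.append((timestamp, payload))
            return []
        heapq.heappush(self._heap, (timestamp, self._counter, payload))
        self._counter += 1
        if self._max_seen is None or timestamp > self._max_seen:
            self._max_seen = timestamp
        self._watermark = self._max_seen - self.allowed_lateness
        ready = []
        while self._heap and self._heap[0][0] <= self._watermark:
            ts, _, item = heapq.heappop(self._heap)
            ready.append((ts, item))
        return ready

    def flush(self):
        """Emit everything still buffered, in timestamp order."""
        ready = []
        while self._heap:
            ts, _, item = heapq.heappop(self._heap)
            ready.append((ts, item))
        return ready


buffer = ReorderBuffer(allowed_lateness=5)
buffer.push(10, "a")
buffer.push(8, "b")             # arrives after "a" but is older
released = buffer.push(16, "c") # watermark moves to 11, so 8 and 10 are released
```

Events that arrive behind the watermark are collected in `late` rather than silently dropped, so a separate process can decide whether to merge them back in or discard them.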
DATA RESILIENCY
Stovepipe: One-to-one relationship from data source to product.
Hard Failure: If the data source is broken, so is the app.
Multi-sourced: Redundancy of overlapping data sources makes your products more resilient.
Graceful Degradation: If a data source breaks, there is a backup and your app continues to function.
Production data services abstract the probabilistic integration of overlapping data sources. We call this model a Data Mesh.
Diagram labels: Products; Data Sources; Broken Data Sources; Data Services.
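Graceful degradation can be sketched as a data service that tries overlapping sources in priority order and reports which one answered. This shows plain failover only. It does not attempt the probabilistic integration the slide mentions, and `ResilientDataService`, `SourceUnavailable`, and the two sample feeds are hypothetical names.

```python
class SourceUnavailable(Exception):
    """Raised by a source that cannot answer right now."""


class ResilientDataService:
    """Hides several overlapping sources behind one lookup call."""

    def __init__(self, sources):
        # sources: list of (name, callable) pairs in priority order
        self.sources = sources

    def lookup(self, key):
        failures = []
        for name, fetch in self.sources:
            try:
                value = fetch(key)
            except SourceUnavailable:
                failures.append(name)  # remember the failure, try the next source
                continue
            return {"value": value, "source": name, "failed": failures}
        raise SourceUnavailable(f"no source could answer for {key!r}")


def broken_primary(key):
    # Stands in for a data source that is currently down.
    raise SourceUnavailable("primary feed is down")


def backup_feed(key):
    # Stands in for an overlapping source holding the same data.
    return {"sku-1": 42}[key]


service = ResilientDataService([("primary", broken_primary), ("backup", backup_feed)])
result = service.lookup("sku-1")
```

Returning the source name and the list of failed sources with each answer lets the calling product keep working while still surfacing that it is running in a degraded state.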
Ready to go into Production?
• Analytics
– Which is the right SQL on Hadoop solution (for me)? There are a few to choose from: Hive, Impala, Spark SQL, and HAWQ (and growing). They share the same metastore. Some are faster than others (it depends on the type of query).
– Which BI tool should I use? See if the current tool works with your distro. You can also look at Platfora, Datameer, and Karmasphere.
– Do I still need to set up business views of the data? Yes you do, but the benefit is you still have access to the raw data for the advanced data analyst or data scientist.
– What about deep analytics? Now that you have a data lake, you can take advantage of doing deep analytics on the data without moving it out.
Analytics tools
Ready to go into Production?
• Data Management
– Security (who can see what?) It's getting there. At the query level, using Hive or Impala, you can use Apache Sentry or Apache Knox. There are other third-party tools like Dataguise that let you do things like encryption at rest or masking.
– Can you meet your SLA when other jobs / queries are running? Using the “Fair Scheduler” will help you manage your jobs' SLAs. A third-party product by Pepper Data can help with this too (and a little more).
– What monitoring do you have in place?
– Cluster failover?
Adding more use cases
• Am I duplicating data?
• Can I reuse the infrastructure I’ve already created?
• Do I have enough room in the cluster (space/processing)?
• Will I impact the SLAs of jobs/queries currently running?
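The "enough room" question can be roughed out with simple arithmetic for the storage side. The sketch below assumes an HDFS-style cluster with a replication factor of 3 (the HDFS default) and a 25% reserve for temporary and intermediate data. The reserve figure, the node counts, and both function names are illustrative assumptions, and processing capacity is not modeled.

```python
def usable_capacity_tb(nodes, disk_tb_per_node, replication=3, reserve_fraction=0.25):
    """Estimate how much unique data the cluster can hold, in TB."""
    raw_tb = nodes * disk_tb_per_node
    # Hold back a reserve, then divide by the number of copies stored.
    return raw_tb * (1 - reserve_fraction) / replication


def has_room(current_tb, new_use_case_tb, **cluster):
    """True if the existing data plus the new use case fits in usable capacity."""
    return current_tb + new_use_case_tb <= usable_capacity_tb(**cluster)


# Hypothetical cluster: 20 nodes with 24 TB of disk each.
capacity = usable_capacity_tb(nodes=20, disk_tb_per_node=24)
small_fits = has_room(90, 25, nodes=20, disk_tb_per_node=24)
large_fits = has_room(90, 40, nodes=20, disk_tb_per_node=24)
```

With these assumed numbers, 480 TB of raw disk yields 120 TB of usable space, so a 25 TB use case fits alongside 90 TB of existing data while a 40 TB one does not.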
FORTUNE 500 RETAIL COMPANY
Enabling Real Time Database Monitoring
HIGH LEVEL ARCHITECTURE
Diagram labels: Oracle Stats Collection (shown three times); pulling data over JDBC; sending data to Graphite; writing data to HDFS.
FORTUNE 500 RETAIL COMPANY
Enabling Log Collection & Search
Diagram labels: Application Servers, statsd, Log Search, http.
questions
Yes, We’re Hiring
svds.com/join-us
THANK YOU
Stephen O’Sullivan @steveos
27

Mais conteúdo relacionado

Mais procurados

Actian forrester- hortonworks
Actian   forrester- hortonworksActian   forrester- hortonworks
Actian forrester- hortonworksHortonworks
 
Big Data Discovery
Big Data DiscoveryBig Data Discovery
Big Data DiscoveryHarald Erb
 
Developing a Strategy for Data Lake Governance
Developing a Strategy for Data Lake GovernanceDeveloping a Strategy for Data Lake Governance
Developing a Strategy for Data Lake GovernanceTony Baer
 
Better Together: The New Data Management Orchestra
Better Together: The New Data Management OrchestraBetter Together: The New Data Management Orchestra
Better Together: The New Data Management OrchestraCloudera, Inc.
 
Harnessing Hadoop Distuption: A Telco Case Study
Harnessing Hadoop Distuption: A Telco Case StudyHarnessing Hadoop Distuption: A Telco Case Study
Harnessing Hadoop Distuption: A Telco Case StudyDataWorks Summit
 
Hadoop and Data Virtualization - A Case Study by VHA
Hadoop and Data Virtualization - A Case Study by VHAHadoop and Data Virtualization - A Case Study by VHA
Hadoop and Data Virtualization - A Case Study by VHAHortonworks
 
ING's Customer-Centric Data Journey from Community Idea to Private Cloud Depl...
ING's Customer-Centric Data Journey from Community Idea to Private Cloud Depl...ING's Customer-Centric Data Journey from Community Idea to Private Cloud Depl...
ING's Customer-Centric Data Journey from Community Idea to Private Cloud Depl...DataWorks Summit/Hadoop Summit
 
MapR Enterprise Data Hub Webinar w/ Mike Ferguson
MapR Enterprise Data Hub Webinar w/ Mike FergusonMapR Enterprise Data Hub Webinar w/ Mike Ferguson
MapR Enterprise Data Hub Webinar w/ Mike FergusonMapR Technologies
 
Leverage Big Data to Enhance Customer Experience in Telecommunications – with...
Leverage Big Data to Enhance Customer Experience in Telecommunications – with...Leverage Big Data to Enhance Customer Experience in Telecommunications – with...
Leverage Big Data to Enhance Customer Experience in Telecommunications – with...Hortonworks
 
IDC Retail Insights - What's Possible with a Modern Data Architecture?
IDC Retail Insights - What's Possible with a Modern Data Architecture?IDC Retail Insights - What's Possible with a Modern Data Architecture?
IDC Retail Insights - What's Possible with a Modern Data Architecture?Hortonworks
 
Govern This! Data Discovery and the application of data governance with new s...
Govern This! Data Discovery and the application of data governance with new s...Govern This! Data Discovery and the application of data governance with new s...
Govern This! Data Discovery and the application of data governance with new s...Cloudera, Inc.
 
Big data and its impact on SOA
Big data and its impact on SOABig data and its impact on SOA
Big data and its impact on SOADemed L'Her
 
Dataguise hortonworks insurance_feb25
Dataguise hortonworks insurance_feb25Dataguise hortonworks insurance_feb25
Dataguise hortonworks insurance_feb25Hortonworks
 
Oil and gas big data edition
Oil and gas  big data editionOil and gas  big data edition
Oil and gas big data editionMark Kerzner
 
Data Governance, Compliance and Security in Hadoop with Cloudera
Data Governance, Compliance and Security in Hadoop with ClouderaData Governance, Compliance and Security in Hadoop with Cloudera
Data Governance, Compliance and Security in Hadoop with ClouderaCaserta
 
Big Data at Oracle - Strata 2015 San Jose
Big Data at Oracle - Strata 2015 San JoseBig Data at Oracle - Strata 2015 San Jose
Big Data at Oracle - Strata 2015 San JoseJeffrey T. Pollock
 
Hortonworks Hybrid Cloud - Putting you back in control of your data
Hortonworks Hybrid Cloud - Putting you back in control of your dataHortonworks Hybrid Cloud - Putting you back in control of your data
Hortonworks Hybrid Cloud - Putting you back in control of your dataScott Clinton
 
8 from zero to insight with real time big data
8 from zero to insight with real time big data8 from zero to insight with real time big data
8 from zero to insight with real time big dataDr. Wilfred Lin (Ph.D.)
 
A Modern Data Strategy for Precision Medicine
A Modern Data Strategy for Precision MedicineA Modern Data Strategy for Precision Medicine
A Modern Data Strategy for Precision MedicineCloudera, Inc.
 
Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...
Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...
Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...Cloudera, Inc.
 

Mais procurados (20)

Actian forrester- hortonworks
Actian   forrester- hortonworksActian   forrester- hortonworks
Actian forrester- hortonworks
 
Big Data Discovery
Big Data DiscoveryBig Data Discovery
Big Data Discovery
 
Developing a Strategy for Data Lake Governance
Developing a Strategy for Data Lake GovernanceDeveloping a Strategy for Data Lake Governance
Developing a Strategy for Data Lake Governance
 
Better Together: The New Data Management Orchestra
Better Together: The New Data Management OrchestraBetter Together: The New Data Management Orchestra
Better Together: The New Data Management Orchestra
 
Harnessing Hadoop Distuption: A Telco Case Study
Harnessing Hadoop Distuption: A Telco Case StudyHarnessing Hadoop Distuption: A Telco Case Study
Harnessing Hadoop Distuption: A Telco Case Study
 
Hadoop and Data Virtualization - A Case Study by VHA
Hadoop and Data Virtualization - A Case Study by VHAHadoop and Data Virtualization - A Case Study by VHA
Hadoop and Data Virtualization - A Case Study by VHA
 
ING's Customer-Centric Data Journey from Community Idea to Private Cloud Depl...
ING's Customer-Centric Data Journey from Community Idea to Private Cloud Depl...ING's Customer-Centric Data Journey from Community Idea to Private Cloud Depl...
ING's Customer-Centric Data Journey from Community Idea to Private Cloud Depl...
 
MapR Enterprise Data Hub Webinar w/ Mike Ferguson
MapR Enterprise Data Hub Webinar w/ Mike FergusonMapR Enterprise Data Hub Webinar w/ Mike Ferguson
MapR Enterprise Data Hub Webinar w/ Mike Ferguson
 
Leverage Big Data to Enhance Customer Experience in Telecommunications – with...
Leverage Big Data to Enhance Customer Experience in Telecommunications – with...Leverage Big Data to Enhance Customer Experience in Telecommunications – with...
Leverage Big Data to Enhance Customer Experience in Telecommunications – with...
 
IDC Retail Insights - What's Possible with a Modern Data Architecture?
IDC Retail Insights - What's Possible with a Modern Data Architecture?IDC Retail Insights - What's Possible with a Modern Data Architecture?
IDC Retail Insights - What's Possible with a Modern Data Architecture?
 
Govern This! Data Discovery and the application of data governance with new s...
Govern This! Data Discovery and the application of data governance with new s...Govern This! Data Discovery and the application of data governance with new s...
Govern This! Data Discovery and the application of data governance with new s...
 
Big data and its impact on SOA
Big data and its impact on SOABig data and its impact on SOA
Big data and its impact on SOA
 
Dataguise hortonworks insurance_feb25
Dataguise hortonworks insurance_feb25Dataguise hortonworks insurance_feb25
Dataguise hortonworks insurance_feb25
 
Oil and gas big data edition
Oil and gas  big data editionOil and gas  big data edition
Oil and gas big data edition
 
Data Governance, Compliance and Security in Hadoop with Cloudera
Data Governance, Compliance and Security in Hadoop with ClouderaData Governance, Compliance and Security in Hadoop with Cloudera
Data Governance, Compliance and Security in Hadoop with Cloudera
 
Big Data at Oracle - Strata 2015 San Jose
Big Data at Oracle - Strata 2015 San JoseBig Data at Oracle - Strata 2015 San Jose
Big Data at Oracle - Strata 2015 San Jose
 
Hortonworks Hybrid Cloud - Putting you back in control of your data
Hortonworks Hybrid Cloud - Putting you back in control of your dataHortonworks Hybrid Cloud - Putting you back in control of your data
Hortonworks Hybrid Cloud - Putting you back in control of your data
 
8 from zero to insight with real time big data
8 from zero to insight with real time big data8 from zero to insight with real time big data
8 from zero to insight with real time big data
 
A Modern Data Strategy for Precision Medicine
A Modern Data Strategy for Precision MedicineA Modern Data Strategy for Precision Medicine
A Modern Data Strategy for Precision Medicine
 
Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...
Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...
Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...
 

Destaque

Big Data: It’s all about the Use Cases
Big Data: It’s all about the Use CasesBig Data: It’s all about the Use Cases
Big Data: It’s all about the Use CasesJames Serra
 
Café da manhã - São Paulo - Use-cases and opportunities in BigData with Hadoop
Café da manhã - São Paulo - Use-cases and opportunities in BigData with HadoopCafé da manhã - São Paulo - Use-cases and opportunities in BigData with Hadoop
Café da manhã - São Paulo - Use-cases and opportunities in BigData with HadoopOCTO Technology
 
Big Data Asset Maturity Model
Big Data Asset Maturity ModelBig Data Asset Maturity Model
Big Data Asset Maturity Modelnoahwong
 
Hadoop Summit San Jose 2014: Costing Your Big Data Operations
Hadoop Summit San Jose 2014: Costing Your Big Data Operations Hadoop Summit San Jose 2014: Costing Your Big Data Operations
Hadoop Summit San Jose 2014: Costing Your Big Data Operations Sumeet Singh
 
Hadoop & Security - Past, Present, Future
Hadoop & Security - Past, Present, FutureHadoop & Security - Past, Present, Future
Hadoop & Security - Past, Present, FutureUwe Printz
 
Hadoop meets Agile! - An Agile Big Data Model
Hadoop meets Agile! - An Agile Big Data ModelHadoop meets Agile! - An Agile Big Data Model
Hadoop meets Agile! - An Agile Big Data ModelUwe Printz
 
Big Data Use Cases
Big Data Use CasesBig Data Use Cases
Big Data Use Casesboorad
 
IT Operating Model
IT Operating ModelIT Operating Model
IT Operating Modelanusharaju38
 
How to create new business models with Big Data and Analytics
How to create new business models with Big Data and AnalyticsHow to create new business models with Big Data and Analytics
How to create new business models with Big Data and AnalyticsAki Balogh
 
How to Build & Sustain a Data Governance Operating Model
How to Build & Sustain a Data Governance Operating Model How to Build & Sustain a Data Governance Operating Model
How to Build & Sustain a Data Governance Operating Model DATUM LLC
 

Destaque (13)

Big Data: It’s all about the Use Cases
Big Data: It’s all about the Use CasesBig Data: It’s all about the Use Cases
Big Data: It’s all about the Use Cases
 
Café da manhã - São Paulo - Use-cases and opportunities in BigData with Hadoop
Café da manhã - São Paulo - Use-cases and opportunities in BigData with HadoopCafé da manhã - São Paulo - Use-cases and opportunities in BigData with Hadoop
Café da manhã - São Paulo - Use-cases and opportunities in BigData with Hadoop
 
Big Data Asset Maturity Model
Big Data Asset Maturity ModelBig Data Asset Maturity Model
Big Data Asset Maturity Model
 
Hadoop Summit San Jose 2014: Costing Your Big Data Operations
Hadoop Summit San Jose 2014: Costing Your Big Data Operations Hadoop Summit San Jose 2014: Costing Your Big Data Operations
Hadoop Summit San Jose 2014: Costing Your Big Data Operations
 
Hadoop & Security - Past, Present, Future
Hadoop & Security - Past, Present, FutureHadoop & Security - Past, Present, Future
Hadoop & Security - Past, Present, Future
 
Hadoop meets Agile! - An Agile Big Data Model
Hadoop meets Agile! - An Agile Big Data ModelHadoop meets Agile! - An Agile Big Data Model
Hadoop meets Agile! - An Agile Big Data Model
 
Apache Spark
Apache SparkApache Spark
Apache Spark
 
Unicom Big Data Conference
Unicom  Big Data ConferenceUnicom  Big Data Conference
Unicom Big Data Conference
 
Big Data Use Cases
Big Data Use CasesBig Data Use Cases
Big Data Use Cases
 
IT Operating Model
IT Operating ModelIT Operating Model
IT Operating Model
 
How to create new business models with Big Data and Analytics
How to create new business models with Big Data and AnalyticsHow to create new business models with Big Data and Analytics
How to create new business models with Big Data and Analytics
 
8 Steps to Creating a Data Strategy
8 Steps to Creating a Data Strategy8 Steps to Creating a Data Strategy
8 Steps to Creating a Data Strategy
 
How to Build & Sustain a Data Governance Operating Model
How to Build & Sustain a Data Governance Operating Model How to Build & Sustain a Data Governance Operating Model
How to Build & Sustain a Data Governance Operating Model
 

Semelhante a Beyond a Big Data Pilot: Building a Production Data Infrastructure - StampedeCon 2014

Big Data and BI Tools - BI Reporting for Bay Area Startups User Group
Big Data and BI Tools - BI Reporting for Bay Area Startups User GroupBig Data and BI Tools - BI Reporting for Bay Area Startups User Group
Big Data and BI Tools - BI Reporting for Bay Area Startups User GroupScott Mitchell
 
The New Frontier: Optimizing Big Data Exploration
The New Frontier: Optimizing Big Data ExplorationThe New Frontier: Optimizing Big Data Exploration
The New Frontier: Optimizing Big Data ExplorationInside Analysis
 
Sqrrl March Webinar: How to Build a Big App
Sqrrl March Webinar: How to Build a Big AppSqrrl March Webinar: How to Build a Big App
Sqrrl March Webinar: How to Build a Big AppSqrrl
 
The Future of Data Management: The Enterprise Data Hub
The Future of Data Management: The Enterprise Data HubThe Future of Data Management: The Enterprise Data Hub
The Future of Data Management: The Enterprise Data HubCloudera, Inc.
 
Horses for Courses: Database Roundtable
Horses for Courses: Database RoundtableHorses for Courses: Database Roundtable
Horses for Courses: Database RoundtableEric Kavanagh
 
Swimming Across the Data Lake, Lessons learned and keys to success
Swimming Across the Data Lake, Lessons learned and keys to success Swimming Across the Data Lake, Lessons learned and keys to success
Swimming Across the Data Lake, Lessons learned and keys to success DataWorks Summit/Hadoop Summit
 
Tapping into the Big Data Reservoir (CON7934)
Tapping into the Big Data Reservoir (CON7934)Tapping into the Big Data Reservoir (CON7934)
Tapping into the Big Data Reservoir (CON7934)Jeffrey T. Pollock
 
Data Governance for Data Lakes
Data Governance for Data LakesData Governance for Data Lakes
Data Governance for Data LakesKiran Kamreddy
 
MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...
MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...
MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...MongoDB
 
Impala Unlocks Interactive BI on Hadoop
Impala Unlocks Interactive BI on HadoopImpala Unlocks Interactive BI on Hadoop
Impala Unlocks Interactive BI on HadoopCloudera, Inc.
 
Hadoop as an Analytic Platform: Why Not?
Hadoop as an Analytic Platform: Why Not?Hadoop as an Analytic Platform: Why Not?
Hadoop as an Analytic Platform: Why Not?Inside Analysis
 
How to Optimize Sales Analytics Using 10x the Data at 1/10th the Cost
How to Optimize Sales Analytics Using 10x the Data at 1/10th the CostHow to Optimize Sales Analytics Using 10x the Data at 1/10th the Cost
How to Optimize Sales Analytics Using 10x the Data at 1/10th the CostAtScale
 
Cloudian 451-hortonworks - webinar
Cloudian 451-hortonworks - webinarCloudian 451-hortonworks - webinar
Cloudian 451-hortonworks - webinarHortonworks
 
Oracle Big Data Governance Webcast Charts
Oracle Big Data Governance Webcast ChartsOracle Big Data Governance Webcast Charts
Oracle Big Data Governance Webcast ChartsJeffrey T. Pollock
 
Apache NiFi Toronto Meetup
Apache NiFi Toronto MeetupApache NiFi Toronto Meetup
Apache NiFi Toronto MeetupHortonworks
 
Level Up – How to Achieve Hadoop Acceleration
Level Up – How to Achieve Hadoop AccelerationLevel Up – How to Achieve Hadoop Acceleration
Level Up – How to Achieve Hadoop AccelerationInside Analysis
 
Extending BI with Big Data Analytics
Extending BI with Big Data AnalyticsExtending BI with Big Data Analytics
Extending BI with Big Data AnalyticsDatameer
 
4. Big data & analytics HP
4. Big data & analytics HP4. Big data & analytics HP
4. Big data & analytics HPMITEF México
 
BAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, SydneyBAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, SydneySai Paravastu
 

Semelhante a Beyond a Big Data Pilot: Building a Production Data Infrastructure - StampedeCon 2014 (20)

Big Data and BI Tools - BI Reporting for Bay Area Startups User Group
Big Data and BI Tools - BI Reporting for Bay Area Startups User GroupBig Data and BI Tools - BI Reporting for Bay Area Startups User Group
Big Data and BI Tools - BI Reporting for Bay Area Startups User Group
 
Ask bigger questions
Ask bigger questionsAsk bigger questions
Ask bigger questions
 
The New Frontier: Optimizing Big Data Exploration
The New Frontier: Optimizing Big Data ExplorationThe New Frontier: Optimizing Big Data Exploration
The New Frontier: Optimizing Big Data Exploration
 
Sqrrl March Webinar: How to Build a Big App
Sqrrl March Webinar: How to Build a Big AppSqrrl March Webinar: How to Build a Big App
Sqrrl March Webinar: How to Build a Big App
 
The Future of Data Management: The Enterprise Data Hub
The Future of Data Management: The Enterprise Data HubThe Future of Data Management: The Enterprise Data Hub
The Future of Data Management: The Enterprise Data Hub
 
Horses for Courses: Database Roundtable
Horses for Courses: Database RoundtableHorses for Courses: Database Roundtable
Horses for Courses: Database Roundtable
 
Swimming Across the Data Lake, Lessons learned and keys to success
Swimming Across the Data Lake, Lessons learned and keys to success Swimming Across the Data Lake, Lessons learned and keys to success
Swimming Across the Data Lake, Lessons learned and keys to success
 
Tapping into the Big Data Reservoir (CON7934)
Tapping into the Big Data Reservoir (CON7934)Tapping into the Big Data Reservoir (CON7934)
Tapping into the Big Data Reservoir (CON7934)
 
Data Governance for Data Lakes
Data Governance for Data LakesData Governance for Data Lakes
Data Governance for Data Lakes
 
MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...
MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...
MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...
 
Impala Unlocks Interactive BI on Hadoop
Impala Unlocks Interactive BI on HadoopImpala Unlocks Interactive BI on Hadoop
Impala Unlocks Interactive BI on Hadoop
 
Hadoop as an Analytic Platform: Why Not?
Hadoop as an Analytic Platform: Why Not?Hadoop as an Analytic Platform: Why Not?
Hadoop as an Analytic Platform: Why Not?
 
How to Optimize Sales Analytics Using 10x the Data at 1/10th the Cost
How to Optimize Sales Analytics Using 10x the Data at 1/10th the CostHow to Optimize Sales Analytics Using 10x the Data at 1/10th the Cost
How to Optimize Sales Analytics Using 10x the Data at 1/10th the Cost
 
Cloudian 451-hortonworks - webinar
Cloudian 451-hortonworks - webinarCloudian 451-hortonworks - webinar
Cloudian 451-hortonworks - webinar
 
Oracle Big Data Governance Webcast Charts
Oracle Big Data Governance Webcast ChartsOracle Big Data Governance Webcast Charts
Oracle Big Data Governance Webcast Charts
 
Apache NiFi Toronto Meetup
Apache NiFi Toronto MeetupApache NiFi Toronto Meetup
Apache NiFi Toronto Meetup
 
Level Up – How to Achieve Hadoop Acceleration
Level Up – How to Achieve Hadoop AccelerationLevel Up – How to Achieve Hadoop Acceleration
Level Up – How to Achieve Hadoop Acceleration
 
Extending BI with Big Data Analytics
Extending BI with Big Data AnalyticsExtending BI with Big Data Analytics
Extending BI with Big Data Analytics
 
4. Big data & analytics HP
4. Big data & analytics HP4. Big data & analytics HP
4. Big data & analytics HP
 
BAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, SydneyBAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, Sydney
 

Beyond a Big Data Pilot: Building a Production Data Infrastructure - StampedeCon 2014

  • 1. © 2014 Silicon Valley Data Science LLC All Rights Reserved. @SVDataScience Beyond a Big Data Pilot: Building a Production Data Infrastructure StampedeCon 29 May 2014, St. Louis Stephen O’Sullivan (@steveos) strata.svds.com @SVDataScience
  • 2. Stephen O’Sullivan, Distinguished Architect
  • 3. Beyond a Big Data Pilot: Building a Production Data Infrastructure. Creating a data architecture involves many moving parts. By examining the data value chain, from ingestion through to analytics, we will explain how the various parts of the Hadoop and big data ecosystem fit together to support batch, interactive, and real-time analytical workloads. By tracing the flow of data from source to output, we’ll explore the options and considerations for components, including data acquisition, ingestion, storage, data services, analytics, and data management. Most importantly, we’ll leave you with a framework for understanding these options and making choices.
  • 4. [Diagram labels: Key-Value, Columnar, Graph, Document, General]
  • 5. UP OR OUT? Different use cases (UC1, UC2, UC3, UC4, ... UCn) put different demands on the data infrastructure. Increasing cost per unit of capability from scale-up architectures causes rationing of resources. Only the most valuable use cases are pursued. [Chart labels: Data Resource Usage, Value, scale-out cost; use cases UC1 to UCn]
  • 6. THE DATA VALUE CHAIN: Acquire → Ingest → Process → Persist → Integrate → Analyze → Expose
  • 7. BUILDING A DATA PLATFORM. [Diagram labels: External Systems, Data Acquisition, Internal Data Sources, External Data Sources, Data Ingestion, Offline Processing, Real Time Processing, Batch Processing, Data Repository, Persistence, Data Services, Analytics, and Data Management (Security, Operations, Data Quality, Meta Data Management and Data Lineage)]
  • 8. Acquisition: from internal and external data sources [platform diagram repeated from slide 7]
  • 9. Ingestion: offline and real-time processing [platform diagram repeated from slide 7]
  • 10. Persistence [platform diagram repeated from slide 7]
  • 11. Data Services: exposing data to applications [platform diagram repeated from slide 7]
  • 12. Analytics: batch and real-time processing [platform diagram repeated from slide 7]
  • 13. Data Management: data security, operations, lineage, quality, and metadata management [platform diagram repeated from slide 7]
  • 14. Use Case
      • Collect in-store sales transactions in near real-time
      • Provide near real-time dashboards of sales transactions (rolled up by store, region, etc.)
      • Provide ad-hoc access to this data as soon as it's collected (i.e., low latency and fine-grained)
  • 15. FORTUNE 500 RETAIL COMPANY: Enabling Near Real-time Sales Transactions. [Diagram labels: Application Servers, Data Center A, Data Center B, BI Server, http]
  • 16. FORTUNE 500 RETAIL COMPANY: Enabling Near Real-time Sales Transactions. [Diagram labels: Application Servers, Data Center A, Data Center B, CFS, BI Server, http]
  • 17. Ready to go into Production?
      • Data Acquisition
          – Can you see the “collectors” (internal or external)? Make sure you have the correct network access in place.
          – Do you need to encrypt the data (internally or externally)? That will depend on the data and your policies.
      • Data Ingestion
          – Can you handle the traffic to the “collectors”? Make sure the solution you choose can scale out. Apache Flume is a good example of this.
          – Do you have redundant / self-healing paths into the cluster? Make sure you're not point to point. In Flume, Storm, and Kafka you can configure forks, etc., but you may need to handle duplicate data.
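The duplicate-data caveat on slide 17 can be illustrated with a small sketch. It assumes each transaction carries a unique identifier (the `txn_id` field and the `Deduplicator` class are hypothetical names, not from the talk) and that remembering a bounded window of recent identifiers is enough to catch repeats arriving over redundant paths.

```python
from collections import OrderedDict


class Deduplicator:
    """Remembers the most recent event ids and rejects repeats (bounded memory)."""

    def __init__(self, max_ids=100_000):
        self.max_ids = max_ids
        self._seen = OrderedDict()

    def is_new(self, event_id):
        # A repeat within the remembered window is treated as a duplicate.
        if event_id in self._seen:
            return False
        self._seen[event_id] = True
        # Evict the oldest id once the window is full.
        if len(self._seen) > self.max_ids:
            self._seen.popitem(last=False)
        return True


def deduplicate(events, dedup):
    """Keep only events whose (hypothetical) txn_id has not been seen yet."""
    return [e for e in events if dedup.is_new(e["txn_id"])]
```

A window this simple only catches duplicates that arrive close together; duplicates separated by more than `max_ids` distinct events would pass through, so the window size is a trade-off against memory.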
  • 18. Ready to go into Production?
      • Data Repository
          – Can you handle out-of-order data? Make sure you have a way to address it, as it will happen.
          – Can you scale the cluster for data volume spikes and/or processing spikes? Hadoop and Cassandra make it very easy to add nodes. If you cannot add nodes, be prepared to drop data, stop processes, or both.
          – Should I just store plain text (compressed)? If it's very wide data and you query a subset of the columns, Parquet would be a good choice. If you would like to be able to version your data schema, Avro is a good choice.
      • Data Services
          – Do applications need to access this data? Build a RESTful service to access the data.
          – Do you have data resiliency? What is data resiliency, I hear you ask...
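One possible way to address the out-of-order question on slide 18 is to partition records by the time the event happened rather than the time it arrived, so a late record still lands in the right bucket. The sketch below is an illustration only; the `event_ts` field (epoch seconds) and the hourly `YYYY-MM-DD/HH` layout are assumptions, not something the talk specifies.

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_key(event_ts):
    """Hourly partition derived from event time (epoch seconds, UTC)."""
    return datetime.fromtimestamp(event_ts, tz=timezone.utc).strftime("%Y-%m-%d/%H")


def bucket_by_event_time(events):
    """Group events by event-time partition and order each bucket by event time."""
    buckets = defaultdict(list)
    for event in events:
        buckets[partition_key(event["event_ts"])].append(event)
    for rows in buckets.values():
        rows.sort(key=lambda e: e["event_ts"])
    return dict(buckets)
```

With this layout, a record that shows up an hour late is appended to the earlier partition, which means downstream jobs need some rule for when a partition is considered complete.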
  • 19. DATA RESILIENCY
      • Stovepipe: one-to-one relationship from data source to product
      • Hard Failure: if the data source is broken, so is the app
      • Multi-sourced: redundancy of overlapping data sources makes your products more resilient
      • Graceful Degradation: if a data source breaks, there is a backup and your app continues to function
      • Production data services abstract the probabilistic integration of overlapping data sources. We call this model a Data Mesh.
      [Diagram labels: Products, Data Services, Data Sources, Broken Data Sources]
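The graceful-degradation idea on slide 19 can be sketched as a data service that tries overlapping sources in order and serves the first one that answers. This is a deliberately simplified ordered fallback, not the probabilistic integration the slide mentions; the function name and the `(name, fetch)` source pairs are hypothetical.

```python
def fetch_with_fallback(sources, key):
    """Try each (name, fetch) source in order; return (name, value) from the first success."""
    errors = []
    for name, fetch in sources:
        try:
            value = fetch(key)
        except Exception as exc:  # a broken source must not break the product
            errors.append((name, exc))
            continue
        if value is not None:
            return name, value
    raise LookupError(f"no source could serve {key!r}: {errors}")
```

Returning the source name alongside the value lets the calling product record that it is running on a backup source, which is useful when the backup is less fresh or less complete than the primary.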
  • 20. Ready to go into Production?
      • Analytics
          – Which is the right SQL-on-Hadoop solution (for me)? There are a few to choose from: Hive, Impala, Spark SQL, and HAWQ (and growing). They share the same metastore. Some are faster than others, depending on the type of query.
          – Which BI tool should I use? See if your current tool works with your distro. You can also look at Platfora, Datameer, and Karmasphere.
          – Do I still need to set up business views of the data? Yes, you do, but the benefit is that you still have access to the raw data for the advanced data analyst or data scientist.
          – What about deep analytics? Now that you have a data lake, you can do deep analytics on the data without moving it out.
  • 21. Analytics tools
  • 22. Ready to go into Production?
      • Data Management: questions
          – Security (who can see what?)
          – Can you meet your SLA when other jobs / queries are running?
          – What monitoring do you have in place?
          – Cluster failover?
      • Data Management: guidance
          – Security is getting there. At the query level, using Hive or Impala, you can use Apache Sentry or Apache Knox.
          – There are other third-party tools, such as Dataguise, that let you do things like encryption at rest or masking.
          – Using the “Fair Scheduler” will help you manage your jobs' SLAs.
          – A third-party product by Pepper Data can help with this too (and a little more).
  • 23. Adding more use cases
      • Am I duplicating data?
      • Can I reuse the infrastructure I’ve already created?
      • Do I have enough room in the cluster (space/processing)?
      • Will I impact the SLAs of jobs/queries currently running?
  • 24. HIGH LEVEL ARCHITECTURE. FORTUNE 500 RETAIL COMPANY: Enabling Real Time Database Monitoring. Oracle Stats Collection: pulling data over JDBC, sending data to Graphite, writing data to HDFS.
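For the "sending data to Graphite" step on slide 24, a minimal sketch follows. It uses Graphite's plaintext protocol, where each line is `<metric path> <value> <timestamp>` sent over TCP (port 2003 by default). The metric name and helper functions are hypothetical, and the talk does not say which client or protocol the original collector used.

```python
import socket
import time


def graphite_line(path, value, timestamp=None):
    """Format one metric in Graphite's plaintext protocol: 'path value timestamp\\n'."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"{path} {value} {ts}\n"


def send_to_graphite(lines, host, port=2003):
    """Send pre-formatted metric lines to a Carbon plaintext listener over TCP."""
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)
```

For example, `graphite_line("oracle.db1.sessions.active", 42, 1401321600)` yields the line `oracle.db1.sessions.active 42 1401321600`, which `send_to_graphite` would then write to the Carbon host.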
  • 25. FORTUNE 500 RETAIL COMPANY: Enabling Log Collection & Search. [Diagram labels: Application Servers, statsd, Log Search, http]
  • 26. Questions. Yes, We’re Hiring: svds.com/join-us
  • 27. THANK YOU. Stephen O’Sullivan, @steveos