The Evolving Landscape of
Data Engineering
Bucharest Big Data Meetup @ TechHub
Andrei Savu / @andreisavu
Andrei Savu
Currently Staff Engineer @ Twitter:
* Twitter Ad Exchange Data Team
* Focus on Mobile Monetization
Co-organizer of the Data Engineering Club in San Francisco.
Previously Tech Lead at Cloudera via the Axemblr.com acquisition. Started the Cloud engineering team.
One of the early founders of the Bucharest Java User Group.
Topics
What is data engineering?
The Past / Drivers of innovation:
● OSS communities
● AWS history
● Google Cloud history
The Present: Common Patterns
The Future: Wish List
Where do I start?
What is data engineering? (vs. data science, vs. ML)
“Unlike data scientists — and inspired by
our more mature parent, software
engineering — data engineers build tools,
infrastructure, frameworks, and services. In fact,
it’s arguable that data engineering is much
closer to software engineering than it is to a data
science.”
Maxime Beauchemin
The Rise of the Data Engineer
The Past - OSS
Weeks of Provisioning
Static Infrastructure
Commodity Hardware
Commodity Networking
Data Locality Important
Running in the Public Cloud was unusual
CAPEX
The Past - AWS
Visionary Business
Fast iterations
Data Management as a key platform use case
Incredible Scale
Transition to “serverless”
OPEX & Elastic
The Past - Google Cloud
Visionary Products
Fast iterations
Machine Learning as a key use case
State of the Art data platform
Last 3 years on fast forward
Intelligent Billing
OPEX & Elastic
The Present: Patterns
Weeks to Minutes to Seconds
Hadoop/Spark ecosystem is mature and continues to innovate.
We have a broad set of options.
Big Data is much bigger (e.g. x1e.32xlarge: 3TB mem, 128 vCPUs, 14Gbps network)
Scale continues to be hard.
Cloud economics can be very disruptive (especially for data workloads)
High-performance networks are common.
Storage can be decoupled from compute.
Zone/DC locality is important (laws of physics)
Service Endpoints (not clusters, aka serverless, aka managed etc.).
Sophisticated Auto-scaling (batch & streaming, spot vs. on-demand, multi-AZ).
Multi-DC and Multi-Region from Day 1.
The Future: Wish List
A Data Catalog product as the center of the universe.
Data Monitoring Systems:
* statistical properties, anomaly detection, schema changes, consumption patterns etc.
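To make the monitoring idea concrete, here is a minimal sketch of two such checks: schema drift between an expected and an observed column mapping, and a simple z-score test on a daily metric. The function names, the `{column: type}` representation, and the threshold of 3 standard deviations are all assumptions made for illustration; the slide does not name a specific product or method.

```python
import statistics


def schema_changes(expected, observed):
    """Compare two {column: type} mappings and report what drifted."""
    shared = set(expected) & set(observed)
    return {
        "added": sorted(set(observed) - set(expected)),
        "removed": sorted(set(expected) - set(observed)),
        "retyped": sorted(c for c in shared if expected[c] != observed[c]),
    }


def is_anomalous(history, value, threshold=3.0):
    """Flag `value` if it is more than `threshold` sample standard
    deviations away from the mean of `history` (needs >= 2 points)."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        # A perfectly flat history: any deviation at all is unusual.
        return value != mean
    return abs(value - mean) / stdev > threshold


# Hypothetical table schema and daily row counts, for illustration only.
expected = {"user_id": "int", "country": "str", "spend": "float"}
observed = {"user_id": "str", "country": "str", "device": "str"}
print(schema_changes(expected, observed))

daily_rows = [1000, 1020, 980, 1010, 990]
print(is_anomalous(daily_rows, 400), is_anomalous(daily_rows, 1005))
```

A real monitoring system would likely track many metrics per dataset and use more robust statistics, but the shape is the same: compare what arrived against what was expected and raise a signal on the difference.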
More intelligence at the data infrastructure level:
* data format migrations, intelligent caching based on access patterns.
Declarative data transformation vs. explicit ETL.
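One way to read the contrast above: in explicit ETL the steps and their order are written out in code, while in a declarative style the transformation is described as data and an engine decides how to execute it. The sketch below is a toy illustration under that reading; the `SPEC` format and the `run` function are hypothetical and not taken from any real framework.

```python
from collections import defaultdict


def explicit_revenue_by_country(rows):
    """Explicit ETL: filter, group, and sum are hard-coded in order."""
    totals = defaultdict(float)
    for row in rows:
        if row["status"] == "paid":
            totals[row["country"]] += row["amount"]
    return dict(totals)


# Declarative: the same transformation expressed as data (hypothetical format).
SPEC = {
    "where": {"status": "paid"},
    "group_by": "country",
    "sum": "amount",
}


def run(spec, rows):
    """A tiny engine that interprets the spec. A real engine would be
    free to reorder, parallelize, or push the work down to storage."""
    totals = defaultdict(float)
    for row in rows:
        if all(row[key] == want for key, want in spec["where"].items()):
            totals[row[spec["group_by"]]] += row[spec["sum"]]
    return dict(totals)


rows = [
    {"country": "RO", "status": "paid", "amount": 10.0},
    {"country": "RO", "status": "refunded", "amount": 5.0},
    {"country": "US", "status": "paid", "amount": 7.5},
    {"country": "RO", "status": "paid", "amount": 2.5},
]
print(explicit_revenue_by_country(rows) == run(SPEC, rows))
```

Both paths produce the same totals here. The difference is that the spec can be inspected, validated, and optimized without reading imperative code.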
Intelligent data sampling products. Cost will continue to be a concern even when scale is not.
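As a rough illustration of why sampling helps with cost, the sketch below uses reservoir sampling (the classic "Algorithm R") to keep a fixed-size uniform sample from a stream of unknown length in a single pass. This is one well-known technique chosen for illustration, not something the slide names; the function name and the `seed` parameter are assumptions.

```python
import random


def reservoir_sample(stream, k, seed=None):
    """Return up to k items drawn uniformly at random from `stream`,
    using O(k) memory and one pass over the data."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Keep the new item with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = item
    return sample


# Sample 5 records from a million-element stream without materializing it.
picked = reservoir_sample(range(1_000_000), k=5, seed=42)
print(len(picked))
```

Downstream queries can then run against the small sample, so cost depends on the sample size rather than on the full dataset.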
Where do I start?
Technologies:
● SQL + Python
● Pandas + Numpy
● Jupyter or Zeppelin
● Spark
Google Cloud:
● https://www.coursera.org/specializations/gcp-data-machine-learning ($300 credit)
Domain Knowledge:
● Critical business questions
● The data needed to answer them
● Understand access patterns
Thanks! Questions?
