www.scling.com
Data engineering in 10 years
Lars Albertsson, Founder, Scling
2022-11-09
1
Prediction of future?
Opinion + belief
2
Functional languages (Scala, Kotlin, …) are better
suited for data processing than Python. I believe that
they will be dominant in the future.
How to predict the future?
● Promises
● Extrapolation
○ Leading to tipping points
4
● Patterns
○ Similar contexts ahead in the journey
● Future is unevenly distributed
○ Some are already there
Vintage digital disruption - MRP
● Material requirements planning
○ What materials are needed for manufacturing (this month)
○ Computerised in the 80s
○ Expensive manual monthly → automatically overnight
● MRP hype
○ People → software
○ … that is executed each month
● Cf. adoption today
○ Cloud
○ Agile
○ Data
○ ML
5
Technology adoption
Eliyahu M. Goldratt on adopting new technology:
"Technology can bring benefits if, and only if, it diminishes a limitation."
● What is the power of the technology?
● What limitation does it diminish?
● What rules helped us accommodate the limitation?
● What rules should we use now?
Future = new technology - old rules + new rules
7
Primary cause of waste in
data value creation
New rules?
● Cf. steam factory → electricity
○ Without new rules → backlash
● Scoped out
○ Covered yesterday
8
What is the power of data engineering?
● Feasible to store all (raw) data
● Cheap (re)computations
● Build more complex data processing flows
● Share data across teams with minimal operational risk
● Fast experiment iteration and feedback with minimal operational risk
(Scoping out data science and machine learning.)
9
Efficiency gap, data cost & value
● Data processing produces datasets
○ Each dataset has business value
● Proxy value/cost metric: datasets / day
○ S-M traditional: < 10
○ Bank, telecom, media: 100-1000
10
2014: 6500 datasets / day
2016: 20000 datasets / day
2018: 100000+ datasets / day, 25% of staff use BigQuery
2021: 500B events collected / day
2016: 1 600 000 000 datasets / day
[Figure: value plotted against effort. Labelled regions: financial, reporting; insights, data-fed features; disruptive value of data, machine learning]
Data agility
11
● Siloed: 6+ months
Cultural work
● Autonomous: 1 month
Technical work
● Coordinated: days
Data lake
∆
∆
Latency?
Enabling innovation
12
"The actual work that went into
Discover Weekly was very little,
because we're reusing things we
already had."
https://youtu.be/A259Yo8hBRs
https://youtu.be/ZcmJxli8WS8
https://musically.com/2018/08/08/daniel-ek-would-have-killed-discover-weekly-before-launch/
"Discover Weekly wasn't a great
strategic plan and 100 engineers.
It was 3 engineers that decided to
build something."
"I would have killed it. All of a sudden,
they shipped it. It’s one of the most
loved product features that we have."
- Daniel Ek, CEO
Manual, mechanised, industrialised
13
IT craft to factory
14
● Security → DevSecOps
● Waterfall → Agile
● Application delivery → Containers
● Traditional operations → DevOps
● Traditional QA → CI/CD
● Infrastructure → Infrastructure as code
Data factories
15
● DB-oriented architecture → Data factories, data pipelines, DataOps
Manual, mechanised, industrialised
16
[Figure: data artifacts produced at each stage, with 100x steps between stages. Image: Spotify's pipelines ~2013]
Crafted artifacts: data models
17
● Data (warehouse) models are carefully crafted
○ Built with hand-crafted SQL
○ Primitive automation
○ Reproducible?
● Require careful modelling to avoid trouble
○ E.g. slowly changing dimensions
○ Data vault, star schemas, satellites, …
● Pets, not cattle
Artisanal vs industrialised data modelling
Artisanal:
● Create single shared model artifact
● Used for many use cases
● Innovate fast model → use case
Industrial:
● Create model for each use case
● Reuse code that produces model
● Each model may be unique
● Innovate fast raw → model → use case
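As a rough illustration of "reuse code that produces model", the sketch below derives two different per-use-case models from the same raw data with one shared function. The `Purchase` record, the `totalsBy` helper and the object name are hypothetical and not taken from the deck.

```scala
object ModelFactorySketch {
  // Hypothetical raw record; field names are illustrative only.
  case class Purchase(userId: String, country: String, amountCents: Long)

  // Reusable code that produces a model: total amount per key,
  // for whichever key a given use case needs.
  def totalsBy[K](raw: Seq[Purchase])(key: Purchase => K): Map[K, Long] =
    raw.groupBy(key).map { case (k, ps) => k -> ps.map(_.amountCents).sum }
}
```

Each use case calls the shared function with its own key, for example `totalsBy(raw)(_.userId)` or `totalsBy(raw)(_.country)`, so the code is the reused asset and each resulting model can be unique.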
18
Premature modelling is waste
● Power: Recompute model quickly
● Lifted limitation: Expensive to compute model
● Old rule: Careful manual modelling work
● New rules: Guard rails preventing model iteration from breaking downstream
○ Code QA = testing
○ Code + data QA = monitoring
Yes, on purpose!
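The two guard rails can be sketched as follows: a pure model-building function that a unit test can cover (code QA), and a check on the produced data that runs with every pipeline execution (code + data QA). All names, the record types and the 1% threshold are assumptions for illustration only.

```scala
object GuardRailSketch {
  // Hypothetical record types for illustration.
  case class RawEvent(userId: String, amountCents: Long)
  case class SpendModel(userId: String, totalCents: Long)

  // Code that produces the model: a pure function, so plain unit tests can cover it.
  def buildModel(events: Seq[RawEvent]): Seq[SpendModel] =
    events
      .groupBy(_.userId)
      .map { case (userId, es) => SpendModel(userId, es.map(_.amountCents).sum) }
      .toSeq
      .sortBy(_.userId)

  // Monitoring: a measurement on the produced data.
  // Returns the fraction of models with a negative total.
  def negativeTotalRatio(models: Seq[SpendModel]): Double =
    if (models.isEmpty) 0.0
    else models.count(_.totalCents < 0).toDouble / models.size

  // The alerting threshold is an assumption, not a recommendation.
  def withinThreshold(models: Seq[SpendModel], maxRatio: Double = 0.01): Boolean =
    negativeTotalRatio(models) <= maxRatio
}
```

With checks like these in place, the model code can be changed and the model recomputed without careful up-front modelling, since a breaking change shows up in a failing test or a monitoring alert.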
19
Artisanal vs industrialised knowledge graphs
Artisanal:
● Create single shared graph
● Used for many use cases
● Innovate fast graph → use case
Industrial:
● Create graph for each use case
● Reuse code that produces graph
● Each graph may be unique
● Innovate fast raw → graph → use case
20
Artisanal vs industrialised machine learning models
Google MLOps maturity model:
● MLOps level 0: Manual process
● MLOps level 1: ML pipeline automation
● MLOps level 2: CI/CD pipeline automation
https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning
21
Road towards industrialisation
22
● LAMP stack age - manual analytics
● Data warehouse (DW) age - mechanised analytics
● Hadoop age - industrialised analytics, data-fed features, machine learning
○ Significant change in workflows
● Early Hadoop:
○ Weak indexing
○ No transactions
○ Weak security
○ Batch transformations
Simplifying use of new technology
23
DW
Enterprise big data failures
"Modern data stack" -
traditional workflows, new technology
Low-code, no-code
We have seen this before
24
Difficult adoption
4GL, UML, low-code, no-code
Software engineering education
Data engineering in the future
25
DW
~10 year capability gap
"data factory engineering"
Enterprise big data failures
"Modern data stack" -
traditional workflows, new technology
4GL / UML phase of data engineering
Data engineering education
Future of low-code & no-code
27
● Low-code web creation works.
○ Static content (mostly)
○ Low complexity
○ Simple QA
● Low-code application development does not.
○ User defines content
○ Medium complexity
○ QA depends on user behaviour
● Low-code data?
○ Inbound data + user defines content
○ High complexity
○ QA depends on user + data
SQL for data processing
● SQL used in 3 distinct contexts
○ Interactive exploration
○ Backend data record retrieval
○ ETL data processing?
28
Important data language features:
● Can express (complex) business logic
● Composability
● Reusability
● Testability
● Seamless integration with external logic
● Tools to guide towards good path
○ Type system
○ Inspection tools
● IDE experience
● Debuggability
● Data quality measurement support
● Data quality improvement support
● Learning curve
https://threadreaderapp.com/thread/1353832649664692225.html
SQL inadequate for mature applications
● SQL from scratch - things seem ok
● Porting a mature application
○ Cannot reasonably express logic
○ ~5x slower (Hive 1.x)
○ Give up quality metrics
● Data quality measurements
● Data quality improvement
30
// C[_], read and longAccumulator belong to the (unnamed) distributed collection API used on the slide
case class Order(item: ItemId, userId: UserId)
case class User(id: UserId, country: String)

val orders = read(orderPath)
val users = read(userPath)
// Data quality metric: counts orders that have no matching user
val orderNoUserCounter = longAccumulator("order-no-user")

val joined: C[(Order, Option[User])] = orders
  .groupBy(_.userId)
  .leftJoin(users.groupBy(_.id))
  .values

// Keep matched pairs; count and drop orders without a user
val orderWithUser: C[(Order, User)] = joined
  .flatMap {
    case (order, Some(user)) => Some((order, user))
    case (order, None) =>
      orderNoUserCounter.add(1)
      None
  }
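The slide code depends on a distributed collection API that the deck does not name. A minimal sketch of the same pattern on plain Scala collections, assuming hypothetical `ItemId` and `UserId` aliases and an `AtomicLong` standing in for the accumulator, might look like this:

```scala
import java.util.concurrent.atomic.AtomicLong

object OrderJoinSketch {
  // Hypothetical ID types; the slide does not define them.
  type ItemId = String
  type UserId = String

  case class Order(item: ItemId, userId: UserId)
  case class User(id: UserId, country: String)

  // Left-joins orders with users, keeps the matched pairs, and counts
  // the orders without a user. Returns the pairs and the dropped count.
  def joinOrders(orders: Seq[Order], users: Seq[User]): (Seq[(Order, User)], Long) = {
    val orderNoUserCounter = new AtomicLong(0) // stands in for longAccumulator("order-no-user")
    val usersById: Map[UserId, User] = users.map(u => u.id -> u).toMap

    val joined: Seq[(Order, Option[User])] =
      orders.map(order => (order, usersById.get(order.userId)))

    val orderWithUser: Seq[(Order, User)] = joined.flatMap {
      case (order, Some(user)) => Some((order, user))
      case (_, None) =>
        orderNoUserCounter.incrementAndGet() // data quality measurement
        None
    }
    (orderWithUser, orderNoUserCounter.get())
  }
}
```

The data quality measurement sits inline with the join logic, which is the kind of thing the slide argues is hard to express when the same processing is ported to SQL.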
Technology adoption & modern data stack
● New power:
Build more complex data processing flows
● Old limitation:
Brain capability to understand full flow
● Rules to mitigate limitation:
Declarative & low code languages
● New rules:
Software engineering / DevOps
31
Data-centric innovation
● Need data from teams
○ willing?
○ backlog?
○ collected?
○ useful?
○ quality?
○ extraction?
○ data governance?
○ history?
32
Data platform
Big data - a collaboration paradigm
33
Stream storage?
Data lake
Data
democratised
Technology adoption & data lake collaboration
● New powers:
Share data across teams with minimal operational risk
Fast experiment iteration and feedback with minimal operational risk
● Old limitations:
Operational risk. Governance risk. Political.
● Rules to mitigate limitation:
Data isolated.
Internal API = technical contract
● New rules:
DataOps - holistic QA
New governance mechanisms
34
Data platform
Data products / contracts = old rules, new context
35
Stream storage?
Data lake
Data contract
Data product
Left is up
36
Winston W Royce:
"Managing the development
of large software systems"
Extreme programming
37
Agile
38
DevOps
39
Vintage team contracts and products
40
● Rational Unified Process
● Strong separation between
teams / developers
● Contracts at handoff points
● Maximum number of handoffs
in a value stream
Big data
41
DataOps
42
MLOps
43
DATA
SCIENCE
Which methodologies fade or prevail?
44
● Perpendicular to value stream
○ Barriers between people & teams
○ Extra non-value adding work
○ More handoffs
○ Homogeneous competence
○ Examples: Waterfall, RUP, data products / data mesh, data contracts
● Aligned along value stream
○ Few handoffs from raw to value
○ Enabled teams
○ Remove waste (in lean terms)
○ Heterogeneous competence
○ Examples: Extreme programming / TDD, Agile, big data, DevOps, DataOps
Risk management by shifting left
● Manual governance
● Automated process
● DevOps:
○ Automated quality risk management
○ Quick feedback up in value stream
○ Left-shifted QA risk management has improved both speed and quality
● DataOps:
○ Contracts are automated tests
○ Inter-system protocols are
implementation details
○ New rule: Holistic QA
○ New governance
46
● DevSecOps
○ Security team approval
○ One-off vulnerability scans
○ Automated security rule validation
○ Feedback on change in vulnerabilities
● GovernanceOps?
○ Manual approval
○ Automated governance rule validation?
● ComplianceOps?
○ Manual one-off audits
○ Automated compliance inspections?
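One way to read "contracts are automated tests" is that the expectations on a shared dataset live as executable checks rather than as a document. A minimal sketch, assuming a hypothetical `UserRecord` dataset and two made-up rules (unique ids, two-letter upper-case country codes):

```scala
object ContractTestSketch {
  // Hypothetical dataset record; field names are illustrative only.
  case class UserRecord(id: String, country: String)

  // The "contract" as executable checks: returns a description of
  // every violated rule, or an empty sequence if the data conforms.
  def contractViolations(records: Seq[UserRecord]): Seq[String] = {
    val duplicateIds = records
      .groupBy(_.id)
      .collect { case (id, rs) if rs.size > 1 => id }
      .toSeq
      .sorted
    val badCountries = records
      .filterNot(_.country.matches("[A-Z]{2}"))
      .map(_.id)
      .sorted

    duplicateIds.map(id => s"duplicate id: $id") ++
      badCountries.map(id => s"invalid country code for id: $id")
  }
}
```

A check of this kind can run in the producer's test suite and again on produced data in the pipeline, so a violation is reported early in the value stream instead of at a handoff.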
DevSecOps
47
SECURITY
ComplianceOps
48
COMPLIANCE
Wrapup
50
● The future is faster
○ Patterns from other disciplines
○ How do leaders work?
○ Rules that hold us back today
● Look at software engineering evolution
○ Industrialised process eliminates
big design up front
○ Enabled, high code components
○ Stream-aligned teams
○ Shift left continues
● Change is difficult, takes years
○ Agile transformations
○ DevOps transformations
● Current methods ineffective
○ Organically grow competence
○ Buy stuff
○ Consultants
● Belief: new collaboration methods
Scling - data-factory-as-a-service
51
Data value through collaboration
Customer
Data factory
Data platform & lake
data
domain
expertise
Value from data!
Rapid data
innovation
Learning by doing,
in collaboration
Tech has massive impact on society
52
Product?
Supplier?
Employer?
Make an active
choice whether to
have an impact!
Cloud?

Mais conteúdo relacionado

Mais procurados

Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Model serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInData
Model serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInDataModel serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInData
Model serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInDataGetInData
 
BigQuery walk through.pptx
BigQuery walk through.pptxBigQuery walk through.pptx
BigQuery walk through.pptxVikRam S
 
Getting started with BigQuery
Getting started with BigQueryGetting started with BigQuery
Getting started with BigQueryPradeep Bhadani
 
Open core summit: Observability for data pipelines with OpenLineage
Open core summit: Observability for data pipelines with OpenLineageOpen core summit: Observability for data pipelines with OpenLineage
Open core summit: Observability for data pipelines with OpenLineageJulien Le Dem
 
Big Query - Utilizing Google Data Warehouse for Media Analytics
Big Query - Utilizing Google Data Warehouse for Media AnalyticsBig Query - Utilizing Google Data Warehouse for Media Analytics
Big Query - Utilizing Google Data Warehouse for Media Analyticshafeeznazri
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshJeffrey T. Pollock
 
Indexes: The neglected performance all rounder
Indexes: The neglected performance all rounderIndexes: The neglected performance all rounder
Indexes: The neglected performance all rounderMarkus Winand
 
Snowflake Company Presentation
Snowflake Company PresentationSnowflake Company Presentation
Snowflake Company PresentationAndrewJiang18
 
Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa...
 Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa... Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa...
Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa...Databricks
 
Making Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMaking Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMatei Zaharia
 
Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Ryan Blue
 
AnzoGraph DB: Driving AI and Machine Insights with Knowledge Graphs in a Conn...
AnzoGraph DB: Driving AI and Machine Insights with Knowledge Graphs in a Conn...AnzoGraph DB: Driving AI and Machine Insights with Knowledge Graphs in a Conn...
AnzoGraph DB: Driving AI and Machine Insights with Knowledge Graphs in a Conn...Cambridge Semantics
 
BigQuery implementation
BigQuery implementationBigQuery implementation
BigQuery implementationSimon Su
 
Data Streaming Ecosystem Management at Booking.com
Data Streaming Ecosystem Management at Booking.com Data Streaming Ecosystem Management at Booking.com
Data Streaming Ecosystem Management at Booking.com confluent
 
Free Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseFree Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseDatabricks
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureDatabricks
 

Mais procurados (20)

Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Model serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInData
Model serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInDataModel serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInData
Model serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInData
 
BigQuery walk through.pptx
BigQuery walk through.pptxBigQuery walk through.pptx
BigQuery walk through.pptx
 
Getting started with BigQuery
Getting started with BigQueryGetting started with BigQuery
Getting started with BigQuery
 
BigQuery for Beginners
BigQuery for BeginnersBigQuery for Beginners
BigQuery for Beginners
 
Open core summit: Observability for data pipelines with OpenLineage
Open core summit: Observability for data pipelines with OpenLineageOpen core summit: Observability for data pipelines with OpenLineage
Open core summit: Observability for data pipelines with OpenLineage
 
Big Query - Utilizing Google Data Warehouse for Media Analytics
Big Query - Utilizing Google Data Warehouse for Media AnalyticsBig Query - Utilizing Google Data Warehouse for Media Analytics
Big Query - Utilizing Google Data Warehouse for Media Analytics
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to Mesh
 
Indexes: The neglected performance all rounder
Indexes: The neglected performance all rounderIndexes: The neglected performance all rounder
Indexes: The neglected performance all rounder
 
Snowflake Company Presentation
Snowflake Company PresentationSnowflake Company Presentation
Snowflake Company Presentation
 
Observability at Spotify
Observability at SpotifyObservability at Spotify
Observability at Spotify
 
Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa...
 Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa... Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa...
Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa...
 
Making Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMaking Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse Technology
 
From Data Warehouse to Lakehouse
From Data Warehouse to LakehouseFrom Data Warehouse to Lakehouse
From Data Warehouse to Lakehouse
 
Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)
 
AnzoGraph DB: Driving AI and Machine Insights with Knowledge Graphs in a Conn...
AnzoGraph DB: Driving AI and Machine Insights with Knowledge Graphs in a Conn...AnzoGraph DB: Driving AI and Machine Insights with Knowledge Graphs in a Conn...
AnzoGraph DB: Driving AI and Machine Insights with Knowledge Graphs in a Conn...
 
BigQuery implementation
BigQuery implementationBigQuery implementation
BigQuery implementation
 
Data Streaming Ecosystem Management at Booking.com
Data Streaming Ecosystem Management at Booking.com Data Streaming Ecosystem Management at Booking.com
Data Streaming Ecosystem Management at Booking.com
 
Free Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseFree Training: How to Build a Lakehouse
Free Training: How to Build a Lakehouse
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
 

Semelhante a Data engineering in 10 years.pdf

Engineering data quality
Engineering data qualityEngineering data quality
Engineering data qualityLars Albertsson
 
Crossing the data divide
Crossing the data divideCrossing the data divide
Crossing the data divideLars Albertsson
 
Holistic data application quality
Holistic data application qualityHolistic data application quality
Holistic data application qualityLars Albertsson
 
Data ops in practice - Swedish style
Data ops in practice - Swedish styleData ops in practice - Swedish style
Data ops in practice - Swedish styleLars Albertsson
 
Secure software supply chain on a shoestring budget
Secure software supply chain on a shoestring budgetSecure software supply chain on a shoestring budget
Secure software supply chain on a shoestring budgetLars Albertsson
 
The lean principles of data ops
The lean principles of data opsThe lean principles of data ops
The lean principles of data opsLars Albertsson
 
Taming the reproducibility crisis
Taming the reproducibility crisisTaming the reproducibility crisis
Taming the reproducibility crisisLars Albertsson
 
DataOps - Lean principles and lean practices
DataOps - Lean principles and lean practicesDataOps - Lean principles and lean practices
DataOps - Lean principles and lean practicesLars Albertsson
 
Counting Unique Users in Real-Time: Here's a Challenge for You!
Counting Unique Users in Real-Time: Here's a Challenge for You!Counting Unique Users in Real-Time: Here's a Challenge for You!
Counting Unique Users in Real-Time: Here's a Challenge for You!DataWorks Summit
 
Schema management with Scalameta
Schema management with ScalametaSchema management with Scalameta
Schema management with ScalametaLars Albertsson
 
Dirty data? Clean it up! - Datapalooza Denver 2016
Dirty data? Clean it up! - Datapalooza Denver 2016Dirty data? Clean it up! - Datapalooza Denver 2016
Dirty data? Clean it up! - Datapalooza Denver 2016Dan Lynn
 
India Analytics and Big Data Summit 2015
India Analytics and Big Data Summit 2015India Analytics and Big Data Summit 2015
India Analytics and Big Data Summit 2015Kanwal Prakash Singh
 
India Analytics and Big Data Summit 2015
India Analytics and Big Data Summit 2015India Analytics and Big Data Summit 2015
India Analytics and Big Data Summit 2015Kanwal Prakash Singh
 
Reducing Cost of Production ML: Feature Engineering Case Study
Reducing Cost of Production ML: Feature Engineering Case StudyReducing Cost of Production ML: Feature Engineering Case Study
Reducing Cost of Production ML: Feature Engineering Case StudyVenkata Pingali
 
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016Dan Lynn
 
Artificial Intelligence in practice - Gerbert Kaandorp - Codemotion Amsterdam...
Artificial Intelligence in practice - Gerbert Kaandorp - Codemotion Amsterdam...Artificial Intelligence in practice - Gerbert Kaandorp - Codemotion Amsterdam...
Artificial Intelligence in practice - Gerbert Kaandorp - Codemotion Amsterdam...Codemotion
 
Big Data in 200 km/h | AWS Big Data Demystified #1.3
Big Data in 200 km/h | AWS Big Data Demystified #1.3  Big Data in 200 km/h | AWS Big Data Demystified #1.3
Big Data in 200 km/h | AWS Big Data Demystified #1.3 Omid Vahdaty
 
Our journey with druid - from initial research to full production scale
Our journey with druid - from initial research to full production scaleOur journey with druid - from initial research to full production scale
Our journey with druid - from initial research to full production scaleItai Yaffe
 
Mortal analytics - Covid-19 and the problem of data quality
Mortal analytics - Covid-19 and the problem of data qualityMortal analytics - Covid-19 and the problem of data quality
Mortal analytics - Covid-19 and the problem of data qualityLars Albertsson
 

Semelhante a Data engineering in 10 years.pdf (20)

Engineering data quality
Engineering data qualityEngineering data quality
Engineering data quality
 
Crossing the data divide
Crossing the data divideCrossing the data divide
Crossing the data divide
 
Holistic data application quality
Holistic data application qualityHolistic data application quality
Holistic data application quality
 
Data ops in practice - Swedish style
Data ops in practice - Swedish styleData ops in practice - Swedish style
Data ops in practice - Swedish style
 
Secure software supply chain on a shoestring budget
Secure software supply chain on a shoestring budgetSecure software supply chain on a shoestring budget
Secure software supply chain on a shoestring budget
 
The lean principles of data ops
The lean principles of data opsThe lean principles of data ops
The lean principles of data ops
 
Taming the reproducibility crisis
Taming the reproducibility crisisTaming the reproducibility crisis
Taming the reproducibility crisis
 
DataOps - Lean principles and lean practices
DataOps - Lean principles and lean practicesDataOps - Lean principles and lean practices
DataOps - Lean principles and lean practices
 
Counting Unique Users in Real-Time: Here's a Challenge for You!
Counting Unique Users in Real-Time: Here's a Challenge for You!Counting Unique Users in Real-Time: Here's a Challenge for You!
Counting Unique Users in Real-Time: Here's a Challenge for You!
 
Schema management with Scalameta
Schema management with ScalametaSchema management with Scalameta
Schema management with Scalameta
 
Dirty data? Clean it up! - Datapalooza Denver 2016
Dirty data? Clean it up! - Datapalooza Denver 2016Dirty data? Clean it up! - Datapalooza Denver 2016
Dirty data? Clean it up! - Datapalooza Denver 2016
 
India Analytics and Big Data Summit 2015
India Analytics and Big Data Summit 2015India Analytics and Big Data Summit 2015
India Analytics and Big Data Summit 2015
 
India Analytics and Big Data Summit 2015
India Analytics and Big Data Summit 2015India Analytics and Big Data Summit 2015
India Analytics and Big Data Summit 2015
 
Reducing Cost of Production ML: Feature Engineering Case Study
Reducing Cost of Production ML: Feature Engineering Case StudyReducing Cost of Production ML: Feature Engineering Case Study
Reducing Cost of Production ML: Feature Engineering Case Study
 
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
 
Workflow Engines + Luigi
Workflow Engines + LuigiWorkflow Engines + Luigi
Workflow Engines + Luigi
 
Artificial Intelligence in practice - Gerbert Kaandorp - Codemotion Amsterdam...
Artificial Intelligence in practice - Gerbert Kaandorp - Codemotion Amsterdam...Artificial Intelligence in practice - Gerbert Kaandorp - Codemotion Amsterdam...
Artificial Intelligence in practice - Gerbert Kaandorp - Codemotion Amsterdam...
 
Big Data in 200 km/h | AWS Big Data Demystified #1.3
Big Data in 200 km/h | AWS Big Data Demystified #1.3  Big Data in 200 km/h | AWS Big Data Demystified #1.3
Big Data in 200 km/h | AWS Big Data Demystified #1.3
 
Our journey with druid - from initial research to full production scale
Our journey with druid - from initial research to full production scaleOur journey with druid - from initial research to full production scale
Our journey with druid - from initial research to full production scale
 
Mortal analytics - Covid-19 and the problem of data quality
Mortal analytics - Covid-19 and the problem of data qualityMortal analytics - Covid-19 and the problem of data quality
Mortal analytics - Covid-19 and the problem of data quality
 

Mais de Lars Albertsson

Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
How to not kill people - Berlin Buzzwords 2023.pdf
How to not kill people - Berlin Buzzwords 2023.pdfHow to not kill people - Berlin Buzzwords 2023.pdf
How to not kill people - Berlin Buzzwords 2023.pdfLars Albertsson
 
The right side of speed - learning to shift left
The right side of speed - learning to shift leftThe right side of speed - learning to shift left
The right side of speed - learning to shift leftLars Albertsson
 
Eventually, time will kill your data processing
Eventually, time will kill your data processingEventually, time will kill your data processing
Eventually, time will kill your data processingLars Albertsson
 
Eventually, time will kill your data pipeline
Eventually, time will kill your data pipelineEventually, time will kill your data pipeline
Eventually, time will kill your data pipelineLars Albertsson
 
Kubernetes as data platform
Kubernetes as data platformKubernetes as data platform
Kubernetes as data platformLars Albertsson
 
Don't build a data science team
Don't build a data science teamDon't build a data science team
Don't build a data science teamLars Albertsson
 
Test strategies for data processing pipelines, v2.0
Test strategies for data processing pipelines, v2.0Test strategies for data processing pipelines, v2.0
Test strategies for data processing pipelines, v2.0Lars Albertsson
 
10 ways to stumble with big data
10 ways to stumble with big data10 ways to stumble with big data
10 ways to stumble with big dataLars Albertsson
 
Protecting privacy in practice
Protecting privacy in practiceProtecting privacy in practice
Protecting privacy in practiceLars Albertsson
 
Testing data streaming applications
Testing data streaming applicationsTesting data streaming applications
Testing data streaming applicationsLars Albertsson
 
A primer on building real time data-driven products
A primer on building real time data-driven productsA primer on building real time data-driven products
A primer on building real time data-driven productsLars Albertsson
 
Test strategies for data processing pipelines
Test strategies for data processing pipelinesTest strategies for data processing pipelines
Test strategies for data processing pipelinesLars Albertsson
 

Data engineering in 10 years.pdf

  • 1. Data engineering in 10 years
    Lars Albertsson, Founder, Scling
    2022-11-09
    www.scling.com
  • 2. Prediction of future?
    Opinion + belief
    Functional languages (Scala, Kotlin, …) are better suited for data processing than Python. I believe that they will be dominant in the future.
  • 4. How to predict the future?
    ● Promises
    ● Extrapolation
      ○ Leading to tipping points
    ● Patterns
      ○ Similar contexts ahead in the journey
    ● Future is unevenly divided
      ○ Some are already there
  • 5. Vintage digital disruption - MRP
    ● Material requirements planning
      ○ What materials are needed for manufacturing (this month)
      ○ Computerised in the 80s
      ○ Expensive manual monthly → automatically overnight
    ● MRP hype
      ○ People → software
      ○ … that is executed each month
    ● C.f. adoption today
      ○ Cloud
      ○ Agile
      ○ Data
      ○ ML
  • 7. Technology adoption
    Eliyahu M. Goldratt on adopting new technology:
    "Technology can bring benefits if, and only if, it diminishes a limitation."
    ● What is the power of the technology?
    ● What limitation does it diminish?
    ● What rules helped us accommodate the limitation?
    ● What rules should we use now?
    Future = new technology - old rules + new rules
    Primary cause of waste in data value creation
  • 8. New rules?
    ● C.f. steam factory → electricity
      ○ Without new rules → backlash
    ● Scoped out
      ○ Covered yesterday
  • 9. What is the power of data engineering?
    ● Feasible to store all (raw) data
    ● Cheap (re)computations
    ● Build more complex data processing flows
    ● Share data across teams with minimal operational risk
    ● Fast experiment iteration and feedback with minimal operational risk
    (Scoping out data science and machine learning.)
  • 10. Efficiency gap, data cost & value
    ● Data processing produces datasets
      ○ Each dataset has business value
    ● Proxy value/cost metric: datasets / day
      ○ S-M traditional: < 10
      ○ Bank, telecom, media: 100-1000
    Figure annotations: 2014: 6500 datasets / day; 2016: 20000 datasets / day; 2018: 100000+ datasets / day, 25% of staff use BigQuery; 2021: 500B events collected / day; 2016: 1 600 000 000 datasets / day
    Figure (axes: effort, value): Disruptive value of data, machine learning; Financial, reporting; Insights, data-fed features
  • 11. Data agility
    ● Siloed: 6+ months
    ● Autonomous: 1 month
    ● Coordinated: days
    Diagram labels: Cultural work; Technical work; Data lake; Latency?
  • 12. Enabling innovation
    "The actual work that went into Discover Weekly was very little, because we're reusing things we already had."
    "Discover Weekly wasn't a great strategic plan and 100 engineers. It was 3 engineers that decided to build something."
    "I would have killed it. All of a sudden, they shipped it. It’s one of the most loved product features that we have." - Daniel Ek, CEO
    https://youtu.be/A259Yo8hBRs
    https://youtu.be/ZcmJxli8WS8
    https://musically.com/2018/08/08/daniel-ek-would-have-killed-discover-weekly-before-launch/
  • 14. IT craft to factory
    ● Security → DevSecOps
    ● Waterfall → Agile
    ● Application delivery → Containers
    ● Traditional operations → DevOps
    ● Traditional QA → CI/CD
    ● Infrastructure → Infrastructure as code
  • 16. Manual, mechanised, industrialised
    Figure labels: Data artifacts produced; 100x; 100x; Spotify's pipelines ~2013
  • 17. Crafted artifacts: data models
    ● Data (warehouse) models are carefully crafted
      ○ Built with hand-crafted SQL
      ○ Primitive automation
      ○ Reproducible?
    ● Require careful modelling to avoid trouble
      ○ E.g. slowly changing dimensions
      ○ Data vault, star schemas, satellites, …
    ● Pets, not cattle
  • 18. Artisanal vs industrialised data modelling
    Artisanal:
    ● Create single shared model artifact
    ● Used for many use cases
    ● Innovate fast model → use case
    Industrial:
    ● Create model for each use case
    ● Reuse code that produces model
    ● Each model may be unique
    ● Innovate fast raw → model → use case
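The industrial column of slide 18 (a model per use case, produced by reused code) can be illustrated with a minimal sketch. Everything in it is hypothetical and not from the deck: the `RawEvent` type, the `Models.revenueBy` helper, and the choice of revenue as the modelled quantity are assumptions made only for illustration.

```scala
// Hypothetical raw record; field names are assumptions for this sketch.
final case class RawEvent(userId: String, country: String, amount: Double)

object Models {
  // Shared, reusable code: aggregate raw events by any key.
  // Each use case derives its own model straight from raw data.
  def revenueBy[K](events: Seq[RawEvent])(key: RawEvent => K): Map[K, Double] =
    events.groupBy(key).map { case (k, es) => k -> es.map(_.amount).sum }
}
```

Two use cases would then call the same code with different keys, for example `Models.revenueBy(events)(_.country)` and `Models.revenueBy(events)(_.userId)`, instead of sharing one hand-crafted model.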
  • 19. Premature modelling is waste
    ● Power: Recompute model quickly
    ● Lifted limitation: Expensive to compute model
    ● Old rule: Careful manual modelling work
    ● New rules: Guard rails preventing model iteration from breaking downstream
      ○ Code QA = testing
      ○ Code + data QA = monitoring
    Yes, on purpose!
  • 20. Artisanal vs industrialised knowledge graphs
    Artisanal:
    ● Create single shared graph
    ● Used for many use cases
    ● Innovate fast graph → use case
    Industrial:
    ● Create graph for each use case
    ● Reuse code that produces graph
    ● Each graph may be unique
    ● Innovate fast raw → graph → use case
  • 21. Artisanal vs industrialised machine learning models
    Google MLOps maturity model:
    ● MLOps level 0: Manual process
    ● MLOps level 1: ML pipeline automation
    ● MLOps level 2: CI/CD pipeline automation
    https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning
  • 22. Road towards industrialisation
    ● LAMP stack age - manual analytics
    ● Data warehouse (DW) age - mechanised analytics
    ● Hadoop age - industrialised analytics, data-fed features, machine learning
    Significant change in workflows
    Early Hadoop:
    ● Weak indexing
    ● No transactions
    ● Weak security
    ● Batch transformations
  • 23. Simplifying use of new technology
    Diagram labels: DW; Enterprise big data failures; "Modern data stack" - traditional workflows, new technology; Low-code, no-code
  • 24. We have seen this before
    Diagram labels: Difficult adoption; 4GL, UML, low-code, no-code; Software engineering education
  • 25. Data engineering in the future
    Diagram labels: DW; ~10 year capability gap; "data factory engineering"; Enterprise big data failures; "Modern data stack" - traditional workflows, new technology; 4GL / UML phase of data engineering; Data engineering education
  • 26. Future of low-code & no-code
    Low-code web creation works.
    Low-code application development does not.
    Low-code data?
  • 27. Future of low-code & no-code
    ● Static content (mostly)
    ● Low complexity
    ● Simple QA

    ● User defines content
    ● Medium complexity
    ● QA depends on user behaviour

    ● Inbound data + user defines content
    ● High complexity
    ● QA depends on user + data
  • 29. SQL for data processing
    ● SQL used in 3 distinct contexts
      ○ Interactive exploration
      ○ Backend data record retrieval
      ○ ETL data processing?
    Important data language features:
    ● Can express (complex) business logic
    ● Composability
    ● Reusability
    ● Testability
    ● Seamless integration with external logic
    ● Tools to guide towards good path
      ○ Type system
      ○ Inspection tools
    ● IDE experience
    ● Debuggability
    ● Data quality measurement support
    ● Data quality improvement support
    ● Learning curve
    https://threadreaderapp.com/thread/1353832649664692225.html
  • 30. SQL inadequate for mature applications
    ● SQL from scratch - things seem ok
    ● Porting a mature application
      ○ Cannot reasonably express logic
      ○ ~5x slower (Hive 1.x)
      ○ Give up quality metrics
    ● Data quality measurements
    ● Data quality improvement

    case class Order(item: ItemId, userId: UserId)
    case class User(id: UserId, country: String)

    val orders = read(orderPath)
    val users = read(userPath)
    // Counts orders that have no matching user (a data quality metric)
    val orderNoUserCounter = longAccumulator("order-no-user")

    // Left join: every order is kept, with its user if one exists
    val joined: C[(Order, Option[User])] = orders
      .groupBy(_.userId)
      .leftJoin(users.groupBy(_.id))
      .values

    // Keep matched pairs; count and drop orders without a user
    val orderWithUser: C[(Order, User)] = joined
      .flatMap {
        case (order, Some(user)) => Some((order, user))
        case (order, None) =>
          orderNoUserCounter.add(1)
          None
      }
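The snippet on slide 30 depends on helpers that the deck does not define (`read`, `longAccumulator`, the collection type `C`, and the `ItemId` / `UserId` types). As a rough, self-contained approximation of the same idea, the sketch below uses plain Scala collections. Modelling the ids as `String`, the `OrderUserJoin` object, and returning the missing-user count as a value are all assumptions made for this illustration, not the author's implementation.

```scala
// Stand-in types: the slide's ItemId / UserId are modelled as String here (assumption).
final case class Order(item: String, userId: String)
final case class User(id: String, country: String)

object OrderUserJoin {
  // Returns the orders that matched a user, plus the number of orders without one.
  def joinOrders(orders: Seq[Order], users: Seq[User]): (Seq[(Order, User)], Long) = {
    val usersById: Map[String, User] = users.map(u => u.id -> u).toMap
    // Left join: every order is kept, paired with an optional user.
    val joined: Seq[(Order, Option[User])] =
      orders.map(o => (o, usersById.get(o.userId)))
    val matched = joined.collect { case (order, Some(user)) => (order, user) }
    // The data quality measurement: how many orders lacked a user record.
    val missing = joined.count { case (_, user) => user.isEmpty }.toLong
    (matched, missing)
  }
}
```

The point carried over from the slide is that the quality measurement sits next to the business logic in the same code, which is awkward to express in a single SQL statement.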
  • 31. Technology adoption & modern data stack
    ● New power: Build more complex data processing flows
    ● Old limitation: Brain capability to understand full flow
    ● Rules to mitigate limitation: Declarative & low code languages
    ● New rules: Software engineering / DevOps
  • 32. Data-centric innovation
    ● Need data from teams
      ○ willing?
      ○ backlog?
      ○ collected?
      ○ useful?
      ○ quality?
      ○ extraction?
      ○ data governance?
      ○ history?
  • 33. Big data - a collaboration paradigm
    Diagram labels: Data platform; Stream storage?; Data lake; Data democratised
  • 34. Technology adoption & data lake collaboration
    ● New powers:
      ○ Share data across teams with minimal operational risk
      ○ Fast experiment iteration and feedback with minimal operational risk
    ● Old limitations: Operational risk. Governance risk. Political.
    ● Rules to mitigate limitation: Data isolated. Internal API = technical contract
    ● New rules:
      ○ DataOps - holistic QA
      ○ New governance mechanisms
  • 35. Data products / contracts = old rules, new context
    Diagram labels: Data platform; Stream storage?; Data lake; Data contract; Data product
  • 36. Left is up
    Winston W Royce: "Managing the development of large software systems"
  • 40. Vintage team contracts and products
    ● Rational Unified Process
    ● Strong separation between teams / developers
    ● Contracts at handoff points
    ● Maximum number of handoffs in a value stream
  • 44. Which methodologies fade or prevail?
    ● Perpendicular to value stream
      ○ Barriers between people & teams
      ○ Extra non-value adding work
      ○ More handoffs
      ○ Homogeneous competence
    ● Waterfall
    ● RUP
    ● Data products / data mesh
    ● Data contracts
    ● Aligned along value stream
      ○ Few handoffs from raw to value
      ○ Enabled teams
      ○ Remove waste (in lean terms)
      ○ Heterogeneous competence
    ● Extreme programming / TDD
    ● Agile
    ● Big data
    ● DevOps
    ● DataOps
  • 46. Risk management by shifting left
    ● Manual governance
    ● Automated process
    ● DevOps:
      ○ Automated quality risk management
      ○ Quick feedback up in value stream
      ○ Left shifted QA risk management has improved both speed and quality
    ● DataOps:
      ○ Contracts are automated tests
      ○ Inter-system protocols are implementation details
      ○ New rule: Holistic QA
      ○ New governance
    ● DevSecOps
      ○ Security team approval
      ○ One-off vulnerability scans
      ○ Automated security rule validation
      ○ Feedback on change in vulnerabilities
    ● GovernanceOps?
      ○ Manual approval
      ○ Automated governance rule validation?
    ● ComplianceOps?
      ○ Manual one-off audits
      ○ Automated compliance inspections?
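One possible reading of "Contracts are automated tests" is that the expectations on a dataset are written as executable rules and checked in the pipeline, rather than agreed in a document at a handoff point. The sketch below is only an illustration under that assumption; the `PageView` record, the three rules, and the `PageViewContract` object are hypothetical and do not come from the deck.

```scala
// Hypothetical dataset record; fields are assumptions for this sketch.
final case class PageView(userId: String, country: String, timestampMs: Long)

object PageViewContract {
  // Each rule is a description plus a predicate that a valid record satisfies.
  val rules: Seq[(String, PageView => Boolean)] = Seq(
    "userId is non-empty" -> ((p: PageView) => p.userId.nonEmpty),
    "country is a two-letter code" -> ((p: PageView) => p.country.matches("[A-Z]{2}")),
    "timestamp is positive" -> ((p: PageView) => p.timestampMs > 0L)
  )

  // Number of violating records per rule; rules without violations are omitted.
  def violations(records: Seq[PageView]): Map[String, Int] =
    rules
      .map { case (name, ok) => name -> records.count(r => !ok(r)) }
      .filter { case (_, n) => n > 0 }
      .toMap
}
```

Run against test fixtures, such rules act as code QA; run against production output, the same rules could feed monitoring, which matches the "holistic QA" bullet.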
  • 50. Wrapup
    ● The future is faster
      ○ Patterns from other disciplines
      ○ How do leaders work?
      ○ Rules that hold us back today
    ● Look at software engineering evolution
      ○ Industrialised process eliminates big design up front
      ○ Enabled, high code components
      ○ Stream-aligned teams
      ○ Shift left continues
    ● Change is difficult, takes years
      ○ Agile transformations
      ○ DevOps transformations
    ● Current methods ineffective
      ○ Organically grow competence
      ○ Buy stuff
      ○ Consultants
    ● Belief: new collaboration methods
  • 51. Scling - data-factory-as-a-service
    Data value through collaboration
    Diagram labels: Customer; Data factory; Data platform & lake; data domain expertise; Value from data!; Rapid data innovation; Learning by doing, in collaboration
  • 52. Tech has massive impact on society
    Product? Supplier? Employer? Cloud?
    Make an active choice whether to have an impact!