SlideShare a Scribd company logo
1 of 24
Brought to you by
Apache Iceberg
An Architectural Look Under the Covers
Alex Merced (@amdatalakehouse)
Developer Advocate at Dremio
Alex Merced
Developer Advocate at Dremio
■ Speaks and Writes on Data Lakehouse topics like Apache
Iceberg, Apache Arrow and Project Nessie
■ P99 Mantra: Better Performance should mean Better Cost
■ Host of the podcasts, Datanation and Web Dev 101
■ Hobbies include guitar, growing sprouts, fermenting food
and creating content
■ What’s a table format?
■ Why do we need a new one?
■ Architecture of an Iceberg table
■ What happens under the covers when I CRUD?
■ Resulting benefits of this design
Agenda
3
What is a table format?
■ A way to organize a dataset’s files to present them as a single “table”
■ A way to answer the question “what data is in this table?”
F F
F A table’s contents is all files in
that table’s directories
dir1
Old way
(Hive)
db1.table1
Hive Table Format
Pros
■ Works with basically every engine
since it’s been the de-facto standard
for so long
■ More efficient access patterns than
full-table scans for every query
■ File format agnostic
■ Atomically update a whole partition
■ Single, central answer to “what data is
in this table” for the whole ecosystem
■ A table’s contents is all files in that table’s directories
■ The old de-facto standard
Cons
■ Smaller updates are very inefficient
■ No way to change data in multiple partitions safely
■ In practice, multiple jobs modifying the same dataset
don’t do so safely
■ All of the directory listings needed for large tables take
a long time
■ Users have to know the physical layout of the table
■ Hive table statistics are often stale
F F
F
dir1
db1.table1
How Can We Resolve These Issues
‘s goals
● Table correctness/consistency
● Faster query planning and execution
● Allow users to not worry about the
physical layout of the data
● Table evolution
● Accomplish all of these at scale
db1.table1
A table is a canonical list of files
F F
F
New way
→ We need a new table format
F F
F
Old way
(Hive)
A table’s contents is all files in
that table’s directories
dir1
db1.table1
What Iceberg is and Isn’t
✅
■ Table format specification
■ A set of APIs and libraries for interaction
with that specification
○ These libraries are leveraged in other
engines and tools that allow them to
interact with Iceberg tables
❌
■ A storage engine
■ An execution engine
■ A service
Components of Iceberg
Iceberg Table Format
■ Overview of the components
■ Summary of the read path (SELECT) #
metadata layer
data layer
Iceberg Table Format
■ Overview of the components
■ Summary of the read path (SELECT) #
metadata layer
data layer
Iceberg Components: The Catalog
Iceberg Catalog
■ A store that houses the current
metadata pointer for Iceberg tables
■ Must support atomic operations for
updating the current metadata pointer
(e.g. HDFS, HMS, Nessie)
1
2
metadata layer
data layer
table1’s current metadata pointer
■ Mapping of table name to the location
of current metadata file
Iceberg Components: The Metadata File
Metadata file - stores metadata about a
table at a certain point in time
{
"table-uuid" : "<uuid>",
"location" : "/path/to/table/dir",
"schema": {...},
"partition-spec": [ {<partition-details>}, ...],
"current-snapshot-id": <snapshot-id>,
"snapshots": [ {
"snapshot-id": <snapshot-id>
"manifest-list": "/path/to/manifest/list.avro"
}, ...],
...
}
3
5
4
metadata layer
data layer
Iceberg Components: Manifest Lists
Manifest list file - a list of manifest files
{
"manifest-path" : "/path/to/manifest/file.avro",
"added-snapshot-id": <snapshot-id>,
"partition-spec-id": <partition-spec-id>,
"partitions": [ {partition-info}, ...],
...
}
{
"manifest-path" : "/path/to/manifest/file2.avro",
"added-snapshot-id": <snapshot-id>,
"partition-spec-id": <partition-spec-id>,
"partitions": [ {partition-info}, ...],
...
}
6
metadata layer
data layer
Iceberg Components: Manifests
Manifest file - a list of data files, along with
details and stats about each data file
{
"data-file": {
"file-path": "/path/to/data/file.parquet",
"file-format": "PARQUET",
"partition":
{"<part-field>":{"<data-type>":<value>}},
"record-count": <num-records>,
"null-value-counts": [{
"column-index": "1", "value": 4
}, ...],
"lower-bounds": [{
"column-index": "1", "value": "aaa"
}, ...],
"upper-bounds": [{
"column-index": "1", "value": "eee"
}, ...],
}
...
}
{
...
}
7
8
metadata layer
data layer
Examples of Iceberg at Work
Examples: Creating a Table
table1/
|- metadata/
| |- v1.metadata.json
|- data/
CREATE TABLE db1.table1 (
order_id bigint,
customer_id bigint,
order_amount DECIMAL(10, 2),
order_ts TIMESTAMP
)
USING iceberg
PARTITIONED BY ( hour(order_ts) );
db1.table1 ->
table1/metadata/v1.metadata.json
✅
Examples: Inserting Records into Table
table1/
|- metadata/
| |- v1.metadata.json
|- data/
INSERT INTO db1.table1 VALUES (
123,
456,
36.17,
‘2021-01-26 08:10:23’
);
table1/
|- metadata/
| |- v1.metadata.json
| |- v2.metadata.json
| |- snap-2938-1-4103.avro
| |- d8f9-ad19-4e.avro
|- data/
| |- order_ts_hour=2021-01-26-08/
| | |- 00000-5-cae2d.parquet
db1.table1 ->
table1/metadata/v1.metadata.json
db1.table1 ->
table1/metadata/v2.metadata.json
✅
Examples: Merge/Upsert
table1/
|- metadata/
| |- v1.metadata.json
| |- v2.metadata.json
| |- snap-2938-1-4103.avro
| |- d8f9-ad19-4e.avro
|- data/
| |- order_ts_hour=2021-01-26-08/
| | |- 00000-5-cae2d.parquet
MERGE INTO db1.table1
USING ( SELECT * FROM table1_stage ) s
ON table1.order_id = s.order_id
WHEN MATCHED THEN UPDATE table1.order_amount = s.order_amount
WHEN NOT MATCHED THEN INSERT *
table1/
|- metadata/
| |- v1.metadata.json
| |- v2.metadata.json
| |- v3.metadata.json
| |- snap-29c8-1-b103.avro
| |- snap-9fa1-3-16c3.avro
| |- d8f9-ad19-4e.avro
| |- 0d9a-98fa-77.avro
|- data/
| |- order_ts_hour=2021-01-26-08/
| | |- 00000-5-cae2d.parquet
| | |- 00000-1-aef71.parquet
| |- order_ts_hour=2021-01-27-10/
| | |- 00000-3-0fa3a.parquet
db1.table1 ->
table1/metadata/v2.metadata.json
db1.table1 ->
table1/metadata/v3.metadata.json
✅
Examples: Read a Table
db1.table1 ->
table1/metadata/v3.metadata.json
19
SELECT *
FROM db1.table1
table1/
|- metadata/
| |- v1.metadata.json
| |- v2.metadata.json
| |- v3.metadata.json
| |- snap-29c8-1-b103.avro
| |- snap-9fa1-3-16c3.avro
| |- d8f9-ad19-4e.avro
| |- 0d9a-98fa-77.avro
|- data/
| |- order_ts_hour=2021-01-26-08/
| | |- 00000-5-cae2d.parquet
| | |- 00000-1-aef71.parquet
| |- order_ts_hour=2021-01-27-10/
| | |- 00000-3-0fa3a.parquet
1
2
3
4
5
6
1
3
4
5
6
2
✅
6
6
Examples: Read a Table (Time Travel)
20
db1.table1 ->
table1/metadata/v3.metadata.json
SELECT *
FROM db1.table1 AS OF ‘2021-05-26 09:30:00’
-- (timestamp is from before MERGE INTO operation)
table1/
|- metadata/
| |- v1.metadata.json
| |- v2.metadata.json
| |- v3.metadata.json
| |- snap-29c8-1-b103.avro
| |- snap-9fa1-3-16c3.avro
| |- d8f9-ad19-4e.avro
| |- 0d9a-98fa-77.avro
|- data/
| |- order_ts_hour=2021-01-26-08/
| | |- 00000-5-cae2d.parquet
| | |- 00000-1-aef71.parquet
| |- order_ts_hour=2021-01-27-10/
| | |- 00000-3-0fa3a.parquet
1
2
3
4
5
6
1
3
4
5
6
2
✅
Case Study: Atlas (Netflix)
■ Hive table – with Parquet filters:
○ 400k+ splits per day, not combined
○ EXPLAIN query: 9.6 min (planning wall time)
■ Iceberg table – partition data filtering:
○ 15,218 splits, combined
○ 13 min (wall time) / 10 sec (planning)
■ Iceberg table – partition and min/max filtering:
○ 412 splits
○ 42 sec (wall time) / 25 sec (planning)
■ Efficiently make smaller updates
■ Snapshot isolation for transactions
■ Faster planning and execution
■ Reliable metrics for CBOs (vs hive..)
■ Time Travel
■ Table Evolution (Schema/Partitioning)
■ All Engines can see changes immediately
■ ACID Transactions
+ PROJECT NESSIE
■ Git-Like Semantics
■ Multi-Table transactions
Benefits of Iceberg
Resources
■ iceberg.apache.org
■ dremio.com/subsurface/apache-iceberg/
■ Get hands on - iceberg.apache.org/getting-started/
Brought to you by
Alex Merced
alex.merced@dremio.com
@amdatalakehouse

More Related Content

What's hot

What's hot (20)

Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guide
 
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta LakeSimplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
 
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptx
 
Apache Spark Architecture
Apache Spark ArchitectureApache Spark Architecture
Apache Spark Architecture
 
Parallelization of Structured Streaming Jobs Using Delta Lake
Parallelization of Structured Streaming Jobs Using Delta LakeParallelization of Structured Streaming Jobs Using Delta Lake
Parallelization of Structured Streaming Jobs Using Delta Lake
 
iceberg introduction.pptx
iceberg introduction.pptxiceberg introduction.pptx
iceberg introduction.pptx
 
The Apache Spark File Format Ecosystem
The Apache Spark File Format EcosystemThe Apache Spark File Format Ecosystem
The Apache Spark File Format Ecosystem
 
Building Robust ETL Pipelines with Apache Spark
Building Robust ETL Pipelines with Apache SparkBuilding Robust ETL Pipelines with Apache Spark
Building Robust ETL Pipelines with Apache Spark
 
Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Ro...
Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Ro...Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Ro...
Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Ro...
 
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangApache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
 
Data Engineer's Lunch #83: Strategies for Migration to Apache Iceberg
Data Engineer's Lunch #83: Strategies for Migration to Apache IcebergData Engineer's Lunch #83: Strategies for Migration to Apache Iceberg
Data Engineer's Lunch #83: Strategies for Migration to Apache Iceberg
 
Databricks Fundamentals
Databricks FundamentalsDatabricks Fundamentals
Databricks Fundamentals
 
The delta architecture
The delta architectureThe delta architecture
The delta architecture
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache iceberg
 
Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...
Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...
Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...
 
Real-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotReal-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache Pinot
 
[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic
[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic
[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
 

Similar to Apache Iceberg: An Architectural Look Under the Covers

Db2 migration -_tips,_tricks,_and_pitfalls
Db2 migration -_tips,_tricks,_and_pitfallsDb2 migration -_tips,_tricks,_and_pitfalls
Db2 migration -_tips,_tricks,_and_pitfalls
sam2sung2
 
Memory map
Memory mapMemory map
Memory map
aviban
 
Discard inport exchange table & tablespace
Discard inport exchange table & tablespaceDiscard inport exchange table & tablespace
Discard inport exchange table & tablespace
Marco Tusa
 

Similar to Apache Iceberg: An Architectural Look Under the Covers (20)

OSA Con 2022 - Apache Iceberg_ An Architectural Look Under the Covers - Alex ...
OSA Con 2022 - Apache Iceberg_ An Architectural Look Under the Covers - Alex ...OSA Con 2022 - Apache Iceberg_ An Architectural Look Under the Covers - Alex ...
OSA Con 2022 - Apache Iceberg_ An Architectural Look Under the Covers - Alex ...
 
Tuning and Optimizing U-SQL Queries (SQLPASS 2016)
Tuning and Optimizing U-SQL Queries (SQLPASS 2016)Tuning and Optimizing U-SQL Queries (SQLPASS 2016)
Tuning and Optimizing U-SQL Queries (SQLPASS 2016)
 
Db2 migration -_tips,_tricks,_and_pitfalls
Db2 migration -_tips,_tricks,_and_pitfallsDb2 migration -_tips,_tricks,_and_pitfalls
Db2 migration -_tips,_tricks,_and_pitfalls
 
Postgresql Database Administration Basic - Day2
Postgresql  Database Administration Basic  - Day2Postgresql  Database Administration Basic  - Day2
Postgresql Database Administration Basic - Day2
 
Database_Tuning.ppt
Database_Tuning.pptDatabase_Tuning.ppt
Database_Tuning.ppt
 
Memory map
Memory mapMemory map
Memory map
 
Discard inport exchange table & tablespace
Discard inport exchange table & tablespaceDiscard inport exchange table & tablespace
Discard inport exchange table & tablespace
 
Harry Potter & Apache iceberg format
Harry Potter & Apache iceberg formatHarry Potter & Apache iceberg format
Harry Potter & Apache iceberg format
 
Hive : WareHousing Over hadoop
Hive :  WareHousing Over hadoopHive :  WareHousing Over hadoop
Hive : WareHousing Over hadoop
 
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon RedshiftData warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
 
SAP BO and Teradata best practices
SAP BO and Teradata best practicesSAP BO and Teradata best practices
SAP BO and Teradata best practices
 
Dynamic Publishing with Arbortext Data Merge
Dynamic Publishing with Arbortext Data MergeDynamic Publishing with Arbortext Data Merge
Dynamic Publishing with Arbortext Data Merge
 
SQLServer Database Structures
SQLServer Database Structures SQLServer Database Structures
SQLServer Database Structures
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
Tthornton code4lib
Tthornton code4libTthornton code4lib
Tthornton code4lib
 
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
 
Ten tools for ten big data areas 04_Apache Hive
Ten tools for ten big data areas 04_Apache HiveTen tools for ten big data areas 04_Apache Hive
Ten tools for ten big data areas 04_Apache Hive
 
Presentations from the Cloudera Impala meetup on Aug 20 2013
Presentations from the Cloudera Impala meetup on Aug 20 2013Presentations from the Cloudera Impala meetup on Aug 20 2013
Presentations from the Cloudera Impala meetup on Aug 20 2013
 
Data Warehousing in the Era of Big Data
Data Warehousing in the Era of Big DataData Warehousing in the Era of Big Data
Data Warehousing in the Era of Big Data
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 

More from ScyllaDB

More from ScyllaDB (20)

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
What Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLWhat Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQL
 
Low Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsLow Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & Pitfalls
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance Dilemmas
 
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBBeyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance Dilemmas
 
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
 
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
 
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaDatabase Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
 
Replacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBReplacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDB
 
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityPowering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
 
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
 
Getting the most out of ScyllaDB
Getting the most out of ScyllaDBGetting the most out of ScyllaDB
Getting the most out of ScyllaDB
 
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationNoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
 
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsNoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
 
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and ChallengesNoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
 
ScyllaDB Virtual Workshop
ScyllaDB Virtual WorkshopScyllaDB Virtual Workshop
ScyllaDB Virtual Workshop
 
DBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & TradeoffsDBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & Tradeoffs
 
Build Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDBBuild Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDB
 
NoSQL Data Modeling 101
NoSQL Data Modeling 101NoSQL Data Modeling 101
NoSQL Data Modeling 101
 

Recently uploaded

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Recently uploaded (20)

Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 

Apache Iceberg: An Architectural Look Under the Covers

  • 1. Brought to you by Apache Iceberg An Architectural Look Under the Covers Alex Merced (@amdatalakehouse) Developer Advocate at Dremio
  • 2. Alex Merced Developer Advocate at Dremio ■ Speaks and Writes on Data Lakehouse topics like Apache Iceberg, Apache Arrow and Project Nessie ■ P99 Mantra: Better Performance should mean Better Cost ■ Host of the podcasts, Datanation and Web Dev 101 ■ Hobbies include guitar, growing sprouts, fermenting food and creating content
  • 3. ■ What’s a table format? ■ Why do we need a new one? ■ Architecture of an Iceberg table ■ What happens under the covers when I CRUD? ■ Resulting benefits of this design Agenda 3
  • 4. What is a table format? ■ A way to organize a dataset’s files to present them as a single “table” ■ A way to answer the question “what data is in this table?” F F F A table’s contents is all files in that table’s directories dir1 Old way (Hive) db1.table1
  • 5. Hive Table Format Pros ■ Works with basically every engine since it’s been the de-facto standard for so long ■ More efficient access patterns than full-table scans for every query ■ File format agnostic ■ Atomically update a whole partition ■ Single, central answer to “what data is in this table” for the whole ecosystem ■ A table’s contents is all files in that table’s directories ■ The old de-facto standard Cons ■ Smaller updates are very inefficient ■ No way to change data in multiple partitions safely ■ In practice, multiple jobs modifying the same dataset don’t do so safely ■ All of the directory listings needed for large tables take a long time ■ Users have to know the physical layout of the table ■ Hive table statistics are often stale F F F dir1 db1.table1
  • 6. How Can We Resolve These Issues ‘s goals ● Table correctness/consistency ● Faster query planning and execution ● Allow users to not worry about the physical layout of the data ● Table evolution ● Accomplish all of these at scale db1.table1 A table is a canonical list of files F F F New way → We need a new table format F F F Old way (Hive) A table’s contents is all files in that table’s directories dir1 db1.table1
  • 7. What Iceberg is and Isn’t ✅ ■ Table format specification ■ A set of APIs and libraries for interaction with that specification ○ These libraries are leveraged in other engines and tools that allow them to interact with Iceberg tables ❌ ■ A storage engine ■ An execution engine ■ A service
  • 9. Iceberg Table Format ■ Overview of the components ■ Summary of the read path (SELECT) # metadata layer data layer
  • 10. Iceberg Table Format ■ Overview of the components ■ Summary of the read path (SELECT) # metadata layer data layer
  • 11. Iceberg Components: The Catalog Iceberg Catalog ■ A store that houses the current metadata pointer for Iceberg tables ■ Must support atomic operations for updating the current metadata pointer (e.g. HDFS, HMS, Nessie) 1 2 metadata layer data layer table1’s current metadata pointer ■ Mapping of table name to the location of current metadata file
  • 12. Iceberg Components: The Metadata File Metadata file - stores metadata about a table at a certain point in time { "table-uuid" : "<uuid>", "location" : "/path/to/table/dir", "schema": {...}, "partition-spec": [ {<partition-details>}, ...], "current-snapshot-id": <snapshot-id>, "snapshots": [ { "snapshot-id": <snapshot-id> "manifest-list": "/path/to/manifest/list.avro" }, ...], ... } 3 5 4 metadata layer data layer
  • 13. Iceberg Components: Manifest Lists Manifest list file - a list of manifest files { "manifest-path" : "/path/to/manifest/file.avro", "added-snapshot-id": <snapshot-id>, "partition-spec-id": <partition-spec-id>, "partitions": [ {partition-info}, ...], ... } { "manifest-path" : "/path/to/manifest/file2.avro", "added-snapshot-id": <snapshot-id>, "partition-spec-id": <partition-spec-id>, "partitions": [ {partition-info}, ...], ... } 6 metadata layer data layer
  • 14. Iceberg Components: Manifests Manifest file - a list of data files, along with details and stats about each data file { "data-file": { "file-path": "/path/to/data/file.parquet", "file-format": "PARQUET", "partition": {"<part-field>":{"<data-type>":<value>}}, "record-count": <num-records>, "null-value-counts": [{ "column-index": "1", "value": 4 }, ...], "lower-bounds": [{ "column-index": "1", "value": "aaa" }, ...], "upper-bounds": [{ "column-index": "1", "value": "eee" }, ...], } ... } { ... } 7 8 metadata layer data layer
  • 16. Examples: Creating a Table table1/ |- metadata/ | |- v1.metadata.json |- data/ CREATE TABLE db1.table1 ( order_id bigint, customer_id bigint, order_amount DECIMAL(10, 2), order_ts TIMESTAMP ) USING iceberg PARTITIONED BY ( hour(order_ts) ); db1.table1 -> table1/metadata/v1.metadata.json ✅
  • 17. Examples: Inserting Records into Table table1/ |- metadata/ | |- v1.metadata.json |- data/ INSERT INTO db1.table1 VALUES ( 123, 456, 36.17, ‘2021-01-26 08:10:23’ ); table1/ |- metadata/ | |- v1.metadata.json | |- v2.metadata.json | |- snap-2938-1-4103.avro | |- d8f9-ad19-4e.avro |- data/ | |- order_ts_hour=2021-01-26-08/ | | |- 00000-5-cae2d.parquet db1.table1 -> table1/metadata/v1.metadata.json db1.table1 -> table1/metadata/v2.metadata.json ✅
  • 18. Examples: Merge/Upsert table1/ |- metadata/ | |- v1.metadata.json | |- v2.metadata.json | |- snap-2938-1-4103.avro | |- d8f9-ad19-4e.avro |- data/ | |- order_ts_hour=2021-01-26-08/ | | |- 00000-5-cae2d.parquet MERGE INTO db1.table1 USING ( SELECT * FROM table1_stage ) s ON table1.order_id = s.order_id WHEN MATCHED THEN UPDATE table1.order_amount = s.order_amount WHEN NOT MATCHED THEN INSERT * table1/ |- metadata/ | |- v1.metadata.json | |- v2.metadata.json | |- v3.metadata.json | |- snap-29c8-1-b103.avro | |- snap-9fa1-3-16c3.avro | |- d8f9-ad19-4e.avro | |- 0d9a-98fa-77.avro |- data/ | |- order_ts_hour=2021-01-26-08/ | | |- 00000-5-cae2d.parquet | | |- 00000-1-aef71.parquet | |- order_ts_hour=2021-01-27-10/ | | |- 00000-3-0fa3a.parquet db1.table1 -> table1/metadata/v2.metadata.json db1.table1 -> table1/metadata/v3.metadata.json ✅
  • 19. Examples: Read a Table db1.table1 -> table1/metadata/v3.metadata.json 19 SELECT * FROM db1.table1 table1/ |- metadata/ | |- v1.metadata.json | |- v2.metadata.json | |- v3.metadata.json | |- snap-29c8-1-b103.avro | |- snap-9fa1-3-16c3.avro | |- d8f9-ad19-4e.avro | |- 0d9a-98fa-77.avro |- data/ | |- order_ts_hour=2021-01-26-08/ | | |- 00000-5-cae2d.parquet | | |- 00000-1-aef71.parquet | |- order_ts_hour=2021-01-27-10/ | | |- 00000-3-0fa3a.parquet 1 2 3 4 5 6 1 3 4 5 6 2 ✅ 6 6
  • 20. Examples: Read a Table (Time Travel) 20 db1.table1 -> table1/metadata/v3.metadata.json SELECT * FROM db1.table1 AS OF ‘2021-05-26 09:30:00’ -- (timestamp is from before MERGE INTO operation) table1/ |- metadata/ | |- v1.metadata.json | |- v2.metadata.json | |- v3.metadata.json | |- snap-29c8-1-b103.avro | |- snap-9fa1-3-16c3.avro | |- d8f9-ad19-4e.avro | |- 0d9a-98fa-77.avro |- data/ | |- order_ts_hour=2021-01-26-08/ | | |- 00000-5-cae2d.parquet | | |- 00000-1-aef71.parquet | |- order_ts_hour=2021-01-27-10/ | | |- 00000-3-0fa3a.parquet 1 2 3 4 5 6 1 3 4 5 6 2 ✅
  • 21. Case Study: Atlas (Netflix) ■ Hive table – with Parquet filters: ○ 400k+ splits per day, not combined ○ EXPLAIN query: 9.6 min (planning wall time) ■ Iceberg table – partition data filtering: ○ 15,218 splits, combined ○ 13 min (wall time) / 10 sec (planning) ■ Iceberg table – partition and min/max filtering: ○ 412 splits ○ 42 sec (wall time) / 25 sec (planning)
  • 22. ■ Efficiently make smaller updates ■ Snapshot isolation for transactions ■ Faster planning and execution ■ Reliable metrics for CBOs (vs hive..) ■ Time Travel ■ Table Evolution (Schema/Partitioning) ■ All Engines can see changes immediately ■ ACID Transactions + PROJECT NESSIE ■ Git-Like Semantics ■ Multi-Table transactions Benefits of Iceberg
  • 23. Resources ■ iceberg.apache.org ■ dremio.com/subsurface/apache-iceberg/ ■ Get hands on - iceberg.apache.org/getting-started/
  • 24. Brought to you by Alex Merced alex.merced@dremio.com @amdatalakehouse