Radical Speed for SQL Queries on Databricks: Photon Under the Hood
Alex Behm, Tech Lead, Databricks
Greg Rahn, Staff Product Manager, Databricks
Agenda
▪ Intro to Photon
▪ Recent Developments
▪ Up Next
▪ Summary
Introduction to Photon
Observed Workload Trends
Businesses are moving faster, and as a result organizations spend less time on data modeling, leading to worse performance.
▪ Most columns don’t have "NOT NULL" constraints defined
▪ Strings are convenient but slower than specific types
▪ Data lifecycle: Raw → Bronze → Silver → Gold
Can we get both agility and performance?
-- Data [Analysts | Engineers | Scientists] everywhere
Just one more ask:
SQL as a first-class citizen on Databricks
What is Photon?
Photon is a new, 100% Apache Spark-compatible query engine designed for speed and flexibility.
It’s built from the ground up to deliver the fastest performance on modern cloud hardware for all data use cases across data engineering, data science, machine learning, and data analytics.
• Re-architected for the fastest performance on real-world applications
  • Native C++ engine for faster queries
  • Custom-built memory management to avoid JVM bottlenecks
  • Vectorized: memory, instruction, and data parallelism (SIMD)
• Works with your existing code and avoids vendor lock-in
  • 100% compatible with open source Spark DataFrame APIs and Spark SQL
  • Transparent operation to users - no need to invoke something new, it just works
• Optimizing for all data use cases and workloads
  • Today, supporting SQL and DataFrame workloads
  • Coming soon: Streaming, Data Science, and more
Building the next generation query engine
Why build a new execution engine?
Photon in the Databricks Lakehouse Platform
Client: Submit SQL Query
Spark Driver (JVM)
● Parsing
● Catalyst: Analysis/Planning/Optimization
● Scheduling
Spark Executors (Mixed JVM/Native)
● Execute Task
Delta Lake
Key Photon Characteristics
• Hybrid Photon/Spark Plans
  • Use Photon when possible, fall back to Spark for unsupported operations
  • Completely transparent to users
• Native code using off-heap memory
  • Natural access to memory and intrinsics (no fiddling with Java Unsafe)
  • No JVM GC, large heaps ok
  • No JVM JIT performance cliffs / limitations
• Fully integrated with Spark’s memory manager
• Prefers hash join over sort-merge join
• Rich per-operator performance metrics
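The hybrid-plan idea above can be illustrated with a toy planner. This is a minimal sketch, not Photon's planner: it assumes a simplified rule that the slides do not state (walk a linear plan from the scan upward and stay in the native engine until the first unsupported operator), and the operator names and the `SUPPORTED` set are hypothetical.

```python
# Illustrative only: a toy planner for hybrid Photon/Spark plans.
# Assumption (not from the slides): conversion proceeds bottom-up from the
# scan and stops at the first operator the native engine does not support.

# Hypothetical set of operators the native engine can run.
SUPPORTED = {"Scan", "Filter", "Project", "HashAggregate", "HashJoin", "Shuffle"}


def plan_hybrid(operators_bottom_up):
    """Tag each operator as 'photon' or 'spark', given a scan-first list."""
    tagged = []
    in_photon = True
    for op in operators_bottom_up:
        if in_photon and op not in SUPPORTED:
            in_photon = False  # fall back to Spark from here upward
        tagged.append((op, "photon" if in_photon else "spark"))
    return tagged


if __name__ == "__main__":
    plan = ["Scan", "Filter", "Project", "HashAggregate", "Sort"]
    for op, engine in plan_hybrid(plan):
        print(f"{op:<14} -> {engine}")
```

In this sketch, once the plan falls back to Spark it stays there, which keeps the number of engine transitions to one.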
Recent Developments in Photon
Development Focus Areas
1. Production Readiness
a. Goal: Resilience comparable to DBR → spilling support
b. Testing and hardening, real customer workloads
2. Query Coverage
a. Today: Basics like joins/aggregations/shuffle, common types and functions
b. In development: Nested types, built-in functions
c. Coming soon: Sort/Window
3. Performance
a. Analyze and optimize common usage patterns
Disclaimer: Microbenchmarks
Microbenchmarks do not necessarily reflect real-world end-to-end performance.
During Photon development we analyze and optimize performance with extensive microbenchmarks.
In the following slides, we share benchmark results that were run in controlled and narrowly scoped scenarios.
Resilience with Very Large Inputs
• Spilling for very large inputs
  • Write intermediate state to external storage to process inputs exceeding available memory
✅ Hash Shuffle
✅ Hash Aggregation
✅ Hash Join
2-5x Speedup
Example: Spilling Hash Join [1 of 4]
• Hash join has two phases: build and probe
• Build phase: insert records from one join input into the hash table
• Hash table has a fixed number of partitions
[Figure: Partitioned Hash Table]
Example: Spilling Hash Join [2 of 4]
• When memory runs out, spill one partition to disk
• New records go to in-memory partitions or straight to disk
• Repeat until build is done
[Figure: Partitioned Hash Table]
Example: Spilling Hash Join [3 of 4]
• Probe phase: process rows from the other join input
• Emit results for probe rows matching in-memory build partitions
• Spill probe rows matching a spilled build partition
[Figure: Partitioned Hash Table with Build and Probe inputs]
Example: Spilling Hash Join [4 of 4]
• For each spilled partition, repeat the same build/probe process
• Might spill again! Apply same algorithm recursively
[Figure: Build ⨝ Probe]
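The four steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Photon's C++ implementation: "disk" is simulated with plain lists, memory is counted in build rows, the largest in-memory partition is chosen as the spill victim, and `NUM_PARTITIONS` and `MAX_DEPTH` are hypothetical values.

```python
from collections import defaultdict

NUM_PARTITIONS = 4   # fixed partition count (hypothetical value)
MAX_DEPTH = 8        # hypothetical guard against unbounded recursion


def partition_of(key, level):
    # Salt the hash with the recursion level so a re-partitioned spill
    # is distributed differently from the level above.
    return hash((level, key)) % NUM_PARTITIONS


def hash_join(build_rows, probe_rows, memory_budget, level=0):
    """Inner-join (key, value) rows, holding at most `memory_budget` build rows."""
    if level >= MAX_DEPTH:
        memory_budget = len(build_rows)  # stop spilling; join in memory

    in_memory = [defaultdict(list) for _ in range(NUM_PARTITIONS)]
    resident = [0] * NUM_PARTITIONS   # in-memory build rows per partition
    spilled_build = {}                # partition -> build rows "on disk"
    spilled_probe = defaultdict(list) # partition -> probe rows "on disk"

    # Build phase: insert rows, spilling one partition when memory runs out.
    for key, value in build_rows:
        p = partition_of(key, level)
        if p in spilled_build:
            spilled_build[p].append((key, value))  # straight to disk
            continue
        in_memory[p][key].append(value)
        resident[p] += 1
        while sum(resident) > memory_budget:
            victim = max(range(NUM_PARTITIONS), key=lambda i: resident[i])
            spilled_build[victim] = [
                (k, v) for k, values in in_memory[victim].items() for v in values
            ]
            in_memory[victim].clear()
            resident[victim] = 0

    # Probe phase: emit matches for in-memory partitions, spill the rest.
    results = []
    for key, value in probe_rows:
        p = partition_of(key, level)
        if p in spilled_build:
            spilled_probe[p].append((key, value))
        else:
            for build_value in in_memory[p].get(key, ()):
                results.append((key, build_value, value))

    # Spilled partitions: repeat the same build/probe process recursively.
    for p, build_part in spilled_build.items():
        results.extend(
            hash_join(build_part, spilled_probe[p], memory_budget, level + 1)
        )
    return results
```

The sketch omits role reversal and real I/O; it only shows how partition-granular spilling lets the join finish when the build side exceeds the budget.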
Spilling Hash Join vs. Spilling Sort-Merge Join
• Photon converts Sort-Merge Joins to Hash Joins
• Sort-Merge Join
  • Buffer + sort both join inputs, increasing memory pressure
  • Spilling sort → write entire input to sorted runs
• Hash Join
  • Only buffer build input (typically the smaller input) in a hash table
  • Graceful degradation: Spill both inputs at the build-partition granularity
  • Role reversal: Swap build/probe when processing spilled partitions
Up to 5x Speedup
Hardening: How we test Photon
• Random queries and data
  • Using new open-source Spark random query generator
• Failure injection
  • Randomly trip error paths to ensure graceful query failure
• Spill injection
  • Randomly trigger spill events to simulate memory pressure
• Clang/LLVM C++ tools
  • Address Sanitizer
  • Undefined Behavior Sanitizer
• Combinations of the above
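Failure injection as described above can be sketched with a seeded random injector. This is a toy illustration, not Photon's test harness: `FaultInjector`, `run_query`, and `fuzz` are hypothetical names, and the "query" is just a sum over a list.

```python
import random


class InjectedFailure(RuntimeError):
    """Raised by the injector to simulate an error path."""


class FaultInjector:
    def __init__(self, seed, failure_rate):
        self._rng = random.Random(seed)  # seeded so failures are reproducible
        self._rate = failure_rate

    def checkpoint(self, name):
        # Randomly trip the error path at this named point.
        if self._rng.random() < self._rate:
            raise InjectedFailure(name)


def run_query(rows, injector, open_buffers):
    """Toy 'query': sums rows while holding a buffer that must be released."""
    open_buffers.append("agg-buffer")
    try:
        total = 0
        for i, row in enumerate(rows):
            injector.checkpoint(f"row-{i}")
            total += row
        return total
    finally:
        open_buffers.remove("agg-buffer")  # cleanup on success and on failure


def fuzz(trials=200):
    """Each trial must end in a correct result or a clean failure."""
    outcomes = {"ok": 0, "failed": 0}
    for seed in range(trials):
        buffers = []
        try:
            assert run_query([1, 2, 3, 4], FaultInjector(seed, 0.1), buffers) == 10
            outcomes["ok"] += 1
        except InjectedFailure:
            outcomes["failed"] += 1
        assert buffers == [], "resources leaked after failure"
    return outcomes
```

Seeding the injector makes any failing trial replayable, which matters when failures are combined with random queries and data.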
Query Coverage
Overview of Query Coverage
Data Types
✅ Byte/Short/Int/Long
✅ Boolean
✅ String/Binary
✅ Decimal
✅ Float/Double
✅ Date/Timestamp
✅ Struct
Coming soon: Array, Map
Operators
✅ Scan, Filter, Project
✅ Hash Aggregate/Join/Shuffle
✅ Nested-Loop Join
✅ Null-Aware Anti Join
✅ Union, Expand, ScalarSubquery
Coming soon: Sort, Window
Expressions
✅ Comparison / Logic
✅ Arithmetic / Math (most)
✅ Conditional (IF, CASE, etc.)
✅ String (common ones)
✅ Casts
✅ Aggregates (most common ones)
✅ Date/Timestamp (in progress)
Coming soon: UDFs, long tail
Expression Coverage for DATE/TIMESTAMP
• Many queries contain date/timestamp logic
• As of today: 95% coverage (100% very soon)
• Fast path for UTC timezone (default)
• Some expressions are very complicated to implement
  • Individual functions run in Spark, while the rest of the operator/plan still runs in Photon
Microbenchmarks do not necessarily reflect speedups on end-to-end queries. Functions are optimized for the UTC timezone; your mileage may vary.
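One way to see why UTC can admit a fast path: extracting the hour of day from a UTC timestamp is pure integer arithmetic, while any other zone first needs an offset lookup. The sketch below is an illustration only and does not describe Photon's code; the function names are hypothetical, and the transition table covers just US Pacific rules for 2021 instead of a full time zone database.

```python
import bisect
from datetime import datetime, timezone

MICROS_PER_SECOND = 1_000_000
SECONDS_PER_HOUR = 3_600


def hour_utc(epoch_micros):
    """Fast path: in UTC the hour of day is pure integer arithmetic."""
    return (epoch_micros // (MICROS_PER_SECOND * SECONDS_PER_HOUR)) % 24


def _utc_seconds(year, month, day, hour):
    return int(datetime(year, month, day, hour, tzinfo=timezone.utc).timestamp())


# Offset transitions for US Pacific time in 2021 only: a deliberately tiny,
# illustrative table standing in for a full time zone database.
TRANSITIONS = [
    (_utc_seconds(2021, 1, 1, 0), -8 * SECONDS_PER_HOUR),    # PST
    (_utc_seconds(2021, 3, 14, 10), -7 * SECONDS_PER_HOUR),  # PDT starts
    (_utc_seconds(2021, 11, 7, 9), -8 * SECONDS_PER_HOUR),   # PST resumes
]
_STARTS = [start for start, _ in TRANSITIONS]


def hour_in_zone(epoch_micros):
    """General path: find the UTC offset in effect, then do the arithmetic."""
    seconds = epoch_micros // MICROS_PER_SECOND
    index = bisect.bisect_right(_STARTS, seconds) - 1
    if index < 0:
        raise ValueError("timestamp precedes the illustrative transition table")
    offset = TRANSITIONS[index][1]
    return ((seconds + offset) // SECONDS_PER_HOUR) % 24
```

The general path adds a search per value (or per batch), which is the extra work a UTC-only fast path avoids.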
Nested/Complex Type Support
• ✅ Struct
• Array / Map: in active development
  • Reading data and basic usage/functions work
  • In progress: collect_list() / collect_set()
  • Long tail of array expressions
Microbenchmarks do not necessarily reflect speedups on end-to-end queries; your mileage may vary.
Writing Delta/Parquet Data
• Currently supports all scalar types and Struct
• Array/Map in active development
• Can be turned on/off independently of Photon
  • spark.databricks.photon.parquetWriter.enabled = true
• Typical speedups: 2-4x
• Wider (>100 columns) tables can see even more gains
DML Support [DELETE / UPDATE / MERGE]
• The bulk of the work, such as joins/aggregations, runs in Photon
• Benefits from Photon Delta/Parquet writing capability
• Typical speedups: 2-3x
ANSI SQL Support
• Development in tandem with open-source Spark
• Fail queries on overflow or similar errors
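The difference between failing on overflow and silently wrapping can be shown with 32-bit integer addition. This is a minimal sketch in plain Python with hypothetical function names; it does not call Spark or Photon.

```python
INT_MIN = -(2 ** 31)
INT_MAX = 2 ** 31 - 1


def add_int_wrapping(a, b):
    """Wrapping add: overflow silently wraps around (two's complement)."""
    return (a + b - INT_MIN) % (2 ** 32) + INT_MIN


def add_int_checked(a, b):
    """ANSI-style add: overflow fails the computation instead of wrapping."""
    result = a + b
    if not INT_MIN <= result <= INT_MAX:
        raise ArithmeticError(f"integer overflow: {a} + {b}")
    return result
```

With the wrapping version, `INT_MAX + 1` quietly becomes `INT_MIN`; the checked version raises, so the query fails instead of returning a wrong answer.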
Photon: What's Next
Current/Up Next Efforts in Photon
• Finishing nested type support, including writes
• Outstanding ANSI SQL behaviors
• Sort and Window operators
• Support for bucketed tables
How to use Photon today
Photon: Key Use Cases for Preview
SQL Data Engineering / ELT / ETL (June)
● Enable Photon via Workspace cluster
● Notebook or JAR
● Available on: AWS
● Not supported yet
○ UDFs
○ Streaming
Interactive SQL Analytics (June)
● Photon via Databricks SQL
● Redash
● Tableau
● Microsoft Power BI
● BYO Tool via ODBC / JDBC
● Available on: AWS, Azure
● Not supported yet
○ Sort
○ Window
SELECT
  vendor_id,
  SUM(trip_distance) AS SumTripDistance,
  AVG(trip_distance) AS AvgTripDistance
FROM abehm.nyc_yellow
WHERE passenger_count IN (1, 2, 4)
GROUP BY vendor_id
ORDER BY vendor_id
Plan without Photon:
Sort
+- Exchange rangepartitioning
   +- HashAggregate
      +- Exchange hashpartitioning
         +- HashAggregate
            +- Project
               +- Filter
                  +- ColumnarToRow
                     +- FileScan
Plan with Photon:
Sort
+- Exchange
   +- ColumnarToRow
      +- PhotonResultStage
         +- PhotonGroupingAgg
            +- PhotonShuffleExchangeSource
               +- PhotonShuffleMapStage
                  +- PhotonShuffleExchangeSink
                     +- PhotonGroupingAgg
                        +- PhotonProject
                           +- PhotonFilter
                              +- PhotonAdapter
                                 +- FileScan
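A quick way to see how much of a plan like the ones above runs in Photon is to split the operator names by prefix. The helper below is a hypothetical sketch: it assumes Photon operators can be recognized by the `Photon` name prefix, as in the listing above, and it works on plan text pasted as a string, not on any Spark API.

```python
def photon_coverage(plan_text):
    """Split plan operators into Photon and non-Photon lists by name prefix."""
    photon, other = [], []
    for line in plan_text.splitlines():
        # Drop indentation and the "+-" tree marker in front of each operator.
        name = line.strip().lstrip("+-").strip()
        if not name:
            continue
        (photon if name.startswith("Photon") else other).append(name)
    return photon, other


if __name__ == "__main__":
    plan = """Sort
+- Exchange
   +- ColumnarToRow
      +- PhotonResultStage
         +- PhotonGroupingAgg
            +- FileScan"""
    photon_ops, spark_ops = photon_coverage(plan)
    print(f"{len(photon_ops)} Photon operators, {len(spark_ops)} other operators")
```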
Spark UI
● Yellow → Photon Nodes
● Blue → Spark Nodes
Metrics
● Photon nodes have rich metrics to help understand behavior and performance
● Easier than Spark where several nodes are squashed together
Performance observations
Customer Feedback
Test Date                 Average Query Response Time (seconds)   Reduction from Previous
June '20 (DBR v6.6)       7.8                                     -
December '20 (Photon)     6.2                                     21%
May '21 (Photon)          4.4                                     29%
44% reduction (June '20 to May '21)
Avg query speedup: 2.5x
Power Test speedup: 3.7x
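The reduction percentages in the customer feedback figures above follow directly from the three response times. A short check, using only the numbers quoted above (the variable names are illustrative):

```python
def reduction_pct(before, after):
    """Percentage reduction from `before` to `after`."""
    return (before - after) / before * 100


# Average query response times in seconds, from the customer feedback figures.
dbr_66, photon_dec20, photon_may21 = 7.8, 6.2, 4.4

step_1 = reduction_pct(dbr_66, photon_dec20)        # June '20 -> December '20
step_2 = reduction_pct(photon_dec20, photon_may21)  # December '20 -> May '21
overall = reduction_pct(dbr_66, photon_may21)       # June '20 -> May '21

print(f"{step_1:.1f}% {step_2:.1f}% {overall:.1f}%")
```

Rounded to whole percentages these give 21%, 29%, and 44%. The 2.5x and 3.7x speedups are separate measurements and are not derived from these response times.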
DEMO
"Demo" - just a walkthrough showing where users can turn on Photon in Databricks?
Note: From getting started to executing existing code/queries and monitoring Photon (Spark UI + Query execution on SQLA)
Logo slide with generalized perf observations
brought down merge latency by 2-3x
Summary
Related Talks
WEDNESDAY
• 03:50 PM (PT): Databricks SQL Analytics Deep Dive for the Data Analyst - Doug Bateman, Databricks
• 04:25 PM (PT): Radical Speed for SQL Queries on Databricks: Photon Under the Hood - Greg Rahn & Alex Behm, Databricks
• 04:25 PM (PT): Delivering Insights from 20M+ Smart Homes with 500M+ devices - Sameer Vaidya, Plume
THURSDAY
• 11:00 AM (PT): Getting Started with Databricks SQL Analytics - Simon Whiteley, Advancing Analytics
• 03:15 PM (PT): Building Lakehouses on Delta Lake and SQL Analytics - A Primer - Franco Patano, Databricks
FRIDAY
• 10:30 AM (PT): SQL Analytics Powering Telemetry Analysis at Comcast - Suraj Nesamani, Comcast & Molly Nagamuthu, Databricks
How to get started
In June
databricks.com/try
SQL> SELECT questions FROM audience;
Feedback
Your feedback is important to us.
Don’t forget to rate and review the sessions.

Mais conteúdo relacionado

Mais procurados

Understanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsUnderstanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsDatabricks
 
The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...
The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...
The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...Databricks
 
Some Iceberg Basics for Beginners (CDP).pdf
Some Iceberg Basics for Beginners (CDP).pdfSome Iceberg Basics for Beginners (CDP).pdf
Some Iceberg Basics for Beginners (CDP).pdfMichael Kogan
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache icebergAlluxio, Inc.
 
A Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQLA Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQLDatabricks
 
Programming in Spark using PySpark
Programming in Spark using PySpark      Programming in Spark using PySpark
Programming in Spark using PySpark Mostafa
 
Photon Technical Deep Dive: How to Think Vectorized
Photon Technical Deep Dive: How to Think VectorizedPhoton Technical Deep Dive: How to Think Vectorized
Photon Technical Deep Dive: How to Think VectorizedDatabricks
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsAlluxio, Inc.
 
Introduction to Apache ZooKeeper
Introduction to Apache ZooKeeperIntroduction to Apache ZooKeeper
Introduction to Apache ZooKeeperSaurav Haloi
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsDatabricks
 
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangApache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangDatabricks
 
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...StreamNative
 
Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0Cloudera, Inc.
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013mumrah
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesDatabricks
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm Chandler Huang
 
Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Cloudera, Inc.
 
Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...
Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...
Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...Databricks
 

Mais procurados (20)

Understanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsUnderstanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIs
 
The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...
The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...
The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...
 
Some Iceberg Basics for Beginners (CDP).pdf
Some Iceberg Basics for Beginners (CDP).pdfSome Iceberg Basics for Beginners (CDP).pdf
Some Iceberg Basics for Beginners (CDP).pdf
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache iceberg
 
A Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQLA Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQL
 
Programming in Spark using PySpark
Programming in Spark using PySpark      Programming in Spark using PySpark
Programming in Spark using PySpark
 
Photon Technical Deep Dive: How to Think Vectorized
Photon Technical Deep Dive: How to Think VectorizedPhoton Technical Deep Dive: How to Think Vectorized
Photon Technical Deep Dive: How to Think Vectorized
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
 
Introduction to Apache ZooKeeper
Introduction to Apache ZooKeeperIntroduction to Apache ZooKeeper
Introduction to Apache ZooKeeper
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL Joins
 
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangApache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
 
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
 
ORC Deep Dive 2020
ORC Deep Dive 2020ORC Deep Dive 2020
ORC Deep Dive 2020
 
Flink Streaming
Flink StreamingFlink Streaming
Flink Streaming
 
Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm
 
Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive


 
Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...
Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...
Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...
 

Semelhante a Radical Speed for SQL Queries on Databricks: Photon Under the Hood

Teaching Apache Spark: Demonstrations on the Databricks Cloud Platform
Teaching Apache Spark: Demonstrations on the Databricks Cloud PlatformTeaching Apache Spark: Demonstrations on the Databricks Cloud Platform
Teaching Apache Spark: Demonstrations on the Databricks Cloud PlatformYao Yao
 
Apache Spark Performance is too hard. Let's make it easier
Apache Spark Performance is too hard. Let's make it easierApache Spark Performance is too hard. Let's make it easier
Apache Spark Performance is too hard. Let's make it easierDatabricks
 
20141206 4 q14_dataconference_i_am_your_db
20141206 4 q14_dataconference_i_am_your_db20141206 4 q14_dataconference_i_am_your_db
20141206 4 q14_dataconference_i_am_your_dbhyeongchae lee
 
SharePoint 2013 Performance Analysis - Robi Vončina
SharePoint 2013 Performance Analysis - Robi VončinaSharePoint 2013 Performance Analysis - Robi Vončina
SharePoint 2013 Performance Analysis - Robi VončinaSPC Adriatics
 
SQL Analytics for Search Engineers - Timothy Potter, Lucidworksngineers
SQL Analytics for Search Engineers - Timothy Potter, LucidworksngineersSQL Analytics for Search Engineers - Timothy Potter, Lucidworksngineers
SQL Analytics for Search Engineers - Timothy Potter, LucidworksngineersLucidworks
 
[262] netflix 빅데이터 플랫폼
[262] netflix 빅데이터 플랫폼[262] netflix 빅데이터 플랫폼
[262] netflix 빅데이터 플랫폼NAVER D2
 
Headaches and Breakthroughs in Building Continuous Applications
Headaches and Breakthroughs in Building Continuous ApplicationsHeadaches and Breakthroughs in Building Continuous Applications
Headaches and Breakthroughs in Building Continuous ApplicationsDatabricks
 
Performance Benchmarking: Tips, Tricks, and Lessons Learned
Performance Benchmarking: Tips, Tricks, and Lessons LearnedPerformance Benchmarking: Tips, Tricks, and Lessons Learned
Performance Benchmarking: Tips, Tricks, and Lessons LearnedTim Callaghan
 
Cognos Performance Tuning Tips & Tricks
Cognos Performance Tuning Tips & TricksCognos Performance Tuning Tips & Tricks
Cognos Performance Tuning Tips & TricksSenturus
 
Spark + AI Summit 2019: Headaches and Breakthroughs in Building Continuous Ap...
Spark + AI Summit 2019: Headaches and Breakthroughs in Building Continuous Ap...Spark + AI Summit 2019: Headaches and Breakthroughs in Building Continuous Ap...
Spark + AI Summit 2019: Headaches and Breakthroughs in Building Continuous Ap...Landon Robinson
 
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...Flink Forward
 
Developing Kafka Streams Applications with Upgradability in Mind with Neil Bu...
Developing Kafka Streams Applications with Upgradability in Mind with Neil Bu...Developing Kafka Streams Applications with Upgradability in Mind with Neil Bu...
Developing Kafka Streams Applications with Upgradability in Mind with Neil Bu...HostedbyConfluent
 
Spark Overview and Performance Issues
Spark Overview and Performance IssuesSpark Overview and Performance Issues
Spark Overview and Performance IssuesAntonios Katsarakis
 
Lambda architecture: from zero to One
Lambda architecture: from zero to OneLambda architecture: from zero to One
Lambda architecture: from zero to OneSerg Masyutin
 
Creating Reusable Geospatial Pipelines
Creating Reusable Geospatial PipelinesCreating Reusable Geospatial Pipelines
Creating Reusable Geospatial PipelinesDatabricks
 
Machine Learning With H2O vs SparkML
Machine Learning With H2O vs SparkMLMachine Learning With H2O vs SparkML
Machine Learning With H2O vs SparkMLArnab Biswas
 
Healthcare Claim Reimbursement using Apache Spark
Healthcare Claim Reimbursement using Apache SparkHealthcare Claim Reimbursement using Apache Spark
Healthcare Claim Reimbursement using Apache SparkDatabricks
 
Agile Oracle to PostgreSQL migrations (PGConf.EU 2013)
Agile Oracle to PostgreSQL migrations (PGConf.EU 2013)Agile Oracle to PostgreSQL migrations (PGConf.EU 2013)
Agile Oracle to PostgreSQL migrations (PGConf.EU 2013)Gabriele Bartolini
 
Multi dimension aggregations using spark and dataframes
Multi dimension aggregations using spark and dataframesMulti dimension aggregations using spark and dataframes
Multi dimension aggregations using spark and dataframesRomi Kuntsman
 
From Zero to Performance Hero in Minutes - Agile Testing Days 2014 Potsdam
From Zero to Performance Hero in Minutes - Agile Testing Days 2014 PotsdamFrom Zero to Performance Hero in Minutes - Agile Testing Days 2014 Potsdam
From Zero to Performance Hero in Minutes - Agile Testing Days 2014 PotsdamAndreas Grabner
 

Semelhante a Radical Speed for SQL Queries on Databricks: Photon Under the Hood (20)

Teaching Apache Spark: Demonstrations on the Databricks Cloud Platform
Teaching Apache Spark: Demonstrations on the Databricks Cloud PlatformTeaching Apache Spark: Demonstrations on the Databricks Cloud Platform
Teaching Apache Spark: Demonstrations on the Databricks Cloud Platform
 
Apache Spark Performance is too hard. Let's make it easier
Apache Spark Performance is too hard. Let's make it easierApache Spark Performance is too hard. Let's make it easier
Apache Spark Performance is too hard. Let's make it easier
 
20141206 4 q14_dataconference_i_am_your_db
20141206 4 q14_dataconference_i_am_your_db20141206 4 q14_dataconference_i_am_your_db
20141206 4 q14_dataconference_i_am_your_db
 
SharePoint 2013 Performance Analysis - Robi Vončina
SharePoint 2013 Performance Analysis - Robi VončinaSharePoint 2013 Performance Analysis - Robi Vončina
SharePoint 2013 Performance Analysis - Robi Vončina
 
SQL Analytics for Search Engineers - Timothy Potter, Lucidworksngineers
SQL Analytics for Search Engineers - Timothy Potter, LucidworksngineersSQL Analytics for Search Engineers - Timothy Potter, Lucidworksngineers
SQL Analytics for Search Engineers - Timothy Potter, Lucidworksngineers
 
[262] netflix 빅데이터 플랫폼
[262] netflix 빅데이터 플랫폼[262] netflix 빅데이터 플랫폼
[262] netflix 빅데이터 플랫폼
 
Headaches and Breakthroughs in Building Continuous Applications
Headaches and Breakthroughs in Building Continuous ApplicationsHeadaches and Breakthroughs in Building Continuous Applications
Headaches and Breakthroughs in Building Continuous Applications
 
Performance Benchmarking: Tips, Tricks, and Lessons Learned
Performance Benchmarking: Tips, Tricks, and Lessons LearnedPerformance Benchmarking: Tips, Tricks, and Lessons Learned
Performance Benchmarking: Tips, Tricks, and Lessons Learned
 
Cognos Performance Tuning Tips & Tricks
Cognos Performance Tuning Tips & TricksCognos Performance Tuning Tips & Tricks
Cognos Performance Tuning Tips & Tricks
 
Spark + AI Summit 2019: Headaches and Breakthroughs in Building Continuous Ap...
Spark + AI Summit 2019: Headaches and Breakthroughs in Building Continuous Ap...Spark + AI Summit 2019: Headaches and Breakthroughs in Building Continuous Ap...
Spark + AI Summit 2019: Headaches and Breakthroughs in Building Continuous Ap...
 
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...
 
Developing Kafka Streams Applications with Upgradability in Mind with Neil Bu...
Developing Kafka Streams Applications with Upgradability in Mind with Neil Bu...Developing Kafka Streams Applications with Upgradability in Mind with Neil Bu...
Developing Kafka Streams Applications with Upgradability in Mind with Neil Bu...
 
Spark Overview and Performance Issues
Spark Overview and Performance IssuesSpark Overview and Performance Issues
Spark Overview and Performance Issues
 
Lambda architecture: from zero to One
Lambda architecture: from zero to OneLambda architecture: from zero to One
Lambda architecture: from zero to One
 
Creating Reusable Geospatial Pipelines
Creating Reusable Geospatial PipelinesCreating Reusable Geospatial Pipelines
Creating Reusable Geospatial Pipelines
 
Machine Learning With H2O vs SparkML
Machine Learning With H2O vs SparkMLMachine Learning With H2O vs SparkML
Machine Learning With H2O vs SparkML
 
Healthcare Claim Reimbursement using Apache Spark
Healthcare Claim Reimbursement using Apache SparkHealthcare Claim Reimbursement using Apache Spark
Healthcare Claim Reimbursement using Apache Spark
 
Agile Oracle to PostgreSQL migrations (PGConf.EU 2013)
Agile Oracle to PostgreSQL migrations (PGConf.EU 2013)Agile Oracle to PostgreSQL migrations (PGConf.EU 2013)
Agile Oracle to PostgreSQL migrations (PGConf.EU 2013)
 
Multi dimension aggregations using spark and dataframes
Multi dimension aggregations using spark and dataframesMulti dimension aggregations using spark and dataframes
Multi dimension aggregations using spark and dataframes
 
From Zero to Performance Hero in Minutes - Agile Testing Days 2014 Potsdam
From Zero to Performance Hero in Minutes - Agile Testing Days 2014 PotsdamFrom Zero to Performance Hero in Minutes - Agile Testing Days 2014 Potsdam
From Zero to Performance Hero in Minutes - Agile Testing Days 2014 Potsdam
 

Mais de Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Radical Speed for SQL Queries on Databricks: Photon Under the Hood

  • 1. Radical Speed for SQL Queries on Databricks: Photon Under the Hood Alex Behm Tech Lead, Databricks Greg Rahn Staff Product Manager, Databricks
  • 2. Agenda ▪ Intro to Photon ▪ Recent Developments ▪ Up Next ▪ Summary
  • 4. Observed Workload Trends Businesses are moving faster, and as a result organizations spend less time in data modeling, leading to worse performance. ▪ Most columns don’t have "NOT NULL" constraints defined ▪ Strings are convenient but slower than specific types ▪ Data lifecycle: Raw → Bronze → Silver → Gold Can we get both agility and performance?
  • 5. -- Data [Analysts | Engineers | Scientists] everywhere Just one more ask: SQL as a first-class citizen on Databricks
  • 6. What is Photon? Photon is a new 100% Apache Spark compatible query engine designed for speed and flexibility. It’s built from the ground up to deliver the fastest performance on modern cloud hardware for all data use cases across data engineering, data science, machine learning, and data analytics.
  • 7. Building the next generation query engine • Re-architected for the fastest performance on real-world applications: native C++ engine for faster queries; custom-built memory management to avoid JVM bottlenecks; vectorized: memory, instruction, and data parallelism (SIMD) • Works with your existing code and avoids vendor lock-in: 100% compatible with open source Spark DataFrame APIs and Spark SQL; transparent operation to users, no need to invoke something new, it just works • Optimizing for all data use cases and workloads: today, supporting SQL and DataFrame workloads; coming soon, Streaming, Data Science, and more
  • 8. Why build a new execution engine?
  • 9. Photon in the Databricks Lakehouse Platform ● Client: Submit SQL Query ● Spark Driver (JVM): Parsing; Catalyst: Analysis/Planning/Optimization; Scheduling ● Spark Executors (Mixed JVM/Native): Execute Task (shown four times) ● Delta Lake
  • 10. Key Photon Characteristics • Hybrid Photon/Spark Plans: use Photon when possible, fall back to Spark for unsupported operations; completely transparent to users • Native code using off-heap memory: natural access to memory and intrinsics (no fiddling with Java Unsafe); no JVM GC, large heaps ok; no JVM JIT performance cliffs / limitations • Fully integrated with Spark’s memory manager • Prefers hash join over sort-merge join • Rich per-operator performance metrics
  • 12. Development Focus Areas 1. Production Readiness a. Goal: Resilience comparable to DBR → spilling support b. Testing and hardening, real customer workloads 2. Query Coverage a. Today: Basics like joins/aggregations/shuffle, common types and functions b. In development: Nested types, built-in functions c. Coming soon: Sort/Window 3. Performance a. Analyze and optimize common usage patterns
  • 13. Disclaimer: Microbenchmarks Microbenchmarks do not necessarily reflect real-world end-to-end performance During Photon development we analyze and optimize performance with extensive microbenchmarks In the following slides, we share benchmark results that were run in controlled and narrowly scoped scenarios
  • 14. Resilience with Very Large Inputs • Spilling for very large inputs • Write intermediate state to external storage to process inputs exceeding available memory ✅ Hash Shuffle ✅ Hash Aggregation ✅ Hash Join 2-5x Speedup
  • 15. Example: Spilling Hash Join [1 of 4] • Hash join has two phases: build and probe • Build phase: insert records from one join input into the hash table • Hash table has a fixed number of partitions (Figure: partitioned hash table)
  • 16. Example: Spilling Hash Join [2 of 4] • When memory runs out, spill one partition to disk • New records go to in-memory partitions or straight to disk • Repeat until build is done (Figure: partitioned hash table)
  • 17. Example: Spilling Hash Join [3 of 4] • Probe phase: process rows from the other join input • Emit results for probe rows matching in-memory build partitions • Spill probe rows matching a spilled build partition (Figure: partitioned hash table with build and probe inputs)
  • 18. Example: Spilling Hash Join [4 of 4] • For each spilled partition, repeat the same build/probe process • Might spill again! Apply the same algorithm recursively (Figure: build ⨝ probe)
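The four spilling hash join slides above can be sketched as a toy simulation. This is a minimal illustration under stated assumptions, not Photon's implementation: Python lists stand in for spill files, `MAX_BUILD_ROWS` stands in for the memory budget, the largest partition is chosen as the spill victim, and `MAX_DEPTH` is an added guard for heavily skewed keys. All names and constants are hypothetical.

```python
from collections import defaultdict

NUM_PARTITIONS = 4   # fixed partition count (assumed value)
MAX_BUILD_ROWS = 4   # stand-in for the memory budget, in build rows (assumed)
MAX_DEPTH = 8        # assumed guard: stop spilling on heavily skewed keys


def partition_of(key, depth):
    # Mix the recursion depth into the hash so that a spilled partition
    # is split differently when it is processed again.
    return hash((key, depth)) % NUM_PARTITIONS


def spilling_hash_join(build, probe, depth=0):
    """Inner-join two lists of (key, value) rows on key."""
    tables = [defaultdict(list) for _ in range(NUM_PARTITIONS)]
    sizes = [0] * NUM_PARTITIONS
    spilled_build = {}                  # partition id -> build rows "on disk"
    spilled_probe = defaultdict(list)   # partition id -> probe rows "on disk"

    # Build phase: insert build rows, spilling a partition when over budget.
    for key, value in build:
        p = partition_of(key, depth)
        if p in spilled_build:
            spilled_build[p].append((key, value))   # goes straight to disk
            continue
        tables[p][key].append(value)
        sizes[p] += 1
        if sum(sizes) > MAX_BUILD_ROWS and depth < MAX_DEPTH:
            victim = max(range(NUM_PARTITIONS), key=lambda i: sizes[i])
            spilled_build[victim] = [
                (k, v) for k, vs in tables[victim].items() for v in vs
            ]
            tables[victim].clear()
            sizes[victim] = 0

    # Probe phase: emit matches for in-memory partitions, spill the rest.
    results = []
    for key, value in probe:
        p = partition_of(key, depth)
        if p in spilled_build:
            spilled_probe[p].append((key, value))
        else:
            results.extend((key, b, value) for b in tables[p].get(key, []))

    # Spilled partitions: repeat build/probe recursively (may spill again).
    for p, build_rows in spilled_build.items():
        results.extend(
            spilling_hash_join(build_rows, spilled_probe[p], depth + 1)
        )
    return results
```

Calling `spilling_hash_join(build_rows, probe_rows)` returns `(key, build_value, probe_value)` tuples. The row order differs from a plain in-memory join because spilled partitions are processed last.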
  • 19. Spilling Hash Join vs. Spilling Sort-Merge Join • Photon converts Sort-Merge Joins to Hash Joins • Sort Merge Join • Buffer + sort both join inputs, increasing memory pressure • Spilling sort → write entire input to sorted runs • Hash Join • Only buffer build input (typically the smaller input) in a hash table • Graceful degradation: Spill both inputs at the build-partition granularity • Role reversal: Swap build/probe when processing spilled partitions Up to 5x Speedup
  • 20. Hardening: How we test Photon • Random queries and data • Using new open-source Spark random query generator • Failure injection • Randomly trip error paths to ensure graceful query failure • Spill injection • Randomly trigger spill events to simulate memory pressure • Clang/LLVM C++ tools • Address Sanitizer • Undefined Behavior Sanitizer • Combinations of the above 🐛 🔨
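The spill-injection idea on the slide above can be illustrated with a small sketch. It assumes a toy grouped-sum operator that polls a `should_spill` callback before each insert. The test then checks that randomly forced spills do not change the result. The operator, the callback, and the spill probability are all hypothetical and are not taken from Photon's test suite.

```python
import random
from collections import Counter


def grouped_sum(rows, should_spill):
    """Sum values per key; returns (totals, number_of_spills)."""
    in_memory = Counter()
    spilled_runs = []                      # partial aggregates "on disk"
    for key, value in rows:
        if in_memory and should_spill():
            spilled_runs.append(in_memory)  # flush partial state
            in_memory = Counter()
        in_memory[key] += value
    # Merge the spilled partial aggregates with the in-memory state.
    total = Counter()
    for run in spilled_runs + [in_memory]:
        total.update(run)
    return dict(total), len(spilled_runs)


def check_with_spill_injection(rows, seed, spill_probability=0.3):
    """Compare a run with randomly injected spills against a clean run."""
    rng = random.Random(seed)
    baseline, _ = grouped_sum(rows, lambda: False)
    injected, spills = grouped_sum(
        rows, lambda: rng.random() < spill_probability
    )
    return baseline == injected, spills
```

A fixed seed keeps a failing run reproducible, which matters when injection is combined with random queries and data.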
  • 22. Overview of Query Coverage • Data Types: ✅ Byte/Short/Int/Long ✅ Boolean ✅ String/Binary ✅ Decimal ✅ Float/Double ✅ Date/Timestamp ✅ Struct; coming soon: Array, Map • Operators: ✅ Scan, Filter, Project ✅ Hash Aggregate/Join/Shuffle ✅ Nested-Loop Join ✅ Null-Aware Anti Join ✅ Union, Expand, ScalarSubquery; coming soon: Sort, Window • Expressions: ✅ Comparison / Logic ✅ Arithmetic / Math (most) ✅ Conditional (IF, CASE, etc.) ✅ String (common ones) ✅ Casts ✅ Aggregates (most common ones) ✅ Date/Timestamp (in progress); coming soon: UDFs, long tail
  • 23. Expression Coverage for DATE/TIMESTAMP • Many queries contain date/timestamp logic • As of today: 95% coverage (100% very soon) • Fast path for UTC timezone (default) • Some expressions are very complicated to implement • Individual functions run in Spark, but still run the operator/plan in Photon
  • 24. Microbenchmarks do not necessarily reflect speedups on end-to-end queries, functions optimized for UTC timezone, your mileage may vary
  • 25. Nested/Complex Type Support • ✅ Struct • Array / Map, in active development • Reading data and basic usage/functions work • In progress: collect_list() / collect_set() • Long tail of array expressions
  • 26. Microbenchmarks do not necessarily reflect speedups on end-to-end queries, your mileage may vary
  • 27. Writing Delta/Parquet Data • Currently supports all scalar types and Struct • Array/Map in active development • Can be turned on/off independently of Photon: spark.databricks.photon.parquetWriter.enabled = true • Typical speedups: 2-4x • Wider (>100 columns) tables can see even more gains
  • 28. DML Support [DELETE / UPDATE / MERGE] • Bulk of work like joins/aggregations runs in Photon • Benefits from Photon Delta/Parquet writing capability • Typical speedups: 2-3x. ANSI SQL Support • Development in tandem with open-source Spark • Fail queries on overflow or similar errors
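The writer switch named on the slide above could be set from a notebook roughly as follows. This is a sketch, not a confirmed recipe: the configuration key is copied from the slide and may not apply to every Databricks Runtime version, `spark` is assumed to be an existing `SparkSession`, and the helper name is hypothetical. `spark.conf.set` is the standard PySpark call for session-level settings.

```python
# Key copied verbatim from the slide; availability per runtime is not confirmed.
PHOTON_PARQUET_WRITER_KEY = "spark.databricks.photon.parquetWriter.enabled"


def set_photon_parquet_writer(spark, enabled=True):
    """Turn the Photon Delta/Parquet writer on or off for this session."""
    spark.conf.set(PHOTON_PARQUET_WRITER_KEY, "true" if enabled else "false")
```

In a notebook this would be called as `set_photon_parquet_writer(spark)` before the write.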
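"Fail queries on overflow" can be illustrated with 32-bit integer addition. The sketch below contrasts silent wrap-around, which is how Java-style integer arithmetic behaves, with the checked behavior that ANSI mode calls for. It is a plain-Python illustration with hypothetical function names and does not reflect Photon's or Spark's internal code.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1


def add_int_wrapping(a, b):
    """Wrap around on 32-bit overflow instead of reporting an error."""
    return (a + b - INT_MIN) % 2**32 + INT_MIN


def add_int_ansi(a, b):
    """Raise on 32-bit overflow, so the query fails instead of wrapping."""
    result = a + b
    if not INT_MIN <= result <= INT_MAX:
        raise ArithmeticError("integer overflow")
    return result
```

With wrapping, `add_int_wrapping(INT_MAX, 1)` yields `INT_MIN`, a wrong but silent answer. The checked variant raises and surfaces the problem to the user.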
  • 30. Current/Up Next Efforts in Photon • Finishing nested type support, including writes • Outstanding ANSI SQL behaviors • Sort and Window operators • Support for bucketed tables
  • 31. How to use Photon today
  • 32. Photon: Key Use Cases for Preview ● SQL Data Engineering / ELT / ETL (June) ○ Enable Photon via Workspace cluster ○ Notebook or JAR ○ Available on: AWS ○ Not supported yet: UDFs, Streaming ● Interactive SQL Analytics (June) ○ Photon via Databricks SQL ○ Redash ○ Tableau ○ Microsoft Power BI ○ BYO Tool via ODBC / JDBC ○ Available on: AWS, Azure ○ Not supported yet: Sort, Window
  • 34. Query:
        SELECT vendor_id, SUM(trip_distance) as SumTripDistance, AVG(trip_distance) as AvgTripDistance
        FROM abehm.nyc_yellow
        WHERE passenger_count IN (1, 2, 4)
        GROUP BY vendor_id
        ORDER BY vendor_id
    Plan without Photon:
        Sort
        +- Exchange rangepartitioning
           +- HashAggregate
              +- Exchange hashpartitioning
                 +- HashAggregate
                    +- Project
                       +- Filter
                          +- ColumnarToRow
                             +- FileScan
    Plan with Photon:
        Sort
        +- Exchange
           +- ColumnarToRow
              +- PhotonResultStage
                 +- PhotonGroupingAgg
                    +- PhotonShuffleExchangeSource
                       +- PhotonShuffleMapStage
                          +- PhotonShuffleExchangeSink
                             +- PhotonGroupingAgg
                                +- PhotonProject
                                   +- PhotonFilter
                                      +- PhotonAdapter
                                         +- FileScan
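To see how much of a hybrid plan runs in Photon, the plan text can be split by node name. The sketch below uses the Photon plan from the slide above as input. Classifying nodes by a `Photon` name prefix is an assumed heuristic for illustration, and the helper name is hypothetical.

```python
PHOTON_PLAN = """\
Sort
+- Exchange
   +- ColumnarToRow
      +- PhotonResultStage
         +- PhotonGroupingAgg
            +- PhotonShuffleExchangeSource
               +- PhotonShuffleMapStage
                  +- PhotonShuffleExchangeSink
                     +- PhotonGroupingAgg
                        +- PhotonProject
                           +- PhotonFilter
                              +- PhotonAdapter
                                 +- FileScan
"""


def split_by_engine(plan_text):
    """Return (photon_nodes, spark_nodes) based on the node-name prefix."""
    photon, spark = [], []
    for line in plan_text.splitlines():
        node = line.lstrip(" +-").strip()   # drop tree-drawing characters
        if not node:
            continue
        (photon if node.startswith("Photon") else spark).append(node)
    return photon, spark


photon_nodes, spark_nodes = split_by_engine(PHOTON_PLAN)
print(f"{len(photon_nodes)} Photon nodes, {len(spark_nodes)} Spark nodes")
```

In this plan the nodes without the prefix are the final `Sort`, its `Exchange`, the `ColumnarToRow` transition, and the `FileScan`. This matches the slides' note that Sort was not yet supported in Photon.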
  • 35. Spark UI ● Yellow → Photon Nodes ● Blue → Spark Nodes Metrics ● Photon nodes have rich metrics to help understand behavior and performance ● Easier than Spark where several nodes are squashed together
  • 38. Customer Feedback: average query response time (seconds) by test date • June '20, DBR v6.6: 7.8 • December '20, Photon: 6.2 (21% reduction from previous) • May '21, Photon: 4.4 (29% reduction from previous) • Overall: 44% reduction
  • 40. DEMO: a walkthrough showing where users can turn on Photon in Databricks, from getting started to executing existing code/queries and monitoring Photon (Spark UI + query execution on SQLA)
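The percentages on the customer feedback slide above can be checked from the response times. The short sketch below recomputes them. The helper name is hypothetical and the inputs are the three values from the slide.

```python
def reduction_pct(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100


june_dbr, december_photon, may_photon = 7.8, 6.2, 4.4

step_1 = reduction_pct(june_dbr, december_photon)    # about 20.5, shown as 21%
step_2 = reduction_pct(december_photon, may_photon)  # about 29.0, shown as 29%
overall = reduction_pct(june_dbr, may_photon)        # about 43.6, shown as 44%
```

The two step reductions do not sum to the overall figure because each is relative to a different baseline. They compound instead: 1 - (1 - 0.205) × (1 - 0.290) ≈ 0.436.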
  • 41. Logo slide with generalized perf observations brought down merge latency by 2-3x
  • 43. Related Talks WEDNESDAY • 03:50 PM (PT): Databricks SQL Analytics Deep Dive for the Data Analyst - Doug Bateman, Databricks • 04:25 PM (PT): Radical Speed for SQL Queries on Databricks: Photon Under the Hood - Greg Rahn & Alex Behm, Databricks • 04:25 PM (PT): Delivering Insights from 20M+ Smart Homes with 500M+ devices - Sameer Vaidya, Plume THURSDAY • 11:00 AM (PT): Getting Started with Databricks SQL Analytics - Simon Whiteley, Advancing Analytics • 03:15 PM (PT): Building Lakehouses on Delta Lake and SQL Analytics - A Primer - Franco Patano, Databricks FRIDAY • 10:30 AM (PT): SQL Analytics Powering Telemetry Analysis at Comcast - Suraj Nesamani, Comcast & Molly Nagamuthu, Databricks
  • 44. How to get started In June databricks.com/try
  • 45. SQL> SELECT questions FROM audience;
  • 46. Feedback Your feedback is important to us. Don’t forget to rate and review the sessions.