© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Steffen Grunwald, AWS Solutions Architect, @steffeng
AWS Pop-up Loft Berlin, 17 October 2018
Query your data in S3 with SQL
and optimize for cost and
performance
What you will learn from this Session
• Benefits of raw Data in Amazon Simple Storage Service
• Query on S3 with Amazon Athena
• Optimize your Data Structure
• Compression
• Partitioning
• Columnar Formats
• Derive Views from raw Data for frequent Queries
Example Application: New York Taxi Data Ingestion
[Architecture diagram: Instance, Amazon Kinesis Streams, Amazon Kinesis Analytics, AWS Lambda, Amazon CloudWatch, Amazon Kinesis Firehose, Amazon S3, AWS Glue, Amazon Athena, Amazon QuickSight]
Benefits of raw Data in
Amazon Simple Storage Service (S3)
• Highly durable and cost-effective object store
• Limitlessly scalable
• Pay for what you use - in GB per month
• Decouple storage from compute
• Widely supported API by many consumers
• Well integrated into other AWS services
Use S3 as long term storage to answer yet unknown
questions of tomorrow.
Ingest Data with Amazon Kinesis Firehose
• Stores stream of records as files in a bucket
• Path: <Optional Prefix> + "YYYY/MM/DD/HH" (ingestion time, UTC)
• Optionally compress (GZIP, ZIP, Snappy)
• Optionally store as columnar format (ORC, Parquet)
• Optionally transform records with AWS Lambda
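The path rule above can be sketched in a few lines of Python. This is only an illustration of the naming scheme on the slide, not Firehose's implementation; the function name, the example prefix "nyctlc/" and the trailing slash are assumptions.

```python
from datetime import datetime, timedelta, timezone


def firehose_object_prefix(ingestion_time: datetime, optional_prefix: str = "") -> str:
    """Return <optional prefix> + "YYYY/MM/DD/HH/" for a timezone-aware ingestion time."""
    # The slide states that the path reflects ingestion time in UTC.
    utc_time = ingestion_time.astimezone(timezone.utc)
    return f"{optional_prefix}{utc_time:%Y/%m/%d/%H}/"


# A record ingested at 12:15 in a UTC+2 zone lands in the 10 o'clock (UTC) folder.
berlin_summer = timezone(timedelta(hours=2))
print(firehose_object_prefix(datetime(2018, 9, 2, 12, 15, tzinfo=berlin_summer), "nyctlc/"))
```

Because the folder is derived from ingestion time, a ride picked up at 10:00 can end up in a later hour's folder; this matters when filtering by partition.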
Amazon Athena is an interactive query service that
makes it easy to analyze data directly from Amazon
S3 using Standard SQL
Query Data Directly from Amazon S3
• No loading of data
• Query data in its raw format
• No Extract, Transform, and Load (ETL) required
• Stream data directly from Amazon S3
Presto SQL
• ANSI SQL compliant
• Complex joins, nested queries &
window functions
• Complex data types (arrays,
structs, maps)
• Partitioning of data by any key
• date, time, custom keys
• Presto built-in functions
Amazon Athena Supports Multiple Data Formats
• Text files, e.g., CSV, raw logs
• Apache Web Logs, TSV files
• JSON (simple, nested)
• Compressed files
• Columnar formats such as Parquet & ORC
• AVRO support
Amazon Athena is Cost Effective
• Pay per query
• $5 per TB scanned from S3
• DDL Queries and failed queries are free
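As a rough worked example of the pricing above, the cost of a query can be estimated from the "Data scanned" figure Athena reports. This sketch assumes binary units (1 TB = 1024^4 bytes) and ignores any per-query minimum or rounding that the actual billing may apply; the function name is hypothetical.

```python
PRICE_PER_TB_USD = 5.0    # from the slide: $5 per TB scanned
BYTES_PER_TB = 1024 ** 4  # assumption: binary terabytes
BYTES_PER_GB = 1024 ** 3


def athena_query_cost(bytes_scanned: int) -> float:
    """Estimate the cost in USD of one query from the bytes it scanned."""
    return bytes_scanned / BYTES_PER_TB * PRICE_PER_TB_USD


# 76.5GB is the "Data scanned" value reported for Example Query III in this deck.
print(f"${athena_query_cost(int(76.5 * BYTES_PER_GB)):.2f}")
```

Under these assumptions a full scan of the gzip data costs well under a dollar, but the same scan repeated many times a day adds up, which is the motivation for the optimizations that follow.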
Demo: Query files from Amazon Kinesis Firehose
with Amazon Athena and AWS Glue
The Example Data
• NYC Taxi & Limousine Commission rides
• Data is generated by kinesis-taxi-stream-producer, available at [1]:
java -jar kinesis-taxi-stream-producer.jar \
  -speedup 400 -statisticsFrequency 10000 \
  -stream nyctlc-ingestion -noWatermark \
  -region eu-central-1 -adaptTime ingestion
• ~2GB/h of raw data, 11 days, 487 GB total
[1] https://github.com/aws-samples/flink-stream-processing-refarch
Test Setup: Ingesting Data with different Settings
[Diagram: Instance, Amazon Kinesis Streams, four Firehose delivery streams (gzip, raw, orc, parquet), Amazon S3]
(max Amazon Kinesis Firehose buffering hints: 128MB & 900s)
Example Query I
Show some rides on 2nd September 10-11h:
SELECT *
FROM "128mb"
WHERE pickup_datetime
BETWEEN '2018-09-02T10' AND '2018-09-02T11'
LIMIT 10
Run time: 3.53 seconds, Data scanned: 4.62GB
Example Query II (gzip)
Show some rides on 2nd September 10-11h:
SELECT *
FROM "128mbgz"
WHERE pickup_datetime
BETWEEN '2018-09-02T10' AND '2018-09-02T11'
LIMIT 10
Before (Example Query I, uncompressed): Run time: 3.53 seconds, Data scanned: 4.62GB
Now (gzip): Run time: 2.45 seconds, Data scanned: 303.04KB
gzip reduces 487GB to 76GB.
Example Query III (without LIMIT 10)
What was the distribution of passenger load
on 2nd September 10-11h?
SELECT passenger_count, count(*) count
FROM "128mbgz"
WHERE pickup_datetime
BETWEEN '2018-09-02T10' AND '2018-09-02T11'
GROUP BY passenger_count
Run time: 50.36 seconds, Data scanned: 76.5GB
Partitions to the Rescue
AWS Glue crawler adds partitions based on file prefixes/directories
Example Query IV
What was the distribution of passenger load
on 2nd September 10-11h?
SELECT passenger_count, count(*) count
FROM "128mbgz"
WHERE pickup_datetime
BETWEEN '2018-09-02T10' AND '2018-09-02T11'
AND partition_0 || partition_1 || partition_2 ||
partition_3
BETWEEN '2018090210' AND '2018090215'
GROUP BY passenger_count
Run time: 27.6 seconds, Data scanned: 25.5GB
Run time: 5.59 seconds, Data scanned: 1.77GB
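The partition filter in this query can be generated rather than typed by hand. The sketch below builds the BETWEEN bounds for the concatenated partition columns; the function names are hypothetical, and the 4-hour slack is an assumption chosen to reproduce the bounds on the slide (the partitions reflect ingestion time, which can lag behind pickup time).

```python
from datetime import datetime, timedelta

PARTITION_COLUMNS = ("partition_0", "partition_1", "partition_2", "partition_3")


def partition_bounds(start: datetime, end: datetime, slack_hours: int = 0):
    """Return (low, high) 'YYYYMMDDHH' strings covering start..end plus slack."""
    low = start.strftime("%Y%m%d%H")
    high = (end + timedelta(hours=slack_hours)).strftime("%Y%m%d%H")
    return low, high


def partition_predicate(start: datetime, end: datetime, slack_hours: int = 0) -> str:
    """Build the SQL fragment that restricts the scan to the matching hour folders."""
    low, high = partition_bounds(start, end, slack_hours)
    return f"{' || '.join(PARTITION_COLUMNS)} BETWEEN '{low}' AND '{high}'"


print(partition_predicate(datetime(2018, 9, 2, 10), datetime(2018, 9, 2, 11), slack_hours=4))
```

How much slack is needed depends on how late records can arrive; too little silently drops rows, too much scans more data than necessary.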
Crawl Partitions with AWS Glue
[Diagram: Log, S3, Glue, Data Catalog, Athena; steps: crawl data, create table partitions, schema lookup, query data]
Why? Just schedule the crawler, no need to code! Deals with schema evolution.
Use Hive-style File Format in S3
Move/copy objects from:
YYYY/MM/DD/HH/file
to:
year=YYYY/month=MM/day=DD/hours=HH/file
Make Athena reload partitions with: MSCK REPAIR TABLE
Why? Format easy to create on write, easy to move.
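The key rewrite for such a move/copy can be sketched as a pure string transformation. The function name and the example file name are hypothetical; copying the objects themselves (for instance with an S3 client) is left out so the sketch stays runnable offline.

```python
import re

# Matches <optional prefix>YYYY/MM/DD/HH/<file>, the layout written by Firehose.
_FIREHOSE_KEY = re.compile(
    r"^(?P<prefix>.*?)(?P<y>\d{4})/(?P<m>\d{2})/(?P<d>\d{2})/(?P<h>\d{2})/(?P<file>[^/]+)$"
)


def to_hive_style(key: str) -> str:
    """Rewrite an S3 key into the year=/month=/day=/hours= layout from the slide."""
    match = _FIREHOSE_KEY.match(key)
    if match is None:
        raise ValueError(f"key does not follow YYYY/MM/DD/HH/file: {key!r}")
    return "{prefix}year={y}/month={m}/day={d}/hours={h}/{file}".format(**match.groupdict())


print(to_hive_style("2018/09/02/10/part-0001.gz"))
```

Keys that do not match the expected layout raise an error instead of being rewritten, so unexpected objects are not moved by accident.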
Creating Partitions with AWS Lambda
[Diagram: Log, S3, Lambda, Data Catalog, Athena; steps: new file trigger, add table partition, schema lookup, query data]
Why? Add partitions instantly, just AWS Lambda cost.
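A minimal sketch of such a function, assuming the Firehose key layout YYYY/MM/DD/HH/file and partition columns named year, month, day and hour. The table name and the handler are hypothetical, and the sketch only builds the DDL; a deployed function would additionally submit each statement to Athena (for example via the Athena client's start_query_execution in boto3).

```python
from urllib.parse import unquote_plus

TABLE = "mytable"  # hypothetical table name


def partition_ddl(bucket: str, key: str, table: str = TABLE) -> str:
    """Build an ADD PARTITION statement for the hour folder that contains key."""
    parts = key.split("/")
    if len(parts) < 5:
        raise ValueError(f"unexpected key layout: {key!r}")
    year, month, day, hour = parts[-5:-1]
    location = f"s3://{bucket}/{'/'.join(parts[:-1])}/"
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS PARTITION "
        f"(year='{year}',month='{month}',day='{day}',hour='{hour}') "
        f"LOCATION '{location}'"
    )


def handler(event, context):
    """Lambda entry point for S3 'object created' notifications."""
    statements = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        statements.append(partition_ddl(bucket, key))
    # Submitting the statements to Athena is intentionally omitted here.
    return statements
```

IF NOT EXISTS keeps the statement idempotent, which helps because every new file in an hour folder triggers the function again.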
Populate Partitions if paths are known
Issue Statements with Amazon Athena:
ALTER TABLE mytable
ADD PARTITION
(year='2015',month='01',day='01')
LOCATION 's3://[...]/2015/01/01/'
Why? Easy for predictable paths. Can be prepopulated.
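Prepopulating partitions for a known date range can be scripted. The sketch below only generates the statements; the function name, table name and bucket location are placeholders, and running the statements against Athena is left to the reader.

```python
from datetime import date, timedelta


def daily_partition_statements(table: str, base_location: str, first_day: date, last_day: date):
    """Return one ADD PARTITION statement per day from first_day to last_day inclusive."""
    statements = []
    day = first_day
    while day <= last_day:
        statements.append(
            f"ALTER TABLE {table} ADD IF NOT EXISTS PARTITION "
            f"(year='{day:%Y}',month='{day:%m}',day='{day:%d}') "
            f"LOCATION '{base_location}/{day:%Y/%m/%d}/'"
        )
        day += timedelta(days=1)
    return statements


# "s3://example-bucket/logs" is a placeholder location.
for statement in daily_partition_statements(
    "mytable", "s3://example-bucket/logs", date(2015, 1, 1), date(2015, 1, 3)
):
    print(statement)
```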
Columnar Formats
Flat File Sample Layout
First_Name | Last_Name  | Age | Gender
Tootsie    | Label      | 34  | Fem
Miriam     | Le Fleming | 25  | Fem
Blakeley   | Lisciandro | 45  | Fem
Ernst      | Minghi     | 63  | Mal
Brew       | Jime       | 22  | Mal
Columnar Formats Layout (Parquet & ORC)
First_Name: Tootsie, Miriam, Blakeley, Ernst, Brew (MIN: Blakeley, MAX: Tootsie)
Last_Name: Label, Le Fleming, Lisciandro, Minghi, Jime (MIN: Jime, MAX: Minghi)
Age: 34, 25, 45, 63, 22 (MIN: 22, MAX: 63)
Gender: Fem, Fem, Fem, Mal, Mal (MIN: Fem, MAX: Mal)
Benefit 1: Predicate Pushdown
The stored statistics (Age MIN: 22, MAX: 63) show whether a block can contain matching rows; blocks whose range cannot match are skipped.
SELECT * FROM ... WHERE Age > 30
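The idea behind predicate pushdown can be sketched with per-block MIN/MAX statistics. This is a simplified illustration, not how Parquet or ORC readers are implemented; the first block holds the Age values from the sample layout, the other two blocks are made-up data.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Block:
    """A chunk of the Age column plus the MIN/MAX statistics a writer would store."""
    ages: List[int]
    min_age: int = field(init=False)
    max_age: int = field(init=False)

    def __post_init__(self) -> None:
        self.min_age = min(self.ages)
        self.max_age = max(self.ages)


def ages_greater_than(blocks: List[Block], threshold: int) -> Tuple[List[int], int]:
    """Evaluate Age > threshold, reading only blocks whose MAX can satisfy it."""
    matches: List[int] = []
    blocks_read = 0
    for block in blocks:
        if block.max_age <= threshold:
            continue  # statistics prove that no row in this block matches
        blocks_read += 1
        matches.extend(age for age in block.ages if age > threshold)
    return matches, blocks_read


blocks = [Block([34, 25, 45, 63, 22]), Block([18, 29, 30]), Block([41, 52])]
matches, blocks_read = ages_greater_than(blocks, 30)
print(matches, blocks_read)  # [34, 45, 63, 41, 52] 2
```

The second block is never read because its MAX of 30 cannot satisfy Age > 30. The effect is strongest when data is sorted or clustered on the filtered column, so that block ranges overlap little.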
Benefit 2: Projection Pushdown/Column Pruning
Only the First_Name and Age columns need to be read; Last_Name and Gender are skipped.
SELECT First_Name FROM ... WHERE Age > 30
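Column pruning can be illustrated with a table stored as one list per column. This is a toy model under the assumption that the sample values line up row by row in the order shown on the slides; the select helper is hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Sample data from the layout slides, stored column by column.
TABLE: Dict[str, list] = {
    "First_Name": ["Tootsie", "Miriam", "Blakeley", "Ernst", "Brew"],
    "Last_Name": ["Label", "Le Fleming", "Lisciandro", "Minghi", "Jime"],
    "Age": [34, 25, 45, 63, 22],
    "Gender": ["Fem", "Fem", "Fem", "Mal", "Mal"],
}


def select(
    table: Dict[str, list],
    projection: Sequence[str],
    predicate_column: str,
    predicate: Callable[[object], bool],
) -> Tuple[List[tuple], List[str]]:
    """Return the matching rows and the names of the columns that had to be read."""
    columns_read = {predicate_column, *projection}
    keep = [i for i, value in enumerate(table[predicate_column]) if predicate(value)]
    rows = [tuple(table[column][i] for column in projection) for i in keep]
    return rows, sorted(columns_read)


rows, columns_read = select(TABLE, ["First_Name"], "Age", lambda age: age > 30)
print(rows, columns_read)
```

With a row-oriented file such as CSV, the same query would have to read every byte of every row, including Last_Name and Gender.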
Benefit 3: Compression & Encoding
• RLE (& Bit Packing) for numbers
• Dictionary for string repetitions (+RLE)
• Delta encoding for increasing numbers
• Delta Strings (for strings with an identical prefix)
• Plain encoding for varied strings
https://github.com/apache/parquet-format/blob/master/Encodings.md
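Two of the encodings listed above, sketched in their simplest form. Parquet's actual encodings work at the bit level (an RLE/bit-packing hybrid and a block-based delta format), so this only conveys the idea; the function names are illustrative.

```python
from itertools import groupby
from typing import List, Tuple


def rle_encode(values: list) -> List[Tuple[object, int]]:
    """Run-length encoding: collapse runs of equal values into (value, count)."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs: List[Tuple[object, int]]) -> list:
    return [value for value, count in pairs for _ in range(count)]


def delta_encode(numbers: List[int]) -> List[int]:
    """Delta encoding: keep the first value, then only the differences."""
    if not numbers:
        return []
    return [numbers[0]] + [b - a for a, b in zip(numbers, numbers[1:])]


def delta_decode(deltas: List[int]) -> List[int]:
    numbers, total = [], 0
    for delta in deltas:
        total += delta
        numbers.append(total)
    return numbers


print(rle_encode(["Fem", "Fem", "Fem", "Mal", "Mal"]))  # [('Fem', 3), ('Mal', 2)]
print(delta_encode([100, 101, 103, 106]))               # [100, 1, 2, 3]
```

Both encodings benefit from the columnar layout: values of one column sit next to each other, so runs and small differences are common.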
More on Dictionary Encoding
• Builds list of unique strings, assigns numeric ID to each
• If the dictionary size exceeds 1MB (configurable) or the number of distinct values is too high, the writer falls back to Plain encoding.
• The data itself is later represented as numbers and is
further encoded using RLE
https://github.com/apache/parquet-format/blob/master/Encodings.md
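A minimal sketch of dictionary encoding with a size-based fallback, following the three bullets above. The 1MB default mirrors the slide; measuring the dictionary as the sum of UTF-8 string lengths and signalling the fallback with None are simplifying assumptions, not the Parquet writer's actual accounting.

```python
from typing import Dict, List, Optional, Tuple


def dictionary_encode(
    values: List[str], max_dictionary_bytes: int = 1024 * 1024
) -> Optional[Tuple[List[str], List[int]]]:
    """Return (dictionary, ids), or None when the caller should use plain encoding."""
    dictionary: Dict[str, int] = {}
    ids: List[int] = []
    size = 0
    for value in values:
        if value not in dictionary:
            size += len(value.encode("utf-8"))
            if size > max_dictionary_bytes:
                return None  # dictionary too large: fall back to plain encoding
            dictionary[value] = len(dictionary)  # next free numeric ID
        ids.append(dictionary[value])
    # dict preserves insertion order, so list position equals the assigned ID.
    return list(dictionary), ids


print(dictionary_encode(["Fem", "Fem", "Fem", "Mal", "Mal"]))
```

The resulting ID list contains long runs for low-cardinality columns, which is why it is further compressed with RLE.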
Demo: Parquet/ ORC with Amazon
Kinesis Firehose (new!)
Example Query V (parquet)
What was the distribution of passenger load
on 2nd September 10-11h?
SELECT passenger_count, count(*) count
FROM "128mbparquet"
WHERE pickup_datetime
BETWEEN '2018-09-02T10' AND '2018-09-02T11'
AND partition_0 || partition_1 || partition_2 ||
partition_3
BETWEEN '2018090210' AND '2018090215'
GROUP BY passenger_count
Before (gzip, Example Query IV): Run time: 5.59 seconds, Data scanned: 1.77GB
Now (Parquet): Run time: 3.21 seconds, Data scanned: 300.7MB
Analyzing Parquet File
• parquet-tools
• head – view data in file
• meta – get metadata summary
• dump -d -n – get detailed metadata down to page level, statistics included
Download and build [1].
$ java -jar parquet-tools.jar meta <parquetfile>
[Screenshot callouts: Schema Information, Row Count, Total Byte Size, Size in Bytes, Value Count, Encoding]
[1] https://github.com/apache/parquet-mr/
parquet-tools dump: Encoding & Statistics
total_amount:
- DOUBLE SNAPPY DO:0 FPO:4155231 SZ:329324/338501/1.03
[more]... ST:[min: -76.8, max: 1121.3, num_nulls: 0]
dropoff_datetime:
- BINARY SNAPPY DO:0 FPO:3315979 SZ:839131/5540639/6.60
[more]... ST:[no stats for this column]
The string (BINARY) timestamp column carries no statistics. Use numeric timestamps (Unix epoch) or partition by timestamp for time series data.
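Converting string timestamps to a numeric Unix epoch value before writing might look like the sketch below. The timestamp format is assumed from the values shown elsewhere in this deck (for example 2018-08-30T00:13:48.573Z), and the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(timestamp: str) -> int:
    """Convert 'YYYY-MM-DDTHH:MM:SS.fffZ' (UTC) to milliseconds since the Unix epoch."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)
    # Integer division of timedeltas avoids floating-point rounding.
    return (parsed - EPOCH) // timedelta(milliseconds=1)


print(to_epoch_millis("2018-09-02T10:00:00.000Z"))  # 1535882400000
```

Stored as an integer column, the value gets MIN/MAX statistics like total_amount above, so range filters on time can skip blocks.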
Example Query VI (ORC)
What was the distribution of passenger load
on 2nd September 10-11h?
SELECT passenger_count, count(*) count
FROM "128mborc"
WHERE pickup_datetime
BETWEEN '2018-09-02T10' AND '2018-09-02T11'
AND partition_0 || partition_1 || partition_2 ||
partition_3
BETWEEN '2018090210' AND '2018090215'
GROUP BY passenger_count
Before (Parquet, Example Query V): Run time: 3.21 seconds, Data scanned: 300.7MB
Now (ORC): Run time: 3.61 seconds, Data scanned: 303.38MB
Analyzing ORC: orcfiledump
Spin up a single-node (master-only) EMR cluster and use the hive command:
hive --orcfiledump file://<absolutepath>/file.orc
[…]
Column 7: count: 210141 hasNull: false min: -76.96324157714844 max: 0.0 sum: -1.5329986951126099E7
Column 8: count: 210141 hasNull: false min: 2018-08-30T00:13:48.573Z max: 2018-08-30T00:28:49.564Z sum: 5043384
[…]
ETL with AWS Glue For Frequent Queries
[Diagram: Log, S3, Glue, Data Catalog, Athena; steps: read/write, write table partitions, schema lookup, query data]
Demo: ETL with AWS Glue
Example Zeppelin/ AWS Glue Notebook
https://gist.github.com/steffeng/5b841a99230ba8377f161f55453d49d0
Example Query VII (repartitioned)
What was the distribution of passenger load
on 2nd September 10-11h?
SELECT passenger_count, count(*) count
FROM "partitioned_by_hour"
WHERE year = 2018
AND month = 9
AND day = 2
AND hour = 10
GROUP BY passenger_count
Before (Parquet, Example Query V): Run time: 3.21 seconds, Data scanned: 300.7MB
Now (repartitioned): Run time: 2.42 seconds, Data scanned: 2.06MB
Example Query VIII (aggregated)
What was the distribution of passenger load
on 2nd September 10-11h?
SELECT passenger_count, trip_count
FROM "aggregates_by_hour"
WHERE year = 2018
AND month = 9
AND day = 2
AND hour = 10
Before (repartitioned, Example Query VII): Run time: 2.42 seconds, Data scanned: 2.06MB
Now (aggregated): Run time: 1.85 seconds, Data scanned: 0.37KB
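The shape of such an hourly aggregate can be sketched in plain Python. In the talk the table was produced with an AWS Glue job; this stand-in only illustrates the grouping, and it assumes that pickup_datetime starts with 'YYYY-MM-DDTHH' and that the output columns are named as in the query above.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable, List


def aggregate_by_hour(rides: Iterable[Dict]) -> List[Dict]:
    """Count trips per (year, month, day, hour, passenger_count)."""
    counts: Counter = Counter()
    for ride in rides:
        # Only the date and hour matter, so parse the first 13 characters.
        pickup = datetime.strptime(ride["pickup_datetime"][:13], "%Y-%m-%dT%H")
        key = (pickup.year, pickup.month, pickup.day, pickup.hour, ride["passenger_count"])
        counts[key] += 1
    return [
        {"year": y, "month": m, "day": d, "hour": h, "passenger_count": p, "trip_count": n}
        for (y, m, d, h, p), n in sorted(counts.items())
    ]


# Made-up sample rides for illustration.
sample_rides = [
    {"pickup_datetime": "2018-09-02T10:05:00.000Z", "passenger_count": 1},
    {"pickup_datetime": "2018-09-02T10:45:00.000Z", "passenger_count": 1},
    {"pickup_datetime": "2018-09-02T11:01:00.000Z", "passenger_count": 2},
]
for row in aggregate_by_hour(sample_rides):
    print(row)
```

The aggregate holds one row per hour and passenger count instead of one row per ride, which is why the query scans only a fraction of a kilobyte.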
Recently announced and relevant...
I applied these simple tricks when storing data for Amazon Athena and you won't believe what happened next...
Measure. Then optimize.
There's no silver bullet.
Optimize for Cost and Performance 1/2
• Use Athena in the region of your buckets.
• Compress your data for less storage & query cost.
• Use LIMIT in queries for faster results.
• Partition your data based on data access patterns.
• Use partitions in your queries.
• Add partitions by crawling or S3 triggers.
Optimize for Cost and Performance 2/2
• Columnar formats such as ORC & Parquet reduce scanned data: faster, less cost
• Pick format depending on data, access patterns, clients
• Inspect/ verify the resulting files
• Create aggregates for frequent queries
• Shorten turnaround times for Glue job development:
• Use a provisioned development endpoint
• Use small subset of your data (think KB!)
The AWS Free Tier allows you to
get hands on experience with AWS
Glue and S3. Try it today!
Questions?
Ask the Architect
downstairs!

Mais conteúdo relacionado

Mais procurados

Integrating Amazon Elasticsearch with your DevOps Tooling - AWS Online Tech T...
Integrating Amazon Elasticsearch with your DevOps Tooling - AWS Online Tech T...Integrating Amazon Elasticsearch with your DevOps Tooling - AWS Online Tech T...
Integrating Amazon Elasticsearch with your DevOps Tooling - AWS Online Tech T...Amazon Web Services
 
Big Data Breakthroughs: Process and Query Data In Place with Amazon S3 Select...
Big Data Breakthroughs: Process and Query Data In Place with Amazon S3 Select...Big Data Breakthroughs: Process and Query Data In Place with Amazon S3 Select...
Big Data Breakthroughs: Process and Query Data In Place with Amazon S3 Select...Amazon Web Services
 
Amazon Elasticsearch Service Deep Dive - AWS Online Tech Talks
Amazon Elasticsearch Service Deep Dive - AWS Online Tech TalksAmazon Elasticsearch Service Deep Dive - AWS Online Tech Talks
Amazon Elasticsearch Service Deep Dive - AWS Online Tech TalksAmazon Web Services
 
Loading Data into Redshift with Lab
Loading Data into Redshift with LabLoading Data into Redshift with Lab
Loading Data into Redshift with LabAmazon Web Services
 
How Amazon.com Uses AWS Analytics
How Amazon.com Uses AWS AnalyticsHow Amazon.com Uses AWS Analytics
How Amazon.com Uses AWS AnalyticsAmazon Web Services
 
Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftAmazon Web Services
 
Introducing AWS DeepLens - AWS Online Tech Talks
Introducing AWS DeepLens - AWS Online Tech TalksIntroducing AWS DeepLens - AWS Online Tech Talks
Introducing AWS DeepLens - AWS Online Tech TalksAmazon Web Services
 
Loading Data into Amazon Redshift
Loading Data into Amazon RedshiftLoading Data into Amazon Redshift
Loading Data into Amazon RedshiftAmazon Web Services
 
Working with Scalable Machine Learning Algorithms in Amazon SageMaker - AWS O...
Working with Scalable Machine Learning Algorithms in Amazon SageMaker - AWS O...Working with Scalable Machine Learning Algorithms in Amazon SageMaker - AWS O...
Working with Scalable Machine Learning Algorithms in Amazon SageMaker - AWS O...Amazon Web Services
 
Data Warehousing with Amazon Redshift: Data Analytics Week SF
Data Warehousing with Amazon Redshift: Data Analytics Week SFData Warehousing with Amazon Redshift: Data Analytics Week SF
Data Warehousing with Amazon Redshift: Data Analytics Week SFAmazon Web Services
 
How to build a data lake with aws glue data catalog (ABD213-R) re:Invent 2017
How to build a data lake with aws glue data catalog (ABD213-R)  re:Invent 2017How to build a data lake with aws glue data catalog (ABD213-R)  re:Invent 2017
How to build a data lake with aws glue data catalog (ABD213-R) re:Invent 2017Amazon Web Services
 
Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...
Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...
Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...Amazon Web Services
 
Best Practices for Building a Data Lake in Amazon S3 and Amazon Glacier, with...
Best Practices for Building a Data Lake in Amazon S3 and Amazon Glacier, with...Best Practices for Building a Data Lake in Amazon S3 and Amazon Glacier, with...
Best Practices for Building a Data Lake in Amazon S3 and Amazon Glacier, with...Amazon Web Services
 

Mais procurados (20)

Integrating Amazon Elasticsearch with your DevOps Tooling - AWS Online Tech T...
Integrating Amazon Elasticsearch with your DevOps Tooling - AWS Online Tech T...Integrating Amazon Elasticsearch with your DevOps Tooling - AWS Online Tech T...
Integrating Amazon Elasticsearch with your DevOps Tooling - AWS Online Tech T...
 
Big Data Breakthroughs: Process and Query Data In Place with Amazon S3 Select...
Big Data Breakthroughs: Process and Query Data In Place with Amazon S3 Select...Big Data Breakthroughs: Process and Query Data In Place with Amazon S3 Select...
Big Data Breakthroughs: Process and Query Data In Place with Amazon S3 Select...
 
Amazon Elasticsearch Service Deep Dive - AWS Online Tech Talks
Amazon Elasticsearch Service Deep Dive - AWS Online Tech TalksAmazon Elasticsearch Service Deep Dive - AWS Online Tech Talks
Amazon Elasticsearch Service Deep Dive - AWS Online Tech Talks
 
Data Warehouses and Data Lakes
Data Warehouses and Data LakesData Warehouses and Data Lakes
Data Warehouses and Data Lakes
 
Loading Data into Redshift with Lab
Loading Data into Redshift with LabLoading Data into Redshift with Lab
Loading Data into Redshift with Lab
 
How Amazon.com Uses AWS Analytics
How Amazon.com Uses AWS AnalyticsHow Amazon.com Uses AWS Analytics
How Amazon.com Uses AWS Analytics
 
Using Graph Databases
Using Graph DatabasesUsing Graph Databases
Using Graph Databases
 
Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon Redshift
 
Introducing AWS DeepLens - AWS Online Tech Talks
Introducing AWS DeepLens - AWS Online Tech TalksIntroducing AWS DeepLens - AWS Online Tech Talks
Introducing AWS DeepLens - AWS Online Tech Talks
 
Loading Data into Amazon Redshift
Loading Data into Amazon RedshiftLoading Data into Amazon Redshift
Loading Data into Amazon Redshift
 
Analyzing Streams
Analyzing StreamsAnalyzing Streams
Analyzing Streams
 
Working with Scalable Machine Learning Algorithms in Amazon SageMaker - AWS O...
Working with Scalable Machine Learning Algorithms in Amazon SageMaker - AWS O...Working with Scalable Machine Learning Algorithms in Amazon SageMaker - AWS O...
Working with Scalable Machine Learning Algorithms in Amazon SageMaker - AWS O...
 
Analyzing Streams
Analyzing StreamsAnalyzing Streams
Analyzing Streams
 
Data Warehousing with Amazon Redshift: Data Analytics Week SF
Data Warehousing with Amazon Redshift: Data Analytics Week SFData Warehousing with Amazon Redshift: Data Analytics Week SF
Data Warehousing with Amazon Redshift: Data Analytics Week SF
 
Using Data Lakes
Using Data LakesUsing Data Lakes
Using Data Lakes
 
How to build a data lake with aws glue data catalog (ABD213-R) re:Invent 2017
How to build a data lake with aws glue data catalog (ABD213-R)  re:Invent 2017How to build a data lake with aws glue data catalog (ABD213-R)  re:Invent 2017
How to build a data lake with aws glue data catalog (ABD213-R) re:Invent 2017
 
Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...
Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...
Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...
 
Best Practices for Building a Data Lake in Amazon S3 and Amazon Glacier, with...
Best Practices for Building a Data Lake in Amazon S3 and Amazon Glacier, with...Best Practices for Building a Data Lake in Amazon S3 and Amazon Glacier, with...
Best Practices for Building a Data Lake in Amazon S3 and Amazon Glacier, with...
 
Graph and Amazon Neptune
Graph and Amazon NeptuneGraph and Amazon Neptune
Graph and Amazon Neptune
 
Using Data Lakes
Using Data LakesUsing Data Lakes
Using Data Lakes
 

Semelhante a Query your data in S3 with SQL and optimize for cost and performance

Building Data Lakes That Cost Less and Deliver Results Faster - AWS Online Te...
Building Data Lakes That Cost Less and Deliver Results Faster - AWS Online Te...Building Data Lakes That Cost Less and Deliver Results Faster - AWS Online Te...
Building Data Lakes That Cost Less and Deliver Results Faster - AWS Online Te...Amazon Web Services
 
How to Build a Data Lake in Amazon S3 & Amazon Glacier - AWS Online Tech Talks
How to Build a Data Lake in Amazon S3 & Amazon Glacier - AWS Online Tech TalksHow to Build a Data Lake in Amazon S3 & Amazon Glacier - AWS Online Tech Talks
How to Build a Data Lake in Amazon S3 & Amazon Glacier - AWS Online Tech TalksAmazon Web Services
 
Building Your First Serverless Data Lake (ANT356-R1) - AWS re:Invent 2018
Building Your First Serverless Data Lake (ANT356-R1) - AWS re:Invent 2018Building Your First Serverless Data Lake (ANT356-R1) - AWS re:Invent 2018
Building Your First Serverless Data Lake (ANT356-R1) - AWS re:Invent 2018Amazon Web Services
 
Transforming Data Lakes with Amazon S3 Select & Amazon Glacier Select - AWS O...
Transforming Data Lakes with Amazon S3 Select & Amazon Glacier Select - AWS O...Transforming Data Lakes with Amazon S3 Select & Amazon Glacier Select - AWS O...
Transforming Data Lakes with Amazon S3 Select & Amazon Glacier Select - AWS O...Amazon Web Services
 
What Can Your Logs Tell You? (ANT215) - AWS re:Invent 2018
What Can Your Logs Tell You? (ANT215) - AWS re:Invent 2018What Can Your Logs Tell You? (ANT215) - AWS re:Invent 2018
What Can Your Logs Tell You? (ANT215) - AWS re:Invent 2018Amazon Web Services
 
Data Lake Implementation: Processing and Querying Data in Place (STG204-R1) -...
Data Lake Implementation: Processing and Querying Data in Place (STG204-R1) -...Data Lake Implementation: Processing and Querying Data in Place (STG204-R1) -...
Data Lake Implementation: Processing and Querying Data in Place (STG204-R1) -...Amazon Web Services
 
Building+your+Data+Project+on+AWS+-+Luke+Anderson.pdf
Building+your+Data+Project+on+AWS+-+Luke+Anderson.pdfBuilding+your+Data+Project+on+AWS+-+Luke+Anderson.pdf
Building+your+Data+Project+on+AWS+-+Luke+Anderson.pdfSasikumarPalanivel3
 
Building+your+Data+Project+on+AWS+-+Luke+Anderson.pdf
Building+your+Data+Project+on+AWS+-+Luke+Anderson.pdfBuilding+your+Data+Project+on+AWS+-+Luke+Anderson.pdf
Building+your+Data+Project+on+AWS+-+Luke+Anderson.pdfsaidbilgen
 
Workshop: Architecting a Serverless Data Lake
Workshop: Architecting a Serverless Data LakeWorkshop: Architecting a Serverless Data Lake
Workshop: Architecting a Serverless Data LakeAmazon Web Services
 
BDA308 Deep Dive: Log Analytics with Amazon Elasticsearch Service
BDA308 Deep Dive: Log Analytics with Amazon Elasticsearch ServiceBDA308 Deep Dive: Log Analytics with Amazon Elasticsearch Service
BDA308 Deep Dive: Log Analytics with Amazon Elasticsearch ServiceAmazon Web Services
 
Build your own log analytics solution on AWS - ADB301 - Atlanta AWS Summit
Build your own log analytics solution on AWS - ADB301 - Atlanta AWS SummitBuild your own log analytics solution on AWS - ADB301 - Atlanta AWS Summit
Build your own log analytics solution on AWS - ADB301 - Atlanta AWS SummitAmazon Web Services
 
Implementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdfImplementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdfAmazon Web Services
 
SRV208 S3 One Zone-IA and S3 Select GA
SRV208 S3 One Zone-IA and S3 Select GASRV208 S3 One Zone-IA and S3 Select GA
SRV208 S3 One Zone-IA and S3 Select GAAmazon Web Services
 
Analyze Amazon CloudFront and Lambda@Edge Logs to Improve Customer Experience...
Analyze Amazon CloudFront and Lambda@Edge Logs to Improve Customer Experience...Analyze Amazon CloudFront and Lambda@Edge Logs to Improve Customer Experience...
Analyze Amazon CloudFront and Lambda@Edge Logs to Improve Customer Experience...Amazon Web Services
 
Optimizing Performance in Amazon S3 (STG367-R2) - AWS re:Invent 2018
Optimizing Performance in Amazon S3 (STG367-R2) - AWS re:Invent 2018Optimizing Performance in Amazon S3 (STG367-R2) - AWS re:Invent 2018
Optimizing Performance in Amazon S3 (STG367-R2) - AWS re:Invent 2018Amazon Web Services
 
Analyze your Data Lake, Fast @ Any Scale - AWS Online Tech Talks
Analyze your Data Lake, Fast @ Any Scale - AWS Online Tech TalksAnalyze your Data Lake, Fast @ Any Scale - AWS Online Tech Talks
Analyze your Data Lake, Fast @ Any Scale - AWS Online Tech TalksAmazon Web Services
 
Get the Most out of Your Amazon Elasticsearch Service Domain (ANT334-R1) - AW...
Get the Most out of Your Amazon Elasticsearch Service Domain (ANT334-R1) - AW...Get the Most out of Your Amazon Elasticsearch Service Domain (ANT334-R1) - AW...
Get the Most out of Your Amazon Elasticsearch Service Domain (ANT334-R1) - AW...Amazon Web Services
 
Building Serverless ETL Pipelines
Building Serverless ETL PipelinesBuilding Serverless ETL Pipelines
Building Serverless ETL PipelinesAmazon Web Services
 

Semelhante a Query your data in S3 with SQL and optimize for cost and performance (20)

Building Data Lakes That Cost Less and Deliver Results Faster - AWS Online Te...
Building Data Lakes That Cost Less and Deliver Results Faster - AWS Online Te...Building Data Lakes That Cost Less and Deliver Results Faster - AWS Online Te...
Building Data Lakes That Cost Less and Deliver Results Faster - AWS Online Te...
 
How to Build a Data Lake in Amazon S3 & Amazon Glacier - AWS Online Tech Talks
How to Build a Data Lake in Amazon S3 & Amazon Glacier - AWS Online Tech TalksHow to Build a Data Lake in Amazon S3 & Amazon Glacier - AWS Online Tech Talks
How to Build a Data Lake in Amazon S3 & Amazon Glacier - AWS Online Tech Talks
 
Building Your First Serverless Data Lake (ANT356-R1) - AWS re:Invent 2018
Building Your First Serverless Data Lake (ANT356-R1) - AWS re:Invent 2018Building Your First Serverless Data Lake (ANT356-R1) - AWS re:Invent 2018
Building Your First Serverless Data Lake (ANT356-R1) - AWS re:Invent 2018
 
Transforming Data Lakes with Amazon S3 Select & Amazon Glacier Select - AWS O...
Transforming Data Lakes with Amazon S3 Select & Amazon Glacier Select - AWS O...Transforming Data Lakes with Amazon S3 Select & Amazon Glacier Select - AWS O...
Transforming Data Lakes with Amazon S3 Select & Amazon Glacier Select - AWS O...
 
What Can Your Logs Tell You? (ANT215) - AWS re:Invent 2018
What Can Your Logs Tell You? (ANT215) - AWS re:Invent 2018What Can Your Logs Tell You? (ANT215) - AWS re:Invent 2018
What Can Your Logs Tell You? (ANT215) - AWS re:Invent 2018
 
Data Lake Implementation: Processing and Querying Data in Place (STG204-R1) -...
Data Lake Implementation: Processing and Querying Data in Place (STG204-R1) -...Data Lake Implementation: Processing and Querying Data in Place (STG204-R1) -...
Data Lake Implementation: Processing and Querying Data in Place (STG204-R1) -...
 
Building+your+Data+Project+on+AWS+-+Luke+Anderson.pdf
Building+your+Data+Project+on+AWS+-+Luke+Anderson.pdfBuilding+your+Data+Project+on+AWS+-+Luke+Anderson.pdf
Building+your+Data+Project+on+AWS+-+Luke+Anderson.pdf
 
Building+your+Data+Project+on+AWS+-+Luke+Anderson.pdf
Building+your+Data+Project+on+AWS+-+Luke+Anderson.pdfBuilding+your+Data+Project+on+AWS+-+Luke+Anderson.pdf
Building+your+Data+Project+on+AWS+-+Luke+Anderson.pdf
 
Workshop: Architecting a Serverless Data Lake
Workshop: Architecting a Serverless Data LakeWorkshop: Architecting a Serverless Data Lake
Workshop: Architecting a Serverless Data Lake
 
BDA308 Deep Dive: Log Analytics with Amazon Elasticsearch Service
BDA308 Deep Dive: Log Analytics with Amazon Elasticsearch ServiceBDA308 Deep Dive: Log Analytics with Amazon Elasticsearch Service
Query your data in S3 with SQL and optimize for cost and performance

  • 1. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Steffen Grunwald, AWS Solutions Architect, @steffeng AWS Pop-up Loft Berlin, 17. October 2018 Query your data in S3 with SQL and optimize for cost and performance
  • 2. What you will learn from this Session • Benefits of raw Data in Amazon Simple Storage Service • Query on S3 with Amazon Athena • Optimize your Data Structure • Compression • Partitioning • Columnar Formats • Derive Views from raw Data for frequent Queries
  • 3. Example Application: New York Taxi Data Ingestion [Architecture diagram: Instance, Amazon Kinesis Streams, Amazon Kinesis Analytics, Amazon Kinesis Streams, AWS Lambda, Amazon CloudWatch, Amazon Kinesis Firehose, Amazon S3, AWS Glue, Amazon Athena, Amazon QuickSight]
  • 4. Benefits of raw Data in Amazon Simple Storage Service (S3) • Highly durable and cost-effective object store • Limitlessly scalable • Pay for what you use, in GB per month • Decouple storage from compute • API widely supported by many consumers • Well integrated into other AWS services Use S3 as long-term storage to answer the as yet unknown questions of tomorrow.
  • 5. Ingest Data with Amazon Kinesis Firehose • Stores stream of records as files in a bucket • Path: <Optional Prefix> + "YYYY/MM/DD/HH" (Ingestion Time, UTC) • Optionally compress (GZIP, ZIP, Snappy) • Optionally store as columnar format (ORC, Parquet) • Optionally transform records with AWS Lambda [Diagram: Amazon Kinesis Firehose, Amazon S3 Bucket]
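The path rule on slide 5 can be sketched in a few lines of Python. This is only an illustration of how the prefix follows from the ingestion time in UTC; the function name `firehose_prefix`, the example prefix `nyctlc/`, and the trailing slash are assumptions, not something Firehose exposes.

```python
from datetime import datetime, timedelta, timezone


def firehose_prefix(arrival: datetime, optional_prefix: str = "") -> str:
    """Build '<Optional Prefix>YYYY/MM/DD/HH/' from an ingestion time.

    Hypothetical helper: the hour is taken in UTC, as stated on the slide.
    """
    if arrival.tzinfo is None:
        raise ValueError("arrival must be timezone-aware")
    utc = arrival.astimezone(timezone.utc)
    return f"{optional_prefix}{utc:%Y/%m/%d/%H}/"


# A record ingested at 12:30 Berlin summer time (UTC+2) lands in the 10h folder.
berlin_summer = timezone(timedelta(hours=2))
print(firehose_prefix(datetime(2018, 9, 2, 12, 30, tzinfo=berlin_summer), "nyctlc/"))
```

Because the folder reflects ingestion time rather than event time, a ride picked up at 10:59 may well be stored under a later hour.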
  • 6. Amazon Athena is an interactive query service that makes it easy to analyze data directly from Amazon S3 using Standard SQL
  • 7. Query Data Directly from Amazon S3 • No loading of data • Query data in its raw format • No Extract, Transform, and Load (ETL) required • Stream data directly from Amazon S3
  • 8. Presto SQL • ANSI SQL compliant • Complex joins, nested queries & window functions • Complex data types (arrays, structs, maps) • Partitioning of data by any key • date, time, custom keys • Presto built-in functions
  • 9. Amazon Athena Supports Multiple Data Formats • Text files, e.g., CSV, raw logs • Apache Web Logs, TSV files • JSON (simple, nested) • Compressed files • Columnar formats such as Parquet & ORC • AVRO support
  • 10. Amazon Athena is Cost Effective • Pay per query • $5 per TB scanned from S3 • DDL Queries and failed queries are free
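To make the pricing on slide 10 concrete, here is a rough cost estimate in Python. It is a sketch under stated assumptions: the $5 per TB figure is the one from the slide, 1 TB is taken as 1024^4 bytes, and any per-query minimum or rounding is ignored. The function name is hypothetical.

```python
PRICE_PER_TB_USD = 5.0      # price quoted on the slide
BYTES_PER_TB = 1024 ** 4    # assumption: binary terabyte
BYTES_PER_GB = 1024 ** 3


def athena_query_cost_usd(bytes_scanned: float,
                          price_per_tb: float = PRICE_PER_TB_USD) -> float:
    """Estimate the cost of one query from the amount of data it scanned."""
    return bytes_scanned / BYTES_PER_TB * price_per_tb


# Scan sizes reported for the example queries later in this deck.
for label, gigabytes in [("gzip, full scan", 76.5), ("gzip, partition filter", 1.77)]:
    cost = athena_query_cost_usd(gigabytes * BYTES_PER_GB)
    print(f"{label}: {gigabytes} GB scanned, about ${cost:.4f}")
```

Since cost scales linearly with bytes scanned, every optimization in the rest of the deck that reduces scanned data reduces the bill by the same factor.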
  • 11. Demo: Query files from Amazon Kinesis Firehose with Amazon Athena and AWS Glue
  • 12. The Example Data • NYC Taxi & Limousine Commission rides • Data is generated by kinesis-taxi-stream-producer, available at [1]: java -jar kinesis-taxi-stream-producer.jar -speedup 400 -statisticsFrequency 10000 -stream nyctlc-ingestion -noWatermark -region eu-central-1 -adaptTime ingestion • ~2GB/h of raw data, 11 days, 487 GB total [1] https://github.com/aws-samples/flink-stream-processing-refarch
  • 13. Test Setup: Ingesting Data with different Settings [Diagram: Instance, Amazon Kinesis Streams, Firehose (raw), Firehose (gzip), Firehose (orc), Firehose (parquet), Amazon S3] (max Amazon Kinesis Firehose buffering hints: 128MB & 900s)
  • 14. Photo by Glen Noble on Unsplash
  • 15. Example Query I Show some rides on 2nd September 10-11h: SELECT * FROM "128mb" WHERE pickup_datetime BETWEEN '2018-09-02T10' AND '2018-09-02T11' LIMIT 10 Run time: 3.53 seconds, Data scanned: 4.62GB
  • 16. Example Query II (gzip) Show some rides on 2nd September 10-11h: SELECT * FROM "128mbgz" WHERE pickup_datetime BETWEEN '2018-09-02T10' AND '2018-09-02T11' LIMIT 10 Raw (Example Query I): Run time: 3.53 seconds, Data scanned: 4.62GB. gzip: Run time: 2.45 seconds, Data scanned: 303.04KB. gzip reduces 487GB to 76GB.
  • 17. Example Query III (without LIMIT 10) What was the distribution of passenger load on 2nd September 10-11h? SELECT passenger_count, count(*) count FROM "128mbgz" WHERE pickup_datetime BETWEEN '2018-09-02T10' AND '2018-09-02T11' GROUP BY passenger_count Run time: 50.36 seconds, Data scanned: 76.5GB
  • 18. Photo by Tang Junwen on Unsplash
  • 19. Partitions to the Rescue: the AWS Glue crawler adds partitions based on file prefixes/directories
  • 20. Example Query IV What was the distribution of passenger load on 2nd September 10-11h? SELECT passenger_count, count(*) count FROM "128mbgz" WHERE pickup_datetime BETWEEN '2018-09-02T10' AND '2018-09-02T11' AND partition_0 || partition_1 || partition_2 || partition_3 BETWEEN '2018090210' AND '2018090215' GROUP BY passenger_count Run time: 27.6 seconds, Data scanned: 25.5GB; Run time: 5.59 seconds, Data scanned: 1.77GB
  • 21. Crawl Partitions with AWS Glue [Diagram: Log, S3, Glue (crawl data, create table partitions), Data Catalog (schema lookup), Athena (query data)] Why? Just schedule the crawler, no need to code! Deals with schema evolution.
  • 22. Use Hive-style File Format in S3 Move/copy: YYYY/MM/DD/HH/file to year=YYYY/month=MM/day=DD/hours=HH/file Make Athena reload partitions by: MSCK REPAIR TABLE Why? Format easy to create on write, easy to move.
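The key rewrite on slide 22 could look roughly like this in Python. It is a minimal sketch that only computes the new object key; the actual copy or move in S3 is left out, and the function name and example key are hypothetical.

```python
import re

# Matches the first 'YYYY/MM/DD/HH/' segment that Firehose writes into the key.
_FIREHOSE_PATH = re.compile(
    r"(?P<year>\d{4})/(?P<month>\d{2})/(?P<day>\d{2})/(?P<hours>\d{2})/"
)


def to_hive_style(key: str) -> str:
    """Rewrite 'YYYY/MM/DD/HH/file' to 'year=YYYY/month=MM/day=DD/hours=HH/file'."""
    def replace(match: re.Match) -> str:
        return "year={year}/month={month}/day={day}/hours={hours}/".format(
            **match.groupdict()
        )

    new_key, replaced = _FIREHOSE_PATH.subn(replace, key, count=1)
    if replaced == 0:
        raise ValueError(f"no YYYY/MM/DD/HH/ segment in key: {key!r}")
    return new_key


print(to_hive_style("nyctlc/2018/09/02/10/part-0001.gz"))
```

With keys in this layout, the partition column names come from the path itself, which is what lets a partition reload pick them up without per-partition statements.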
  • 23. Creating Partitions with AWS Lambda [Diagram: Log, S3 (new file trigger), Lambda (add table partition), Data Catalog (schema lookup), Athena (query data)] Why? Add partitions instantly, just AWS Lambda cost.
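A Lambda function for the flow on slide 23 might derive the partition statement from the S3 event as sketched below. The table name `mytable`, the partition column names, and the key layout are assumptions for illustration. Submitting the statement to Athena is deliberately left out so the sketch stays runnable without AWS credentials.

```python
import re
from urllib.parse import unquote_plus

# Assumed key layout: '<prefix>YYYY/MM/DD/HH/<file>' as written by Firehose.
KEY_PATTERN = re.compile(
    r"^(?P<prefix>.*?)(?P<year>\d{4})/(?P<month>\d{2})/(?P<day>\d{2})/(?P<hour>\d{2})/[^/]+$"
)


def partition_statement(record: dict, table: str):
    """Return an ADD PARTITION statement for one S3 event record, or None."""
    bucket = record["s3"]["bucket"]["name"]
    # Object keys arrive URL-encoded in S3 event notifications.
    key = unquote_plus(record["s3"]["object"]["key"])
    match = KEY_PATTERN.match(key)
    if match is None:
        return None
    p = match.groupdict()
    location = f"s3://{bucket}/{p['prefix']}{p['year']}/{p['month']}/{p['day']}/{p['hour']}/"
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS PARTITION "
        f"(year='{p['year']}',month='{p['month']}',day='{p['day']}',hour='{p['hour']}') "
        f"LOCATION '{location}'"
    )


def handler(event, context):
    """Lambda entry point: collect one statement per new object.

    A real function would submit each statement to Athena; here they are
    only returned.
    """
    statements = (partition_statement(r, "mytable") for r in event["Records"])
    return [s for s in statements if s is not None]
```

IF NOT EXISTS keeps the function idempotent, since every new file in an already registered hour triggers it again.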
  • 24. Populate Partitions if paths are known Issue Statements with Amazon Athena: ALTER TABLE mytable ADD PARTITION (year='2015',month='01',day='01') LOCATION 's3://[...]/2015/01/01/' Why? Easy for predictable paths. Can be prepopulated.
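Prepopulating partitions for predictable paths, as on slide 24, can be scripted. The following sketch generates one statement per day; the base location `s3://example-bucket/logs` is a placeholder, and IF NOT EXISTS is an addition to the slide's statement so the script can be re-run safely.

```python
from datetime import date, timedelta


def add_partition_statements(table: str, base_location: str,
                             first_day: date, last_day: date) -> list:
    """Generate one ADD PARTITION statement per day in [first_day, last_day]."""
    statements = []
    day = first_day
    while day <= last_day:
        statements.append(
            f"ALTER TABLE {table} ADD IF NOT EXISTS PARTITION "
            f"(year='{day:%Y}',month='{day:%m}',day='{day:%d}') "
            f"LOCATION '{base_location}/{day:%Y/%m/%d}/'"
        )
        day += timedelta(days=1)
    return statements


# 's3://example-bucket/logs' is a hypothetical location.
for statement in add_partition_statements(
        "mytable", "s3://example-bucket/logs", date(2015, 1, 1), date(2015, 1, 3)):
    print(statement)
```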
  • 25. Columnar Formats
  • 26. Flat File Sample Layout • First_Name: Tootsie, Miriam, Blakeley, Ernst, Brew • Last_Name: Label, Le Fleming, Lisciandro, Minghi, Jime • Age: 34, 25, 45, 63, 22 • Gender: Fem, Fem, Fem, Mal, Mal
  • 27. Columnar Formats Layout (Parquet & ORC) • First_Name: Tootsie, Miriam, Blakeley, Ernst, Brew (MIN: Blakeley, MAX: Tootsie) • Last_Name: Label, Le Fleming, Lisciandro, Minghi, Jime (MIN: Jime, MAX: Minghi) • Age: 34, 25, 45, 63, 22 (MIN: 22, MAX: 63) • Gender: Fem, Fem, Fem, Mal, Mal (MIN: Fem, MAX: Mal)
  • 28. Benefit 1: Predicate Pushdown SELECT * FROM ... WHERE Age > 30 (sample columns with MIN/MAX statistics as on slide 27)
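The idea behind predicate pushdown on slide 28 can be illustrated with a toy reader in Python. This is a conceptual sketch, not how Parquet or ORC readers are implemented: each block of values carries MIN/MAX statistics written ahead of time, and blocks that cannot match the predicate are skipped without reading their values. The second block is made-up data added for contrast.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RowGroup:
    ages: List[int]
    min_age: int  # statistics stored alongside the data at write time
    max_age: int


def make_row_group(ages: List[int]) -> RowGroup:
    return RowGroup(ages=ages, min_age=min(ages), max_age=max(ages))


def scan_age_greater_than(groups: List[RowGroup], threshold: int) -> Tuple[List[int], int]:
    """Return matching ages and the number of row groups actually read."""
    matches, groups_read = [], 0
    for group in groups:
        if group.max_age <= threshold:
            continue  # statistics prove nothing in this group can match
        groups_read += 1
        matches.extend(age for age in group.ages if age > threshold)
    return matches, groups_read


groups = [
    make_row_group([34, 25, 45, 63, 22]),  # Age column from the slide
    make_row_group([18, 29, 21]),          # hypothetical second block
]
matches, groups_read = scan_age_greater_than(groups, 30)
print(matches, groups_read)  # [34, 45, 63] 1
```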
  • 29. Benefit 2: Projection Pushdown/Column Pruning SELECT First_Name FROM ... WHERE Age > 30 (sample columns with MIN/MAX statistics as on slide 27)
  • 30. Benefit 3: Compression & Encoding • RLE (& Bit Packing) for numbers • Dictionary for string repetitions (+RLE) • Delta encoding for increasing numbers • Delta Strings (for strings with an identical prefix) • Plain encoding for varied strings https://github.com/apache/parquet-format/blob/master/Encodings.md
  • 31. More on Dictionary Encoding • Builds a list of unique strings and assigns a numeric ID to each • If the dictionary size exceeds 1MB (configurable) or the number of distinct values is too high, the writer falls back to Plain encoding. • The data itself is then represented as numbers and is further encoded using RLE https://github.com/apache/parquet-format/blob/master/Encodings.md
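Dictionary encoding followed by RLE, as described on slide 31, can be sketched in Python using the Gender column from the sample table. This is a simplified illustration of the principle only; it does not reproduce Parquet's on-disk layout or its combined RLE/bit-packing scheme, and the size-based fallback to Plain encoding is omitted.

```python
from typing import List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Replace each string by the ID of its entry in a dictionary of unique values."""
    dictionary: List[str] = []
    index = {}
    ids = []
    for value in values:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        ids.append(index[value])
    return dictionary, ids


def run_length_encode(ids: List[int]) -> List[Tuple[int, int]]:
    """Collapse consecutive repeats into (value, run length) pairs."""
    runs: List[Tuple[int, int]] = []
    for current in ids:
        if runs and runs[-1][0] == current:
            runs[-1] = (current, runs[-1][1] + 1)
        else:
            runs.append((current, 1))
    return runs


gender = ["Fem", "Fem", "Fem", "Mal", "Mal"]
dictionary, ids = dictionary_encode(gender)
print(dictionary)              # ['Fem', 'Mal']
print(ids)                     # [0, 0, 0, 1, 1]
print(run_length_encode(ids))  # [(0, 3), (1, 2)]
```

The example also hints at why sorted or low-cardinality columns compress so well: long runs of the same ID collapse into a single pair.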
  • 32. Demo: Parquet/ORC with Amazon Kinesis Firehose (new!)
  • 33. Example Query V (parquet) What was the distribution of passenger load on 2nd September 10-11h? SELECT passenger_count, count(*) count FROM "128mbparquet" WHERE pickup_datetime BETWEEN '2018-09-02T10' AND '2018-09-02T11' AND partition_0 || partition_1 || partition_2 || partition_3 BETWEEN '2018090210' AND '2018090215' GROUP BY passenger_count gzip (Example Query IV): Run time: 5.59 seconds, Data scanned: 1.77GB. Parquet: Run time: 3.21 seconds, Data scanned: 300.7MB.
  • 34. Analyzing Parquet File • parquet-tools • head: view data in file • meta: get metadata summary • dump -d -n: get detailed metadata down to page level, statistics included
  • 35. Download and build parquet-tools [1], then run: $ java -jar parquet-tools.jar meta <parquetfile> [Screenshot annotations on the output: Schema Information, Row Count, Total Byte Size, Size in Bytes, Value Count, Encoding] [1] https://github.com/apache/parquet-mr/
  • 36. parquet-tools dump: Encoding & Statistics total_amount: - DOUBLE SNAPPY DO:0 FPO:4155231 SZ:329324/338501/1.03 [more]... ST:[min: -76.8, max: 1121.3, num_nulls: 0] dropoff_datetime: - BINARY SNAPPY DO:0 FPO:3315979 SZ:839131/5540639/6.60 [more]... ST:[no stats for this column] For time series data, store timestamps as numbers (unix epoch) or partition by timestamp.
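Following the advice on slide 36, timestamps stored as strings could be converted to unix epoch numbers before writing, so that the column gets numeric MIN/MAX statistics. A minimal Python sketch, assuming ISO 8601 UTC strings with milliseconds like those in the ORC dump shown later in this deck; the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(timestamp: str) -> int:
    """Convert 'YYYY-MM-DDTHH:MM:SS.fffZ' (UTC) to milliseconds since the unix epoch."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S.%fZ")
    parsed = parsed.replace(tzinfo=timezone.utc)
    # Integer division of timedeltas avoids floating point rounding.
    return (parsed - EPOCH) // timedelta(milliseconds=1)


print(to_epoch_millis("2018-08-30T00:13:48.573Z"))
```

Numeric epochs compare the same way the timestamps do, so a range filter on time translates directly into a range filter on the number.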
  • 37. Example Query VI (ORC) What was the distribution of passenger load on 2nd September 10-11h? SELECT passenger_count, count(*) count FROM "128mborc" WHERE pickup_datetime BETWEEN '2018-09-02T10' AND '2018-09-02T11' AND partition_0 || partition_1 || partition_2 || partition_3 BETWEEN '2018090210' AND '2018090215' GROUP BY passenger_count Parquet (Example Query V): Run time: 3.21 seconds, Data scanned: 300.7MB. ORC: Run time: 3.61 seconds, Data scanned: 303.38MB.
  • 38. Analyzing ORC: orcfiledump Spin up a single node/master EMR Cluster and use the hive command: hive --orcfiledump file://<absolutepath>/file.orc […] Column 7: count: 210141 hasNull: false min: -76.96324157714844 max: 0.0 sum: -1.5329986951126099E7 Column 8: count: 210141 hasNull: false min: 2018-08-30T00:13:48.573Z max: 2018-08-30T00:28:49.564Z sum: 5043384 […]
  • 39. ETL with AWS Glue For Frequent Queries [Diagram: Log, S3, Glue (read/write, write table partitions), Data Catalog (schema lookup), Athena (query data)]
  • 40. Demo: ETL with AWS Glue
  • 41. Example Zeppelin/AWS Glue Notebook https://gist.github.com/steffeng/5b841a99230ba8377f161f55453d49d0
  • 42. Example Query VII (repartitioned) What was the distribution of passenger load on 2nd September 10-11h? SELECT passenger_count, count(*) count FROM "partitioned_by_hour" WHERE year = 2018 AND month = 9 AND day = 2 AND hour = 10 GROUP BY passenger_count Parquet (Example Query V): Run time: 3.21 seconds, Data scanned: 300.7MB. Repartitioned: Run time: 2.42 seconds, Data scanned: 2.06MB.
  • 43. Example Query VIII (aggregated) What was the distribution of passenger load on 2nd September 10-11h? SELECT passenger_count, trip_count FROM "aggregates_by_hour" WHERE year = 2018 AND month = 9 AND day = 2 AND hour = 10 Repartitioned (Example Query VII): Run time: 2.42 seconds, Data scanned: 2.06MB. Aggregated: Run time: 1.85 seconds, Data scanned: 0.37KB.
  • 44. Recently announced and relevant...
  • 45. Photo by Benjamin Davies on Unsplash I applied these simple tricks when storing data for Amazon Athena and you won't believe what happened next...
  • 46. Measure. Then optimize. There's no silver bullet. Photo by Cesar Carlevarino Aragon on Unsplash
  • 47. Optimize for Cost and Performance 1/2 • Use Athena in the region of your buckets. • Compress your data for less storage & query cost. • Use LIMIT in queries for faster results. • Partition your data based on data access patterns. • Use partitions in your queries. • Add partitions by crawling or S3 triggers.
  • 48. Optimize for Cost and Performance 2/2 • Columnar formats such as ORC & Parquet reduce scanned data: faster, less cost • Pick format depending on data, access patterns, clients • Inspect/verify the resulting files • Create aggregates for frequent queries • Shorten turnaround times for Glue job development: • Use a provisioned development endpoint • Use a small subset of your data (think KB!)
  • 49. The AWS Free Tier allows you to get hands-on experience with AWS Glue and S3. Try it today!
  • 50. Questions? Ask the Architect downstairs!