© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Big Data Breakthroughs
Process and Query Data In Place with
Amazon S3 & Glacier
Rahul Bhartia, Principal Product Manager, Amazon S3
Rashim Gupta, Principal Product Manager, Amazon Glacier
STG313
November 29, 2017
What to expect from the session
1. Introduction to Amazon S3 Select and Amazon Glacier Select
2. Dive into Use-cases
3. Key features and demos
Building a data lake with Amazon S3
• Unmatched durability, availability, and scalability
• Best security, compliance, and audit capability
• Object-level control at any scale
• Business insight into your data
• Twice as many partner integrations
• Most ways to bring data in
With a wide variety of in-place tools…
Amazon Athena, Amazon Redshift Spectrum, Amazon EMR, AWS Glue
Amazon S3
Today: Compute scales based…
on object size instead of the amount of data you want to process
DATA COMPUTE
Today: All of these tools…
retrieve a lot of data they don’t need and
do the heavy lifting
Today: You need to restore…
the entire object from Amazon Glacier to Amazon S3
and then use it.
Diagram: Amazon Glacier → Amazon S3
Introducing…
Amazon S3 Select and Amazon Glacier Select
Select a subset of data from an object based on a SQL expression
Simple, faster, and cheaper!
• Available as an API: no infrastructure or administration
• Faster performance compared to doing it yourself
• Pay as you go. The less you retrieve, the more you save.
Amazon S3 Select
Amazon S3 Select
Select contents from an object instead of retrieving the whole object
• Simple to use: standard SQL expression
• Familiar: works and scales like GET requests
• Integrated: AWS SDK and Presto (others coming soon)
Amazon S3 Select
Input
Format: delimited text (CSV, TSV), JSON …
Compression: GZIP …
Output
Format: delimited text (CSV, TSV), JSON …
Supported SQL
Clauses: Select, From, Where
Data types: String, Integer, Float, Decimal, Timestamp, Boolean
Operators: Conditional, Math, Logical, String (Like, ||)
Functions: String, Cast, Math, Aggregate
Amazon S3 Select: Simple pattern matches
…get-object …object… | awk -F, '{ if ($4 == "x") print $1 }'
...select-object …object… "SELECT o._1 FROM S3Object o WHERE o._4 = 'x' …"
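As a rough local illustration of what both commands compute, the sketch below applies the same filter (column 4 equals "x", return column 1) to CSV text in plain Python. The function name `select_first_where_fourth` and the sample rows are made up for illustration; with S3 Select, the equivalent filtering runs on the service side, so only the matching values are returned to the caller.

```python
import csv
import io


def select_first_where_fourth(csv_text, value):
    """Return column 1 of every row whose column 4 equals `value`.

    The positional names _1 and _4 in the SQL are 1-based, so they map
    to Python indexes 0 and 3.
    """
    reader = csv.reader(io.StringIO(csv_text))
    return [row[0] for row in reader if len(row) >= 4 and row[3] == value]


# Hypothetical sample data, not taken from the talk.
sample = "a,1,2,x\nb,3,4,y\nc,5,6,x\n"
print(select_first_where_fourth(sample, "x"))
```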
Amazon S3 Select: Serverless applications
Components: Amazon S3, S3 Select, AWS Lambda, Amazon SNS
Amazon S3 Select: Serverless MapReduce
2X Faster at 1/5 of the cost

Before: 200 seconds and 11.2 cents
# Download and process all keys
for key in src_keys:
    response = s3_client.get_object(Bucket=src_bucket, Key=key)
    # Decode the body so it can be split into text lines
    contents = response['Body'].read().decode('utf-8')
    for line in contents.split('\n')[:-1]:
        line_count += 1
        try:
            data = line.split(',')
            srcIp = data[0][:8]
            ….

After: 95 seconds and 2.8 cents
# Select IP Address and Keys
for key in src_keys:
    response = s3_client.select_object_content(
        Bucket=src_bucket, Key=key,
        expression="SELECT SUBSTR(obj._1, 1, 8), obj._2 FROM s3object AS obj")
    contents = response['Body'].read()
    for line in contents:
        line_count += 1
        try:
            ….
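The "After" snippet reflects the preview-era call shown in the talk. As a hedged sketch of how the same query might look with the boto3 `select_object_content` call as later released, the function below passes the SQL together with input and output serialization settings and reads the result from the `Payload` event stream. The function name `select_ip_prefix_and_key` is hypothetical, the CSV settings (no header row) are assumptions about the data, and `s3_client` is assumed to behave like `boto3.client("s3")`.

```python
import csv
import io


def select_ip_prefix_and_key(s3_client, bucket, key):
    """Run the slide's SQL against one CSV object and return the rows."""
    response = s3_client.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType="SQL",
        Expression="SELECT SUBSTR(obj._1, 1, 8), obj._2 FROM s3object AS obj",
        # Assumption: headerless, uncompressed CSV input.
        InputSerialization={"CSV": {"FileHeaderInfo": "NONE"}},
        OutputSerialization={"CSV": {}},
    )
    # Results arrive as a stream of events. A record can be split across
    # two Records events, so join all bytes before parsing lines.
    chunks = []
    for event in response["Payload"]:
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    text = b"".join(chunks).decode("utf-8")
    return [row for row in csv.reader(io.StringIO(text)) if row]
```

Only the selected columns cross the network, which is where the time and cost difference between the two versions would come from.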
Amazon S3 Select: Accelerating Big Data
Up to 400% Faster
Up to 80% Cheaper
Before: Amazon S3
After: Amazon S3 with S3 Select
DEMO: Amazon S3 Select with Presto
Works with your existing Hive Metastore
Automatically converts predicates into S3 Select requests
Amazon S3
S3 Select
Amazon S3 Select: Accelerating big data (before and after)
5X Faster with 1/40 of the CPU
Amazon S3 Select: Will be supported by…
Amazon Athena, Amazon EMR, Amazon Redshift Spectrum
Amazon S3 Select: Preview starts today
• Formats: CSV, JSON
• Compression: GZIP
• Encryption: None
• Encoding: UTF-8
• Integration: AWS SDK for Java and Python and Presto Connector
• Availability: Northern Virginia, Ohio, Oregon, Dublin, and Singapore
Apply at: https://pages.awscloud.com/amazon-s3-select-preview.html
Amazon Glacier Select
Amazon Glacier
• Extremely low-cost archive storage service, starting at $0.004 per GB/mo
• Allows you to retrieve data within minutes to hours
• 99.999999999% durability (five to six orders of magnitude higher than two copies of tape)
• All data is encrypted at rest
• Features: compliance, data management, cost management, audit logging
Choice of storage classes
• Amazon S3 Standard: active data, milliseconds, from 2.1¢ per GB/mo.
• Amazon S3 Standard–Infrequent Access: infrequently accessed data, milliseconds, 1.25¢ per GB/mo.
• Amazon Glacier: archive data, minutes to hours, 0.4¢ per GB/mo.
Retrievals using Amazon Glacier
                     Expedited           Standard                   Bulk
Data Access Time     1 - 5 minutes       3 - 5 hours                5 - 12 hours
Data Retrievals      $0.03 per GB        $0.01 per GB               $0.0025 per GB
Retrieval Requests   $0.01 per request   $0.05 per 1,000 requests   $0.025 per 1,000 requests
• Three flexible and powerful retrieval options to access any of your Amazon Glacier data
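To make the trade-off between the three retrieval options concrete, here is a small arithmetic sketch that estimates the cost of one retrieval workload under each tier. The prices are the ones quoted on this slide (2017) and may not match current pricing; the function name `retrieval_cost` and the example workload of 100 GB across 1,000 requests are assumptions chosen purely for illustration.

```python
# Prices as quoted on the slide (2017); current pricing may differ.
TIERS = {
    "Expedited": {"per_gb": 0.03, "per_request": 0.01},
    "Standard": {"per_gb": 0.01, "per_request": 0.05 / 1000},
    "Bulk": {"per_gb": 0.0025, "per_request": 0.025 / 1000},
}


def retrieval_cost(tier, gigabytes, requests):
    """Estimated cost in dollars: data retrieval charge plus request charge."""
    price = TIERS[tier]
    return gigabytes * price["per_gb"] + requests * price["per_request"]


# Hypothetical workload: 100 GB retrieved with 1,000 requests.
for tier in TIERS:
    print(tier, round(retrieval_cost(tier, gigabytes=100, requests=1000), 4))
```

Faster access costs more on both the per-GB and the per-request component, so many small Expedited requests are dominated by the request charge.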
Amazon Glacier Select
Select relevant contents from an object instead of restoring the entire object
• Simple SQL expression: SELECT and WHERE
• Familiar semantics: works and scales like RESTORE requests
• Integrated: AWS SDK and CLI
How to use Amazon Glacier Select?
Two ways to use Amazon Glacier Select
• Using the Glacier API: for data uploaded directly to Amazon Glacier
• Using the S3 API: for data that is lifecycled to Amazon Glacier from S3
How to use Amazon Glacier Select?
Current restore-object API arguments: Object id, Tier
New (optional) restore-object API arguments to use Glacier Select: SQL query, Output S3 location, SNS topic
Using Glacier Select
…restore-object …object… | …get-object … object …. | awk -F, '{ if ($4 == "id") print $1 }'
...restore-object …object… "SELECT o._1 FROM object o WHERE o._4 = 'id' …" | …get-object … object ….
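How Glacier Select works
Participants: App, Amazon Glacier, Amazon S3
1. App sends Glacier-Select (ArchiveId, SQL, Tier, S3 bucket to write output) to Amazon Glacier
2. Amazon Glacier responds with 200 OK
3. Amazon Glacier reads the data and performs filtering
4. Amazon Glacier writes the output to S3
5. The app is notified using SNS that the output is ready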
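The request step of this flow can be sketched through the S3 API path (for objects lifecycled to Amazon Glacier), assuming the boto3 S3 client's `restore_object` call with a `RestoreRequest` of type `SELECT`. The helper names `build_select_restore_request` and `start_glacier_select`, the CSV settings, and the bucket and prefix values are hypothetical; the SNS notification step is not shown here.

```python
def build_select_restore_request(expression, tier, output_bucket, output_prefix):
    """Build a RestoreRequest that runs SQL instead of restoring the object."""
    return {
        "Type": "SELECT",
        "Tier": tier,  # "Expedited", "Standard", or "Bulk"
        "SelectParameters": {
            # Assumption: headerless CSV input.
            "InputSerialization": {"CSV": {"FileHeaderInfo": "NONE"}},
            "ExpressionType": "SQL",
            "Expression": expression,
            "OutputSerialization": {"CSV": {}},
        },
        # Where the filtered output is written.
        "OutputLocation": {
            "S3": {"BucketName": output_bucket, "Prefix": output_prefix}
        },
    }


def start_glacier_select(s3_client, bucket, key, request):
    """Submit the request; `s3_client` is expected to be boto3.client("s3")."""
    return s3_client.restore_object(Bucket=bucket, Key=key, RestoreRequest=request)


# Hypothetical values for illustration only.
request = build_select_restore_request(
    "SELECT o._1 FROM object o WHERE o._4 = 'id'",
    tier="Standard",
    output_bucket="example-output-bucket",
    output_prefix="glacier-select/",
)
```

The call returns once the request is accepted; the output appears under the given prefix later, depending on the retrieval tier.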
Amazon Glacier Select use cases
Billing Inquiries, Auditing/Compliance, Building Training Models
Amazon Glacier Select
Input
Format: delimited text (CSV, TSV, PSV, etc.)
Encryption: SSE-KMS, SSE-S3
Output
Format: delimited text (CSV, TSV, PSV, etc.)
Supported SQL
Clauses: Select, From, Where
Data types: String, Integer, Float, Decimal, Timestamp, Boolean
Operators: Conditional, Math, Logical, String (Like, ||)
Functions: String, Cast
Demo: Glacier Select
App
Glacier Select
Amazon Glacier
Amazon Glacier Select pricing
• Data scanned: per-GB charge based on the amount of data read
• Data returned: per-GB charge based on the amount of output data returned to S3
• Request cost: charge for each Glacier Select request
Coming in 2018: Amazon Athena with Amazon Glacier Select
• Query data in Amazon Glacier directly from Amazon Athena
• Enables customers to use Amazon Glacier as a data source for their SQL queries in Amazon Athena
Amazon Glacier Select: GA starts today
• Formats: CSV, any delimiter-separated file
• Encryption: SSE-KMS, SSE-S3
• Encoding: UTF-8
• Integration: AWS SDK, CLI, Athena integration (expected 2018)
• Availability: All commercial regions where Amazon Glacier is launched
Summary
Now: Your data lake on AWS
Simple, Faster, Cheaper
Storage: Amazon S3, Amazon Glacier
SELECT
Tools: Amazon Redshift Spectrum, Amazon Athena, Amazon EMR, AWS Lambda, ISVs and Custom Applications
Learn more…
STG303 – Deep Dive on Amazon Glacier – Thurs, 1:45 PM
STG312 – Best Practices for Building a Data Lake in Amazon S3 & Amazon Glacier – Thurs, 3:15 PM
Q&A
Amazon S3 Amazon Glacier
Thank you!

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Big Data Breakthroughs: Process and Query Data In Place with Amazon S3 Select & Amazon Glacier Select - STG313 - re:Invent 2017

  • 1. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Big Data Breakthroughs: Process and Query Data In Place with Amazon S3 & Glacier. Rahul Bhartia, Principal Product Manager, Amazon S3; Rashim Gupta, Principal Product Manager, Amazon Glacier. STG313, November 29, 2017.
  • 2. What to expect from the session: 1. Introduction to Amazon S3 Select and Amazon Glacier Select; 2. Dive into use cases; 3. Key features and demos.
  • 3. Building a data lake with Amazon S3: unmatched durability, availability, and scalability; best security, compliance, and audit capability; object-level control at any scale; business insight into your data; twice as many partner integrations; most ways to bring data in.
  • 4. With a wide variety of in-place tools on Amazon S3: Amazon Athena, Amazon Redshift Spectrum, Amazon EMR, AWS Glue.
  • 5. Today: compute scales based on object size instead of the amount of data you want to process. (Diagram: DATA, COMPUTE.)
  • 6. Today: all of these tools retrieve a lot of data they don't need and do the heavy lifting.
  • 7. Today: you need to restore the entire object from Amazon Glacier to Amazon S3 and then use it.
  • 8. Introducing Amazon S3 Select and Amazon Glacier Select: select a subset of data from an object based on a SQL expression.
  • 9. Simple, faster, and cheaper! Available as an API: no infrastructure or administration. Faster performance compared to doing it yourself. Pay as you go: the less you retrieve, the more you save.
  • 10. Amazon S3 Select
  • 11. Amazon S3 Select: select contents from an object instead of retrieving the whole object. Simple to use: standard SQL expressions. Familiar: works and scales like GET requests. Integrated: AWS SDK and Presto (others coming soon).
  • 12. Amazon S3 Select. Input format: delimited text (CSV, TSV), JSON … Compression: GZIP … Output format: delimited text (CSV, TSV), JSON …
       Clauses: SELECT, FROM, WHERE.
       Data types: String; Integer, Float, Decimal; Timestamp; Boolean.
       Operators: Conditional; Math; Logical; String (LIKE, ||).
       Functions: String; Cast; Math; Aggregate.
  • 13. Amazon S3 Select: simple pattern matches.
       Before: …get-object …object… | awk -F '{ if($4=="x") print $1}'
       After: ...select-object …object… 'SELECT o._1 WHERE o._4 == "x"…'
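The pattern match on slide 13 can be sketched in Python. This is a minimal sketch, not the presenters' code: it assumes a boto3-style S3 client is passed in, that the object is headerless CSV, and that the request and response shapes follow the boto3 `select_object_content` call (`Expression`, `ExpressionType`, `InputSerialization`, `OutputSerialization`, and an event stream under `Payload`). The helper names `build_select_request`, `read_records`, and `select_first_column` are hypothetical.

```python
from typing import Any, Dict, Iterable


def build_select_request(bucket: str, key: str, value: str) -> Dict[str, Any]:
    """Build keyword arguments that select column 1 where column 4 equals value."""
    # Double any single quote so the value stays a valid SQL string literal.
    literal = value.replace("'", "''")
    expression = f"SELECT s._1 FROM S3Object s WHERE s._4 = '{literal}'"
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": expression,
        # Assumption: the object is CSV without a header row.
        "InputSerialization": {"CSV": {"FileHeaderInfo": "NONE"}},
        "OutputSerialization": {"CSV": {}},
    }


def read_records(event_stream: Iterable[Dict[str, Any]]) -> str:
    """Concatenate the Records events of a select response into one string."""
    chunks = []
    for event in event_stream:
        # Other event types (Stats, Progress, End) carry no row data.
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    # Join before decoding so a character split across chunks decodes cleanly.
    return b"".join(chunks).decode("utf-8")


def select_first_column(s3_client: Any, bucket: str, key: str, value: str) -> str:
    """Run the filter server-side and return only the matching first-column values."""
    response = s3_client.select_object_content(**build_select_request(bucket, key, value))
    return read_records(response["Payload"])
```

With boto3 installed, `s3_client` would typically come from `boto3.client("s3")`. Only the filtered rows cross the network, which is the point the slide makes against the `get-object | awk` pipeline.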
  • 14. Amazon S3 Select: serverless applications. (Diagram: Amazon S3, S3 Select, AWS Lambda, Amazon SNS.)
  • 15. Amazon S3 Select: Serverless MapReduce. 2X faster at 1/5 of the cost.
       Before: 200 seconds and 11.2 cents.
           # Download and process all keys
           for key in src_keys:
               response = s3_client.get_object(Bucket=src_bucket, Key=key)
               contents = response['Body'].read()
               for line in contents.split('\n')[:-1]:
                   line_count += 1
                   try:
                       data = line.split(',')
                       srcIp = data[0][:8]
                       ….
       After: 95 seconds and 2.8 cents.
           # Select IP Address and Keys
           for key in src_keys:
               response = s3_client.select_object_content(Bucket=src_bucket, Key=key, expression="SELECT SUBSTR(obj._1, 1, 8), obj._2 FROM s3object as obj")
               contents = response['Body'].read()
               for line in contents:
                   line_count += 1
                   try:
                       ….
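The before and after figures on slide 15 can be turned into ratios with a little arithmetic. The sketch below uses only the four numbers printed on the slide; the function names are hypothetical. On these figures the run is a little over twice as fast and costs one quarter as much, so the "1/5 of the cost" headline is somewhat more favorable than the per-run numbers shown.

```python
def speedup(before_seconds: float, after_seconds: float) -> float:
    """How many times faster the 'after' run is than the 'before' run."""
    return before_seconds / after_seconds


def cost_fraction(before_cents: float, after_cents: float) -> float:
    """Cost of the 'after' run as a fraction of the 'before' run."""
    return after_cents / before_cents


# Figures as printed on slide 15.
BEFORE_SECONDS, BEFORE_CENTS = 200.0, 11.2
AFTER_SECONDS, AFTER_CENTS = 95.0, 2.8

if __name__ == "__main__":
    print(f"speedup: {speedup(BEFORE_SECONDS, AFTER_SECONDS):.2f}x")
    print(f"cost fraction: {cost_fraction(BEFORE_CENTS, AFTER_CENTS):.2f}")
```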
  • 16. Amazon S3 Select: accelerating big data. Up to 400% faster, up to 80% cheaper. (Diagram: before, Amazon S3; after, Amazon S3 with S3 Select.)
  • 17. Demo: Amazon S3 Select with Presto. Works with your existing Hive Metastore. Automatically converts predicates into S3 Select requests.
  • 18. Amazon S3 Select: accelerating big data. 5X faster with 1/40 of the CPU. (Before/After comparison.)
  • 19. Amazon S3 Select will be supported by Amazon Athena, Amazon EMR, and Amazon Redshift Spectrum.
  • 20. Amazon S3 Select: preview starts today. Formats: CSV, JSON. Compression: GZIP. Encryption: none. Encoding: UTF-8. Integration: AWS SDK for Java and Python, and Presto connector. Availability: Northern Virginia, Ohio, Oregon, Dublin, and Singapore. Apply at: https://pages.awscloud.com/amazon-s3-select-preview.html
  • 21. Amazon Glacier Select
  • 22. Amazon Glacier: extremely low-cost archive storage service, starting at $0.004 per GB per month. Retrieve data in minutes to hours. 99.999999999% durability (five to six orders of magnitude higher than two copies of tape). All data is encrypted at rest. Features: compliance, data management, cost management, audit logging.
  • 23. Choice of storage classes.
       Amazon S3 Standard: active data; access in milliseconds; from 2.1¢ per GB per month.
       Amazon S3 Standard-Infrequent Access: infrequently accessed data; access in milliseconds; 1.25¢ per GB per month.
       Amazon Glacier: archive data; access in minutes to hours; 0.4¢ per GB per month.
  • 24. Retrievals using Amazon Glacier: three flexible and powerful retrieval options to access any of your Amazon Glacier data.
       Expedited: data access time 1-5 minutes; data retrievals $0.03 per GB; retrieval requests $0.01 per request.
       Standard: data access time 3-5 hours; data retrievals $0.01 per GB; retrieval requests $0.05 per 1,000 requests.
       Bulk: data access time 5-12 hours; data retrievals $0.0025 per GB; retrieval requests $0.025 per 1,000 requests.
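The retrieval pricing on slide 24 combines a per-GB charge with a per-request charge. A minimal sketch of that arithmetic follows, using the 2017 prices quoted on the slide (current prices may differ); `RetrievalTier`, `TIERS`, and `retrieval_cost` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievalTier:
    per_gb: float       # USD per GB retrieved
    per_request: float  # USD per retrieval request


# Prices as quoted on slide 24 (2017); request prices normalized to one request.
TIERS = {
    "Expedited": RetrievalTier(per_gb=0.03, per_request=0.01),
    "Standard": RetrievalTier(per_gb=0.01, per_request=0.05 / 1000),
    "Bulk": RetrievalTier(per_gb=0.0025, per_request=0.025 / 1000),
}


def retrieval_cost(tier: str, gigabytes: float, requests: int) -> float:
    """Return the retrieval cost in USD for the given tier, volume, and request count."""
    if gigabytes < 0 or requests < 0:
        raise ValueError("gigabytes and requests must be non-negative")
    prices = TIERS[tier]
    return gigabytes * prices.per_gb + requests * prices.per_request
```

For example, `retrieval_cost("Bulk", 100, 1000)` adds 100 GB at $0.0025 per GB to 1,000 requests at $0.025 per 1,000.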
  • 25. Amazon Glacier Select: select relevant contents from an object instead of restoring the entire object. Simple SQL expression: SELECT and WHERE. Familiar semantics: works and scales like RESTORE requests. Integrated: AWS SDK and CLI.
  • 26. How to use Amazon Glacier Select? Two ways: using the Glacier API, for data directly uploaded to Amazon Glacier; using the S3 API, for data that is lifecycled to Amazon Glacier from S3.
  • 27. How to use Amazon Glacier Select? Current restore-object API arguments: object id, tier. New (optional) restore-object API arguments to use Glacier Select: SQL query, output S3 location, SNS topic.
  • 28. Using Glacier Select.
       Before: …restore-object …object… | …get-object … object …. | awk -F '{ if($4=="id") print $1}'
       After: ...restore-object …object… 'SELECT o._1 WHERE o._4 == "id"…' | …get-object … object ….
  • 29. How Glacier Select works: (1) the app sends a Glacier-Select request (ArchiveId, SQL, Tier, S3 bucket to write output) to Amazon Glacier and receives 200 OK; (2) Glacier reads the data and performs the filtering; (3) Glacier writes the output to Amazon S3; (4) the app is notified through SNS that the output is ready.
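The request in step (1) can be sketched for the S3 API path, where the object was lifecycled to Amazon Glacier. This is a sketch under assumptions, not the presenters' demo: it assumes a boto3-style S3 client and that the `RestoreRequest` shape follows the boto3 `restore_object` call (`Type`, `Tier`, `SelectParameters`, `OutputLocation`), which should be checked against current documentation. The bucket, prefix, and helper names are hypothetical, and the SNS notification setup is not shown.

```python
from typing import Any, Dict

VALID_TIERS = ("Expedited", "Standard", "Bulk")


def build_glacier_select_request(
    expression: str, output_bucket: str, output_prefix: str, tier: str = "Standard"
) -> Dict[str, Any]:
    """Build a RestoreRequest that runs a SQL filter instead of a full restore."""
    if tier not in VALID_TIERS:
        raise ValueError(f"tier must be one of {VALID_TIERS}")
    return {
        "Type": "SELECT",
        "Tier": tier,
        "SelectParameters": {
            # Assumption: the archived object is delimited text (CSV).
            "InputSerialization": {"CSV": {}},
            "ExpressionType": "SQL",
            "Expression": expression,
            "OutputSerialization": {"CSV": {}},
        },
        # Results are written to S3 rather than returned in the response.
        "OutputLocation": {"S3": {"BucketName": output_bucket, "Prefix": output_prefix}},
    }


def start_glacier_select(s3_client: Any, bucket: str, key: str, request: Dict[str, Any]) -> Any:
    """Submit the select job; the filtered output appears later under the output prefix."""
    return s3_client.restore_object(Bucket=bucket, Key=key, RestoreRequest=request)
```

The call returns once the job is accepted; the filtering itself completes within the access time of the chosen tier, so the app reads the output from S3 only after it has been notified.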
  • 30. Amazon Glacier Select use cases: billing inquiries, auditing/compliance, building training models.
  • 31. Amazon Glacier Select. Input format: delimited text (CSV, TSV, PSV, etc.). Encryption: SSE-KMS, SSE-S3. Output format: delimited text (CSV, TSV, PSV, etc.).
       Clauses: SELECT, FROM, WHERE.
       Data types: String; Integer, Float, Decimal; Timestamp; Boolean.
       Operators: Conditional; Math; Logical; String (LIKE, ||).
       Functions: String; Cast.
  • 32. Demo: Glacier Select. (Diagram: App, Glacier Select, Amazon Glacier.)
  • 33. Amazon Glacier Select pricing. Data scanned: per-GB charge for the size of the data read. Data returned: per-GB charge based on the amount of output data returned to S3. Request cost: a charge for each Glacier Select request.
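The three pricing components on slide 33 add up as a simple sum. The slide gives no rates, so the sketch below takes them as parameters; the function name and the rates in the usage note are hypothetical placeholders, not published prices.

```python
def glacier_select_cost(
    gb_scanned: float,
    gb_returned: float,
    requests: int,
    *,
    scan_rate_per_gb: float,
    return_rate_per_gb: float,
    rate_per_request: float,
) -> float:
    """Sum the scanned, returned, and request charges for Glacier Select jobs."""
    if min(gb_scanned, gb_returned, requests) < 0:
        raise ValueError("quantities must be non-negative")
    scanned = gb_scanned * scan_rate_per_gb       # charge for data read
    returned = gb_returned * return_rate_per_gb   # charge for output written to S3
    requested = requests * rate_per_request       # charge per select request
    return scanned + returned + requested
```

With placeholder rates of 0.5, 1.0, and 0.1, scanning 100 GB, returning 2 GB, and issuing 10 requests gives 50 + 2 + 1 = 53. A selective query keeps the returned term small, which is where the savings over a full restore come from.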
  • 34. Coming in 2018: Amazon Athena with Amazon Glacier Select. Query data in Amazon Glacier directly from Amazon Athena. Enables customers to use Amazon Glacier as a data source for their SQL queries in Amazon Athena.
  • 35. Amazon Glacier Select: GA starts today. Formats: CSV, any delimiter-separated file. Encryption: SSE-KMS, SSE-S3. Encoding: UTF-8. Integration: AWS SDK, CLI, Athena integration (expected 2018). Availability: all commercial regions where Amazon Glacier is launched.
  • 36. Summary
  • 37. Now: your data lake on AWS. Simple, faster, cheaper. (Diagram: SELECT across Amazon Glacier and Amazon S3, used by Amazon Redshift Spectrum, Amazon Athena, Amazon EMR, AWS Lambda, and ISVs and custom applications.)
  • 38. Learn more: STG303, Deep Dive on Amazon Glacier, Thurs, 1:45 PM; STG312, Best Practices for Building a Data Lake in Amazon S3 & Amazon Glacier, Thurs, 3:15 PM.
  • 39. Q&A (Amazon S3, Amazon Glacier)
  • 40. Thank you!