© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. SUMMIT
Optimizing data lakes with Amazon S3
John Mallory
Storage Business Development Manager
AWS
STG302
125+ million players
Data provides a constant feedback loop
for game designers
Up-to-the-minute analysis of gamer
satisfaction to drive gamer engagement
Resulting in the most popular
game played in the world
Fortnite
Data is a strategic asset for every organization
“The world’s most valuable
resource is no longer oil, but
data.”*
*Copyright: The Economist, 2017, David Parkins
Finding value in data is a journey
Business monitoring
Business insights
New business opportunity
Business optimization
Business transformation
Evolving tools and methods
AI/ML
SQL query
Why use AWS for big data & analytics?
Agility Scalability
Get to insights faster
Broadest and deepest
capabilities
Low cost
Data migrations made easy
More data lakes and analytics than anywhere else
More than 10,000 data lakes on AWS
Defining the AWS data lake
Data lakes provide:
Relational and nonrelational data
Scale out to exabytes (EBs)
Diverse set of analytics and machine learning tools
Work on data without any data movement
Designed for low-cost storage and analytics
OLTP ERP CRM LoB
Data warehouse
Business
intelligence
Data lake
Devices Web Sensors Social
Catalog
Machine
learning
DW queries Big data
processing
Interactive Real time
Data lake on AWS
Catalog & search Access & user interfaces
Data ingestion
Analytics & serving
S3
Amazon
DynamoDB
Amazon Elasticsearch
Service
AWS
AppSync
Amazon
API Gateway
Amazon
Cognito
AWS
KMS
AWS
CloudTrail
Manage & secure
AWS
IAM
Amazon
CloudWatch
AWS
Snowball
AWS Storage
Gateway
Amazon
Kinesis Data
Firehose
AWS Direct
Connect
AWS Database Migration
Service
Amazon
Athena
Amazon
EMR
AWS
Glue
Amazon
Redshift
Amazon
DynamoDB
Amazon
QuickSight
Amazon
Kinesis
Amazon
Elasticsearch
Service
Amazon
Neptune
Amazon
RDS
Central storage
Scalable, secure, cost-effective
AWS
Glue
User-defined functions
• Bring your own functions & code
• Execute without provisioning servers
Processing and querying in place
Fully managed process & query
• Catalog, transform & query data in Amazon S3
• No physical instances to manage
Lambda function
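As a rough sketch of the "bring your own functions" idea, the Python handler below shows the general shape of a Lambda function that reacts to an Amazon S3 object-created notification. The bucket name, object key, and return value are hypothetical; only the event layout (Records, then s3, then bucket and object) follows the S3 notification format.

```python
from urllib.parse import unquote_plus


def handler(event, context):
    """Collect (bucket, key) for every object named in an S3 event notification."""
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications (spaces become '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append({"bucket": bucket, "key": key})
    # A real function would transform or catalog each object at this point.
    return {"processed": objects}


# Hypothetical sample event, trimmed to the fields the handler reads.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "example-data-lake"},
                "object": {"key": "raw/2019/click+stream.json"}}}
    ]
}
result = handler(sample_event, None)
print(result)
```

Because the handler only reads the event dictionary, it can be exercised locally with a hand-built event before it is deployed.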
Amazon S3 is the best place for data lakes
Most ways to
bring data in
Best security,
compliance,
and audit
capabilities
Object-level
controls
Unmatched
durability,
availability,
and scalability
Business
insights
into your data
Optimize costs with data tiering
Hot
Cold
Amazon S3
Standard
Amazon S3
Standard-
Infrequent Access
Amazon S3
Glacier
HDFS ✓ Use EMR/Hadoop with local HDFS for
hottest data sets
✓ Store cooler data in Amazon S3 and
cold in Amazon S3 Glacier to reduce
costs
✓ Use Amazon S3 analytics to optimize
tiering strategy
Amazon S3 Analytics
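One way to express the hot-to-cold tiering above is an S3 lifecycle rule. The sketch below only builds the rule as a Python dictionary; the prefix and the 30-day and 90-day thresholds are illustrative assumptions, not values from the talk, and should be tuned from S3 analytics results.

```python
import json


def tiering_rule(prefix, ia_after_days=30, glacier_after_days=90):
    """Build one S3 lifecycle rule that moves objects to colder classes as they age."""
    if glacier_after_days <= ia_after_days:
        raise ValueError("Glacier transition must come after the Standard-IA transition")
    return {
        "ID": f"tier-{prefix.strip('/') or 'all'}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_after_days, "StorageClass": "GLACIER"},
        ],
    }


# Hypothetical prefix; each rule applies only to keys under its prefix.
lifecycle = {"Rules": [tiering_rule("logs/")]}
print(json.dumps(lifecycle, indent=2))
# With boto3, this dictionary would be passed as:
#   s3.put_bucket_lifecycle_configuration(Bucket="...", LifecycleConfiguration=lifecycle)
```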
Your choice of Amazon S3 storage classes
Access frequency: frequent to infrequent
S3 Standard
• Active, frequently accessed data
• Milliseconds access
• ≥3 AZ
• From: $0.0210/GB
S3 Intelligent-Tiering (NEW!)
• Data with changing access pattern
• Milliseconds access
• ≥3 AZ
• From: $0.0210 to $0.0125/GB
• Monitoring fee per object
• Min. storage duration
S3 Standard-IA
• Infrequently accessed data
• Milliseconds access
• ≥3 AZ
• From: $0.0125/GB
• Retrieval fee per GB
• Min. storage duration
• Min. object size
S3 One Zone-IA
• Recreatable, less accessed data
• Milliseconds access
• 1 AZ
• From: $0.0100/GB
• Retrieval fee per GB
• Min. storage duration
• Min. object size
S3 Glacier
• Archive data
• Minutes to hours access
• ≥3 AZ
• From: $0.0040/GB
• Retrieval fee per GB
• Min. storage duration
• Min. object size
S3 Glacier Deep Archive (NEW!)
• Archive data
• Hours access
• ≥3 AZ
• From: $0.00099/GB
• Retrieval fee per GB
• Min. storage duration
• Min. object size
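To make the price differences concrete, the following sketch compares the monthly storage cost of a hypothetical 100 TB (100,000 GB) dataset using the per-GB "from" prices on this slide. It deliberately ignores retrieval fees, monitoring fees, minimum storage durations, and minimum object sizes, so treat it as a rough lower bound rather than a bill estimate.

```python
# Per-GB monthly "from" prices quoted on the slide (USD).
# Actual prices vary by Region and change over time.
PRICE_PER_GB_MONTH = {
    "S3 Standard": 0.0210,
    "S3 Standard-IA": 0.0125,
    "S3 One Zone-IA": 0.0100,
    "S3 Glacier": 0.0040,
    "S3 Glacier Deep Archive": 0.00099,
}


def monthly_storage_cost(size_gb, storage_class):
    """Storage-only cost for one month; fees and minimums are not modelled."""
    return size_gb * PRICE_PER_GB_MONTH[storage_class]


dataset_gb = 100_000  # hypothetical 100 TB dataset
costs = {
    name: round(monthly_storage_cost(dataset_gb, name), 2)
    for name in PRICE_PER_GB_MONTH
}
for name, cost in costs.items():
    print(f"{name:<24} ${cost:>10,.2f}")
```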
Amazon S3 Intelligent-Tiering
automates cost savings
Automatically optimizes storage costs for
data with changing access patterns
Moves objects between two storage tiers:
• Frequent access tier
• Infrequent access tier
Monitors access patterns and auto-tiers on
granular object level
Milliseconds access, ≥3 AZ, monitoring fee
per object, minimum storage duration
NEW!
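The per-object tiering decision can be pictured with a toy model. The sketch below assumes an object moves to the infrequent access tier after 30 consecutive days without access and returns to the frequent tier when it is read again; the 30-day threshold and the function name are assumptions for illustration and do not appear on the slide.

```python
def tier_after(access_days, horizon_days, idle_threshold_days=30):
    """Return the tier an object would sit in at the end of `horizon_days`.

    `access_days` lists the days (counted from upload on day 0) on which the
    object was read. The upload itself counts as the first access.
    """
    seen = [d for d in access_days if d <= horizon_days]
    last_access = max(seen, default=0)
    idle_days = horizon_days - last_access
    return "infrequent" if idle_days >= idle_threshold_days else "frequent"


# An object read on days 1 and 5 has been idle for 35 days by day 40.
print(tier_after([1, 5], horizon_days=40))
# A read on day 38 brings the same object back to the frequent tier.
print(tier_after([1, 5, 38], horizon_days=40))
```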
Amazon S3 Glacier Deep Archive NEW!
No tape to
manage
$0.00099/GB/month
Less than 1/4 the cost of
Amazon S3 Glacier
Designed for
11 9s durability
Recover data in
hours
Lowest cost storage available in the cloud
C o m i n g s o o n
A data lake needs to
accommodate a wide
variety of concurrent data
sources
Rapidly ingest all data sources
IoT, sensor data, clickstream data,
social media feeds, streaming logs
Oracle, MySQL, MongoDB, DB2,
SQL Server, Amazon RDS
On-premises ERP, mainframes,
lab equipment, NAS storage
Offline sensor data, NAS,
On-premises Hadoop
On-premises data lakes, EDW,
large-scale data collection
Ingest
methods
AWS Transfer for SFTP NEW!
Fully managed service enabling transfer
of data over SFTP while stored in Amazon S3
Seamless migration
of existing workflows
Native integration
with AWS services
Simple
to use
Cost
effective
Secure and compliant
Fully managed
in AWS
AWS
integrated
AWS
Transfer service that simplifies, automates, and accelerates data movement
Transfers up
to 10 Gbps
per agent
Pay as
you go
Secure and
reliable
transfers
Replicate data to AWS for
business continuity
Transfer data for timely
in-cloud analysis
Migrate active application
data to AWS
Combines the speed and reliability of network acceleration software
with the cost-effectiveness of open source tools
Simple data
movement to
Amazon S3 or
Amazon EFS
AWS DataSync NEW!
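For a sense of what "up to 10 Gbps per agent" means in practice, this back-of-the-envelope sketch converts a dataset size into transfer hours. The 50 TB dataset and the efficiency factor are hypothetical; real throughput depends on the network path, file sizes, and source storage.

```python
def transfer_hours(size_tb, link_gbps=10.0, efficiency=1.0):
    """Hours needed to move `size_tb` terabytes (decimal) over one link.

    `efficiency` is the fraction of the nominal link rate actually achieved.
    """
    bits = size_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


# Hypothetical 50 TB dataset at the full 10 Gbps, then at 60% of it.
print(f"{transfer_hours(50):.1f} hours at line rate")
print(f"{transfer_hours(50, efficiency=0.6):.1f} hours at 60% efficiency")
```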
Process data in place
Amazon S3
Amazon Athena Amazon Redshift
Spectrum
Amazon SageMaker AWS Glue
Amazon S3 Select
Select a subset of your object’s data using a SQL expression
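A minimal sketch of what such a request might look like: the helper below builds the keyword arguments for an S3 Select query over a CSV object with a header row. The bucket, key, and column names are hypothetical, and the helper itself is an illustration rather than part of any AWS SDK; it makes no network call.

```python
def select_request(bucket, key, columns, where=None, gzip=False):
    """Build keyword arguments for an S3 Select query over a headered CSV object."""
    expression = f"SELECT {', '.join('s.' + c for c in columns)} FROM S3Object s"
    if where:
        expression += f" WHERE {where}"
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": expression,
        "InputSerialization": {
            # Use the first row as column names so they can be referenced as s.<name>.
            "CSV": {"FileHeaderInfo": "USE"},
            "CompressionType": "GZIP" if gzip else "NONE",
        },
        "OutputSerialization": {"CSV": {}},
    }


# Hypothetical object and columns.
request = select_request(
    "example-data-lake",
    "events/2019/03/events.csv.gz",
    ["player_id", "score"],
    where="CAST(s.score AS INT) > 100",
    gzip=True,
)
print(request["Expression"])
# With boto3, the request would be sent as: s3.select_object_content(**request)
```

Only the matching rows and the two named columns would be returned, instead of the whole object.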
Improved performance for data lakes
As customers store larger and larger datasets in Amazon S3,
Amazon S3 Select offers up to a 400% performance improvement
Amazon S3 Select enhancements NEW!
Now supports:
CSV, JSON, JSON arrays, and Parquet formats
GZIP, BZIP2, and Snappy compression
Integrated with Spark, Hive, and Presto on Amazon EMR
Amazon FSx for Lustre
For compute-intensive data processing
use cases like HPC or machine learning
Raw data stored in Amazon S3 is loaded to
FSx for Lustre for processing
Output of processing returned to
Amazon S3 for retention
Amazon FSx for Lustre performance
Massively scalable performance
100+ GB/s throughput | Millions of IOPS |
Consistent submillisecond latencies
Parallel file system Supports hundreds of
thousands of cores
SSD-based
Choosing the right data formats
There is no such thing as the “best” data format
• All involve tradeoffs, depending on workload & tools
• CSV, TSV, JSON are easy but not efficient
• Compress & store or archive as raw input
• Columnar compressed are generally preferred
• Parquet or ORC
• Smaller storage footprint = lower cost
• More efficient scan & query
• Row-oriented (AVRO) good for full data scans
• Organize into partitions
• Coalescing to larger partitions over time
Key considerations are cost, performance, & support
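The "organize into partitions" advice is often implemented with Hive-style key prefixes, which query engines can use to skip irrelevant data. The sketch below assumes a year/month/day layout; the table prefix and file names are hypothetical.

```python
from datetime import date


def partition_key(table_prefix, day, filename):
    """Build a Hive-style partitioned object key (year=/month=/day=)."""
    return (
        f"{table_prefix.rstrip('/')}/year={day.year:04d}"
        f"/month={day.month:02d}/day={day.day:02d}/{filename}"
    )


def keys_for_month(keys, year, month):
    """Keep only keys under one month's partition, as a pruning query engine would."""
    marker = f"/year={year:04d}/month={month:02d}/"
    return [k for k in keys if marker in k]


key = partition_key("events/", date(2019, 3, 27), "part-0000.parquet")
all_keys = [
    key,
    partition_key("events", date(2019, 4, 1), "part-0000.parquet"),
]
print(key)
print(keys_for_month(all_keys, 2019, 3))
```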
Data prep is ~80% of data lake work
Building training sets
Cleaning and organizing data
Collecting datasets
Mining data for patterns
Refining algorithms
Other
Set up a catalog, ETL, and data prep
with AWS Glue
Serverless provisioning, configuration, and
scaling to run your ETL jobs on Apache Spark
Pay only for the resources used for jobs
Crawl your data sources, identify data
formats, and suggest schemas and
transformations
Automates the effort in building, maintaining
and running ETL jobs
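To illustrate what "suggest schemas" involves, here is a toy schema inference over a CSV sample. It is not how AWS Glue crawlers are implemented; the type names, sample data, and inference rule (narrowest type that fits every value) are assumptions chosen for the example.

```python
import csv
import io


def infer_type(values):
    """Pick the narrowest of bigint, double, string that fits every sample value."""
    for caster, name in ((int, "bigint"), (float, "double")):
        try:
            for value in values:
                caster(value)
            return name
        except ValueError:
            continue
    return "string"


def suggest_schema(csv_text):
    """Infer a column-name-to-type mapping from CSV text with a header row."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return {col: infer_type([row[col] for row in rows]) for col in rows[0]}


# Hypothetical sample data.
sample = "player_id,score,region\n17,93.5,us-west-2\n42,88,eu-west-1\n"
schema = suggest_schema(sample)
print(schema)
```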
Security challenges with data lakes
Data challenges
• Controlling access to data
• Data masking, row / column / cell level
encryption, key management
• Data loss / exfiltration
• Loss of data integrity
• Data provenance
• Compliance requirements (GDPR
and others)
Management challenges
• Central administration
• Federated authentication,
typically with Active Directory
• Role-based access control (RBAC)
• Centralized audit
• End-to-end data protection (at rest and in transit)
AWS helps you secure
Compliance
AWS Artifact
Amazon Inspector
AWS CloudHSM
Amazon Cognito
AWS CloudTrail
Security
Amazon GuardDuty
AWS Shield
AWS WAF
Amazon Macie
Amazon VPC
Encryption
AWS Certificate Manager
AWS Key Management Service
Encryption at rest
Encryption in transit
Bring your own keys, HSM
support
Identity
AWS IAM
AWS SSO
Amazon Cloud Directory
AWS Directory Service
AWS Organizations
Customers need multiple levels of security, identity and access management, encryption, and
compliance to secure their data lake
Data lake security
• Data storage
• Metadata
Control access to data
Configure Amazon S3 permissions
• Implement your access control matrix using IAM
policies
• Use S3 bucket policies for easy cross-account data
sharing
• Limit role-based access from an Amazon EMR
cluster’s Amazon Elastic Compute Cloud (Amazon
EC2) instance profile
• Authorize access from other tools such as Amazon
Redshift using IAM roles
IAM principals Amazon EMR Amazon Redshift
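As an example of the cross-account sharing bullet, the sketch below builds a read-only S3 bucket policy for another AWS account. The bucket name, account ID, and prefix are placeholders, and a real policy would normally be scoped further (for example to a specific role rather than the account root).

```python
import json


def cross_account_read_policy(bucket, reader_account_id, prefix=""):
    """Bucket policy letting another account list the bucket and read objects under a prefix."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    principal = {"AWS": f"arn:aws:iam::{reader_account_id}:root"}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowList",
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:ListBucket",
                "Resource": bucket_arn,
            },
            {
                "Sid": "AllowGet",
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:GetObject",
                "Resource": f"{bucket_arn}/{prefix}*",
            },
        ],
    }


# Placeholder bucket and account ID.
policy = cross_account_read_policy("example-data-lake", "111122223333", prefix="curated/")
print(json.dumps(policy, indent=2))
```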
Block public access to Amazon S3
Amazon S3 provides four settings
• BlockPublicAcls – Rejects new public object or bucket ACLs
• IgnorePublicAcls – Ignores existing public object or bucket ACLs
• BlockPublicPolicy – Rejects new public bucket access policy
• RestrictPublicBuckets – Restricts access to only AWS services and authorized users
within the bucket owner's account
But, what is “public”?
• Public object (or bucket) ACL → Grants permissions to members of the
predefined AllUsers or AuthenticatedUsers groups (grantees)
• Public bucket policy → Doesn’t grant permissions to only fixed values in Principal and
Condition elements
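The four settings and the ACL definition of "public" can be sketched as follows. The configuration dictionary uses the setting names listed above; the ACL check is a simplified illustration that only looks for the two predefined groups, and the sample grants are hypothetical.

```python
# Predefined S3 groups whose grants make an ACL public.
PUBLIC_GROUP_URIS = {
    "http://acs.amazonaws.com/groups/global/AllUsers",
    "http://acs.amazonaws.com/groups/global/AuthenticatedUsers",
}

# Turn on all four Block Public Access settings.
BLOCK_ALL = {
    "BlockPublicAcls": True,
    "IgnorePublicAcls": True,
    "BlockPublicPolicy": True,
    "RestrictPublicBuckets": True,
}
# With boto3, this would be applied as:
#   s3.put_public_access_block(Bucket="...", PublicAccessBlockConfiguration=BLOCK_ALL)


def acl_is_public(grants):
    """True if any grant targets the AllUsers or AuthenticatedUsers group."""
    return any(g.get("Grantee", {}).get("URI") in PUBLIC_GROUP_URIS for g in grants)


# Hypothetical ACL grants in the shape returned by an ACL lookup.
private_grants = [{"Grantee": {"Type": "CanonicalUser", "ID": "abc123"}, "Permission": "FULL_CONTROL"}]
public_grants = private_grants + [
    {"Grantee": {"Type": "Group", "URI": "http://acs.amazonaws.com/groups/global/AllUsers"},
     "Permission": "READ"}
]
print(acl_is_public(private_grants), acl_is_public(public_grants))
```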
Amazon S3 Object Lock NEW!
Immutable Amazon S3 objects
• Write once read many (WORM) protections for Amazon S3 objects
• Object or bucket control of WORM & retention attributes
Retention management controls
• Define retention periods in your app or with bucket-level defaults
• Objects locked for the duration of the retention period
• Support for legal hold scenarios
Data protection and compliance
• Assessed for use in SEC 17a-4, CFTC, and FINRA environments
• Extra protection against accidental or malicious delete
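The retention and legal hold rules above amount to a simple check, sketched here in plain Python. This is a simplified model: it ignores retention modes and any permission-based bypass, and the dates and function names are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone


def retain_until(created_at, retention_days):
    """Date until which a locked object version must be kept."""
    return created_at + timedelta(days=retention_days)


def can_delete_version(now, retain_until_date, legal_hold=False):
    """A locked version may be deleted only after retention ends and with no legal hold."""
    return not legal_hold and now >= retain_until_date


# Hypothetical object written on 27 March 2019 with a 365-day retention period.
created = datetime(2019, 3, 27, tzinfo=timezone.utc)
until = retain_until(created, retention_days=365)

print(can_delete_version(created + timedelta(days=10), until))
print(can_delete_version(created + timedelta(days=400), until))
print(can_delete_version(created + timedelta(days=400), until, legal_hold=True))
```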
Metadata security
AWS Glue Data Catalog
• Apache Hive metastore compatible
• Track data evolution using schema versioning
• Integrates with Hive, Spark, Presto, Amazon
Athena and Amazon Redshift Spectrum
• Use crawlers to classify your data in one central list
that is searchable
Metadata security
Key learnings
• Create and maintain centralized data catalog
• Enable cross account access
• Use IAM policies to control catalog access—similar to
Amazon S3 bucket policies
• Encrypt metadata in AWS Glue Data Catalog
AWS Glue Data Catalog resource policies
• Fine-grained access control to Data Catalog using IAM policies
• Restrict what users can view and query
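A sketch of fine-grained catalog access: the helper below builds an IAM policy that allows read-only metadata access to a single Data Catalog table. The Region, account ID, database, and table names are placeholders, and the set of actions is one reasonable choice for read access rather than a recommended baseline.

```python
import json


def table_read_policy(region, account_id, database, table):
    """IAM policy allowing read-only metadata access to one Data Catalog table."""
    base = f"arn:aws:glue:{region}:{account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["glue:GetDatabase", "glue:GetTable", "glue:GetPartitions"],
                "Resource": [
                    f"{base}:catalog",
                    f"{base}:database/{database}",
                    f"{base}:table/{database}/{table}",
                ],
            }
        ],
    }


# Placeholder Region, account ID, database, and table.
catalog_policy = table_read_policy("us-west-2", "111122223333", "gaming", "events")
print(json.dumps(catalog_policy, indent=2))
```

A user holding only this policy could see and query the `events` table but no other table in the catalog.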
Typical steps of building a data lake
1. Set up storage
2. Move data
3. Cleanse, prep, and catalog data
4. Configure and enforce security and compliance policies
5. Make data available for analytics
Building data lakes can still take months
Enforce security policies across
multiple services
Gain and manage new insights
Identify, ingest, clean, and
transform data
Build a secure data lake in days
AWS Lake Formation
How it works
Three simple steps to an AWS data lake
Remove data silos
Aggregate data
Better agility
More data => insights
Know what you have
Better data management
Quicker time to results
Higher quality data
Extract value from data
Analyze & report on data
Apply machine learning
Visualize & consume results
Amazon ingest & storage
Amazon S3, Amazon S3 Glacier,
AWS DataSync,
AWS Storage Gateway,
AWS Snow Family,
Amazon Kinesis
AWS Glue
Crawl, discover & catalog data
ETL data
Amazon analytics & ML
Amazon Athena, EMR,
Amazon Redshift, Amazon
SageMaker, Amazon Rekognition,
Amazon EC2 + FSx for Lustre
Collect & centralize Catalog & transform Analytics & insights
Thank you!
John Mallory
johmallo@amazon.com
What’s new with Amazon S3, Amazon EFS, and other AWS storage services - STG20...
 
Data Lifecycle Management
Data Lifecycle ManagementData Lifecycle Management
Data Lifecycle Management
 
Building-Serverless-Analytics-On-AWS
Building-Serverless-Analytics-On-AWSBuilding-Serverless-Analytics-On-AWS
Building-Serverless-Analytics-On-AWS
 
Building Data Lakes for Analytics on AWS
Building Data Lakes for Analytics on AWSBuilding Data Lakes for Analytics on AWS
Building Data Lakes for Analytics on AWS
 
Data Lake Implementation: Processing and Querying Data in Place (STG204-R1) -...
Data Lake Implementation: Processing and Querying Data in Place (STG204-R1) -...Data Lake Implementation: Processing and Querying Data in Place (STG204-R1) -...
Data Lake Implementation: Processing and Querying Data in Place (STG204-R1) -...
 
Build your own log analytics solution on AWS - ADB301 - Atlanta AWS Summit
Build your own log analytics solution on AWS - ADB301 - Atlanta AWS SummitBuild your own log analytics solution on AWS - ADB301 - Atlanta AWS Summit
Build your own log analytics solution on AWS - ADB301 - Atlanta AWS Summit
 
Everything You Need to Know About Big Data: From Architectural Principles to ...
Everything You Need to Know About Big Data: From Architectural Principles to ...Everything You Need to Know About Big Data: From Architectural Principles to ...
Everything You Need to Know About Big Data: From Architectural Principles to ...
 
Drive Customer Value with Data-Driven Decisions (GPSBUS206) - AWS re:Invent 2018
Drive Customer Value with Data-Driven Decisions (GPSBUS206) - AWS re:Invent 2018Drive Customer Value with Data-Driven Decisions (GPSBUS206) - AWS re:Invent 2018
Drive Customer Value with Data-Driven Decisions (GPSBUS206) - AWS re:Invent 2018
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWS
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWS
 
Implementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdfImplementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdf
 
Build Data Lakes & Analytics on AWS: Patterns & Best Practices
Build Data Lakes & Analytics on AWS: Patterns & Best PracticesBuild Data Lakes & Analytics on AWS: Patterns & Best Practices
Build Data Lakes & Analytics on AWS: Patterns & Best Practices
 
Build Data Lakes and Analytics on AWS: Patterns & Best Practices
Build Data Lakes and Analytics on AWS: Patterns & Best PracticesBuild Data Lakes and Analytics on AWS: Patterns & Best Practices
Build Data Lakes and Analytics on AWS: Patterns & Best Practices
 

More from Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

More from Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Optimize data lakes with Amazon S3 - STG302 - Santa Clara AWS Summit

  • 1. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. SUMMIT. Optimizing data lakes with Amazon S3. John Mallory, Storage Business Development Manager, AWS. STG302
  • 2. Fortnite: 125+ million players. Data provides a constant feedback loop for game designers. Up-to-the-minute analysis of gamer satisfaction to drive gamer engagement, resulting in the most popular game played in the world.
  • 3. Data is a strategic asset for every organization. “The world’s most valuable resource is no longer oil, but data.”* *Copyright: The Economist, 2017, David Parkins
  • 4. Finding value in data is a journey: business monitoring, business insights, new business opportunity, business optimization, business transformation. Evolving tools and methods: AI/ML, SQL query.
  • 5. Why use AWS for big data & analytics? Agility. Scalability. Get to insights faster. Broadest and deepest capabilities. Low cost. Data migrations made easy.
  • 6. More data lakes and analytics than anywhere else: more than 10,000 data lakes on AWS.
  • 7. Defining the AWS data lake. Data lakes provide: relational and nonrelational data; scale out to exabytes (EBs); a diverse set of analytics and machine learning tools; work on data without any data movement; a design for low-cost storage and analytics. Diagram labels: OLTP, ERP, CRM, LoB; Data warehouse; Business intelligence; Data lake; Devices, Web, Sensors, Social; Catalog; Machine learning; DW queries; Big data processing; Interactive; Real time.
  • 8. Data lake on AWS. Central storage (scalable, secure, cost-effective): Amazon S3. Catalog & search: AWS Glue, Amazon DynamoDB, Amazon Elasticsearch Service. Access & user interfaces: AWS AppSync, Amazon API Gateway, Amazon Cognito. Manage & secure: AWS KMS, AWS CloudTrail, AWS IAM, Amazon CloudWatch. Data ingestion: AWS Snowball, AWS Storage Gateway, Amazon Kinesis Data Firehose, AWS Direct Connect, AWS Database Migration Service. Analytics & serving: Amazon Athena, Amazon EMR, AWS Glue, Amazon Redshift, Amazon DynamoDB, Amazon QuickSight, Amazon Kinesis, Amazon Elasticsearch Service, Amazon Neptune, Amazon RDS.
  • 9. Processing and querying in place. Fully managed process & query: • Catalog, transform & query data in Amazon S3 • No physical instances to manage. User-defined functions (Lambda function): • Bring your own functions & code • Execute without provisioning servers
  • 10. Amazon S3 is the best place for data lakes: most ways to bring data in; best security, compliance, and audit capabilities; object-level controls; unmatched durability, availability, and scalability; business insights into your data.
  • 11. Optimize costs with data tiering. Hot to cold: HDFS, Amazon S3 Standard, Amazon S3 Standard-Infrequent Access, Amazon S3 Glacier. ✓ Use EMR/Hadoop with local HDFS for hottest data sets ✓ Store cooler data in Amazon S3 and cold in Amazon S3 Glacier to reduce costs ✓ Use Amazon S3 analytics to optimize tiering strategy
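One way the tiering on this slide could be automated is with an S3 lifecycle rule. The sketch below only builds the rule as a Python dictionary; the `logs/` prefix, the rule ID, and the 30-day and 90-day thresholds are illustrative assumptions, not values from the talk. The `apply_lifecycle` helper assumes the boto3 SDK is installed and AWS credentials are configured.

```python
import json


def tiering_lifecycle(prefix, rule_id="tier-cooling-data",
                      ia_after_days=30, glacier_after_days=90):
    """Build a lifecycle configuration that moves objects to colder classes."""
    if glacier_after_days <= ia_after_days:
        raise ValueError("Glacier transition must come after the Standard-IA transition")
    return {
        "Rules": [
            {
                "ID": rule_id,                      # hypothetical rule name
                "Filter": {"Prefix": prefix},       # only objects under this prefix
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


def apply_lifecycle(bucket, config):
    """Send the configuration to S3 (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above runs without the SDK

    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config
    )


config = tiering_lifecycle("logs/")
print(json.dumps(config, indent=2))
```

Building the rule separately from the API call keeps the thresholds easy to review before anything is applied to a bucket.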
  • 12. Your choice of Amazon S3 storage classes, by access frequency from frequent to infrequent. S3 Standard: active, frequently accessed data; milliseconds access; ≥3 AZ; from $0.0210/GB. S3 Intelligent-Tiering (NEW!): data with changing access pattern; milliseconds access; ≥3 AZ; from $0.0210 to $0.0125/GB; monitoring fee per object; min. storage duration. S3 Standard-IA: infrequently accessed data; milliseconds access; ≥3 AZ; from $0.0125/GB; retrieval fee per GB; min. storage duration; min. object size. S3 One Zone-IA: recreatable, less accessed data; milliseconds access; 1 AZ; from $0.0100/GB; retrieval fee per GB; min. storage duration; min. object size. S3 Glacier: archive data; minutes to hours access; ≥3 AZ; from $0.0040/GB; retrieval fee per GB; min. storage duration; min. object size. S3 Glacier Deep Archive (NEW!): archive data; hours access; ≥3 AZ; from $0.00099/GB; retrieval fee per GB; min. storage duration; min. object size.
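To make the price differences on this slide concrete, the sketch below multiplies a data set size by the per-GB starting prices quoted above, read as monthly prices. The 100 TB data set is a hypothetical example, and the calculation covers storage only: retrieval fees, monitoring fees, minimum storage durations, and minimum object sizes are deliberately left out.

```python
# Starting storage prices in USD per GB per month, copied from the slide above.
PRICE_PER_GB_MONTH = {
    "S3 Standard": 0.0210,
    "S3 Standard-IA": 0.0125,
    "S3 One Zone-IA": 0.0100,
    "S3 Glacier": 0.0040,
    "S3 Glacier Deep Archive": 0.00099,
}


def monthly_storage_cost(size_gb, storage_class):
    """Return the storage-only monthly cost in USD for size_gb gigabytes."""
    return size_gb * PRICE_PER_GB_MONTH[storage_class]


DATASET_GB = 100 * 1024  # hypothetical 100 TB data set, using 1 TB = 1024 GB

for name in PRICE_PER_GB_MONTH:
    print(f"{name:<24} ${monthly_storage_cost(DATASET_GB, name):>10,.2f}")
```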
  • 13. Amazon S3 Intelligent-Tiering automates cost savings (NEW!). Automatically optimizes storage costs for data with changing access patterns. Moves objects between two storage tiers: • Frequent access tier • Infrequent access tier. Monitors access patterns and auto-tiers at the granular object level. Milliseconds access, ≥3 AZ, monitoring fee per object, minimum storage duration.
  • 14. Amazon S3 Glacier Deep Archive (NEW! Coming soon). Lowest-cost storage available in the cloud. No tape to manage. $0.00099/GB/month, less than 1/4 the cost of Amazon S3 Glacier. Designed for 11 9s of durability. Recover data in hours.
  • 15. Rapidly ingest all data sources. A data lake needs to accommodate a wide variety of concurrent data sources. Ingest methods cover: IoT, sensor data, clickstream data, social media feeds, streaming logs; Oracle, MySQL, MongoDB, DB2, SQL Server, Amazon RDS; on-premises ERP, mainframes, lab equipment, NAS storage; offline sensor data, NAS, on-premises Hadoop; on-premises data lakes, EDW, large-scale data collection.
  • 16. AWS Transfer for SFTP (NEW!). Fully managed service enabling transfer of data over SFTP while stored in Amazon S3. Seamless migration of existing workflows. Fully managed in AWS. Secure and compliant. Native integration with AWS services. Cost effective. Simple to use.
  • 17. AWS DataSync (NEW!). Transfer service that simplifies, automates, and accelerates data movement. Combines the speed and reliability of network acceleration software with the cost-effectiveness of open source tools. Simple data movement to Amazon S3 or Amazon EFS. Transfers up to 10 Gbps per agent. Secure and reliable transfers. AWS integrated. Pay as you go. Migrate active application data to AWS. Transfer data for timely in-cloud analysis. Replicate data to AWS for business continuity.
  • 18. Process data in place: Amazon S3 with Amazon Athena, Amazon Redshift Spectrum, Amazon SageMaker, and AWS Glue.
  • 19. Amazon S3 Select: select a subset of your object’s data using a SQL expression.
  • 20. Improved performance for data lakes. As customers store larger and larger datasets in Amazon S3, Amazon S3 Select offers up to a 400% performance improvement.
  • 21. Amazon S3 Select enhancements (NEW!). Now supports: CSV, JSON, JSON arrays, and Parquet formats; GZIP, BZIP2, and Snappy compression. Integrated with Spark, Hive, and Presto on Amazon EMR.
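As an illustration of the S3 Select slides above, the sketch below prepares a request for the boto3 `select_object_content` call against a GZIP-compressed CSV object. The bucket name, object key, column names, and query are hypothetical, and `run_select` assumes boto3 is installed and AWS credentials are configured.

```python
def select_params(bucket, key, expression):
    """Build the arguments for an S3 Select query over a gzipped CSV object."""
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": expression,
        # Treat the first CSV row as column names so the query can use them.
        "InputSerialization": {
            "CSV": {"FileHeaderInfo": "USE"},
            "CompressionType": "GZIP",
        },
        "OutputSerialization": {"CSV": {}},
    }


def run_select(params):
    """Yield raw result chunks (bytes) from S3 Select; needs boto3 and credentials."""
    import boto3  # imported lazily so select_params works without the SDK

    response = boto3.client("s3").select_object_content(**params)
    for event in response["Payload"]:
        if "Records" in event:
            yield event["Records"]["Payload"]


# Hypothetical bucket, key, and columns, for illustration only.
params = select_params(
    "example-data-lake",
    "events/2019/03/scores.csv.gz",
    "SELECT s.player_id, s.score FROM S3Object s WHERE s.region = 'us-west-2'",
)
print(params["Expression"])
```

Only the rows and columns matching the expression are returned to the caller, which is where the reduction in transferred data comes from.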
  • 22. Amazon FSx for Lustre. For compute-intensive data processing use cases like HPC or machine learning. Raw data stored in Amazon S3 is loaded to FSx for Lustre for processing. Output of processing is returned to Amazon S3 for retention.
  • 23. Amazon FSx for Lustre performance. Massively scalable performance: 100+ GB/s throughput | Millions of IOPS | Consistent submillisecond latencies. Parallel file system. SSD-based. Supports hundreds of thousands of cores.
  • 24. Choosing the right data formats. There is no such thing as the “best” data format • All involve tradeoffs, depending on workload & tools • CSV, TSV, JSON are easy but not efficient (compress & store or archive as raw input) • Columnar compressed formats are generally preferred (Parquet or ORC; smaller storage footprint = lower cost; more efficient scan & query) • Row-oriented (Avro) is good for full data scans • Organize into partitions (coalescing to larger partitions over time). Key considerations are cost, performance, & support.
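The "organize into partitions" advice above is often implemented with Hive-style `name=value` key prefixes, which engines such as Athena and Spark can use to skip irrelevant data. The sketch below shows one such layout; the dataset name, the year/month/day scheme, and the file name are assumptions for illustration, not a layout prescribed by the talk.

```python
from datetime import date


def partition_key(dataset, day, filename):
    """Return an S3 object key using Hive-style date partitions."""
    return (
        f"{dataset}/year={day.year:04d}/month={day.month:02d}/"
        f"day={day.day:02d}/{filename}"
    )


# Hypothetical dataset and file name.
key = partition_key("clickstream", date(2019, 3, 5), "part-0000.parquet")
print(key)  # clickstream/year=2019/month=03/day=05/part-0000.parquet
```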
  • 25. Data prep is ~80% of data lake work. Chart categories: building training sets; cleaning and organizing data; collecting datasets; mining data for patterns; refining algorithms; other.
  • 26. Set up a catalog, ETL, and data prep with AWS Glue. Serverless provisioning, configuration, and scaling to run your ETL jobs on Apache Spark. Pay only for the resources used for jobs. Crawl your data sources, identify data formats, and suggest schemas and transformations. Automates the effort in building, maintaining, and running ETL jobs.
  • 27. Security challenges with data lakes. Data challenges: • Controlling access to data • Data masking, row / column / cell level encryption, key management • Data loss / exfiltration • Loss of data integrity • Data provenance • Compliance requirements (GDPR and others). Management challenges: • Central administration • Federated authentication, typically with Active Directory • Role-based access control (RBAC) • Centralized audit • End-to-end data protection (at rest and in transit)
  • 28. AWS helps you secure. Compliance: AWS Artifact, Amazon Inspector, AWS CloudHSM, Amazon Cognito, AWS CloudTrail. Security: Amazon GuardDuty, AWS Shield, AWS WAF, Amazon Macie, Amazon VPC. Encryption: AWS Certificate Manager, AWS Key Management Service; encryption at rest; encryption in transit; bring your own keys, HSM support. Identity: AWS IAM, AWS SSO, Amazon Cloud Directory, AWS Directory Service, AWS Organizations. Customers need multiple levels of security, identity and access management, encryption, and compliance to secure their data lake.
  • 29. Data lake security • Data storage • Metadata
  • 30. Control access to data. Configure Amazon S3 permissions for IAM principals, Amazon EMR, and Amazon Redshift: • Implement your access control matrix using IAM policies • Use S3 bucket policies for easy cross-account data sharing • Limit role-based access from an Amazon EMR cluster’s Amazon Elastic Compute Cloud (Amazon EC2) instance profile • Authorize access from other tools such as Amazon Redshift using IAM roles
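The cross-account sharing bullet above can be illustrated with a bucket policy that grants another account read access to part of a bucket. The sketch below builds such a policy document in Python; the bucket name, the account ID, and the prefix are placeholders, and a real policy would normally be narrowed to specific roles rather than an account root.

```python
import json


def cross_account_read_policy(bucket, account_id, prefix=""):
    """Build a bucket policy letting another account list and read objects."""
    principal = {"AWS": f"arn:aws:iam::{account_id}:root"}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowList",
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                "Sid": "AllowRead",
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            },
        ],
    }


# Placeholder bucket, account ID, and prefix.
policy = cross_account_read_policy("example-data-lake", "111122223333", "curated/")
print(json.dumps(policy, indent=2))
```

`s3:ListBucket` applies to the bucket ARN while `s3:GetObject` applies to object ARNs, which is why the two statements use different resources.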
  • 31. Block public access to Amazon S3. Amazon S3 provides four settings: • BlockPublicAcls – Rejects new public object or bucket ACLs • IgnorePublicAcls – Ignores existing public object or bucket ACLs • BlockPublicPolicy – Rejects new public bucket access policy • RestrictPublicBuckets – Restricts access to only AWS services and authorized users within the bucket owner's account. But what is “public”? • Public object (or bucket) ACL → Grants permissions to members of the predefined AllUsers or AuthenticatedUsers groups (grantees) • Public bucket policy → Doesn’t grant permissions to only fixed values in Principal and Condition elements
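The four settings above map directly onto the `PublicAccessBlockConfiguration` argument of the boto3 `put_public_access_block` call, and the ACL definition of "public" can be checked against the two predefined group URIs. The sketch below is a minimal illustration; the helper names and the sample grants are hypothetical, and `block_all_public_access` assumes boto3 and AWS credentials.

```python
# The four Block Public Access settings from the slide, all switched on.
BLOCK_ALL = {
    "BlockPublicAcls": True,
    "IgnorePublicAcls": True,
    "BlockPublicPolicy": True,
    "RestrictPublicBuckets": True,
}

# URIs of the predefined AllUsers and AuthenticatedUsers groups.
PUBLIC_GROUP_URIS = {
    "http://acs.amazonaws.com/groups/global/AllUsers",
    "http://acs.amazonaws.com/groups/global/AuthenticatedUsers",
}


def is_public_grant(grant):
    """Return True if an ACL grant targets one of the public groups."""
    grantee = grant.get("Grantee", {})
    return grantee.get("Type") == "Group" and grantee.get("URI") in PUBLIC_GROUP_URIS


def block_all_public_access(bucket):
    """Apply all four settings to a bucket (needs boto3 and credentials)."""
    import boto3  # imported lazily so the helpers above run without the SDK

    boto3.client("s3").put_public_access_block(
        Bucket=bucket, PublicAccessBlockConfiguration=BLOCK_ALL
    )


# Hypothetical grants shaped like the entries returned by get_bucket_acl.
public = {"Grantee": {"Type": "Group",
                      "URI": "http://acs.amazonaws.com/groups/global/AllUsers"},
          "Permission": "READ"}
private = {"Grantee": {"Type": "CanonicalUser", "ID": "example-owner-id"},
           "Permission": "FULL_CONTROL"}
print(is_public_grant(public), is_public_grant(private))
```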
  • 32. Amazon S3 Object Lock (NEW!). Immutable Amazon S3 objects: • Write once read many (WORM) protections for Amazon S3 objects • Object or bucket control of WORM & retention attributes. Retention management controls: • Define retention periods in your app or with bucket-level defaults • Objects locked for the duration of the retention period • Support for legal hold scenarios. Data protection and compliance: • Assessed for use in SEC 17a-4, CFTC, and FINRA environments • Extra protection against accidental or malicious delete
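Defining a retention period "in your app", as the slide puts it, could look like the sketch below, which prepares the arguments for the boto3 `put_object_retention` call. The bucket, key, and 365-day period are hypothetical, the bucket is assumed to have Object Lock enabled, and `lock_object` assumes boto3 and AWS credentials.

```python
from datetime import datetime, timedelta, timezone


def retention_params(bucket, key, days, mode="GOVERNANCE", now=None):
    """Build put_object_retention arguments for a retention period in days."""
    if mode not in ("GOVERNANCE", "COMPLIANCE"):
        raise ValueError("mode must be GOVERNANCE or COMPLIANCE")
    start = now or datetime.now(timezone.utc)
    return {
        "Bucket": bucket,
        "Key": key,
        "Retention": {
            "Mode": mode,
            "RetainUntilDate": start + timedelta(days=days),
        },
    }


def lock_object(params):
    """Apply the retention settings (needs boto3 and credentials)."""
    import boto3  # imported lazily so retention_params runs without the SDK

    boto3.client("s3").put_object_retention(**params)


# Hypothetical object and retention period; a fixed start keeps the output stable.
start = datetime(2019, 4, 1, tzinfo=timezone.utc)
params = retention_params("example-data-lake", "audit/trades.parquet", 365, now=start)
print(params["Retention"]["RetainUntilDate"].isoformat())
```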
  • 33. Metadata security. AWS Glue Data Catalog: • Apache Hive metastore compatible • Track data evolution using schema versioning • Integrates with Hive, Spark, Presto, Amazon Athena, and Amazon Redshift Spectrum • Use crawlers to classify your data in one central list that is searchable
  • 34. Metadata security. Key learnings: • Create and maintain a centralized data catalog • Enable cross-account access • Use IAM policies to control catalog access, similar to Amazon S3 bucket policies • Encrypt metadata in AWS Glue Data Catalog
  • 35. AWS Glue Data Catalog resource policies • Fine-grained access control to Data Catalog using IAM policies • Restrict what users can view and query
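As a rough sketch of the fine-grained catalog access described above, the function below builds an identity-based IAM policy that allows read-only metadata calls on a single table. The Region, account ID, database, and table names are placeholders, and the chosen action list is an assumption about what a read-only analyst would need.

```python
import json


def glue_read_only_policy(region, account_id, database, table):
    """Build an IAM policy allowing metadata reads on one Data Catalog table."""
    base = f"arn:aws:glue:{region}:{account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "glue:GetDatabase",
                    "glue:GetTable",
                    "glue:GetPartitions",
                ],
                # Glue checks the catalog, the database, and the table ARN.
                "Resource": [
                    f"{base}:catalog",
                    f"{base}:database/{database}",
                    f"{base}:table/{database}/{table}",
                ],
            }
        ],
    }


# Placeholder Region, account ID, database, and table.
policy = glue_read_only_policy("us-west-2", "111122223333", "sales", "orders")
print(json.dumps(policy, indent=2))
```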
  • 36. Typical steps of building a data lake: (1) Set up storage; (2) Move data; (3) Cleanse, prep, and catalog data; (4) Configure and enforce security and compliance policies; (5) Make data available for analytics.
  • 37. Building data lakes can still take months
  • 38. AWS Lake Formation: build a secure data lake in days. Identify, ingest, clean, and transform data. Enforce security policies across multiple services. Gain and manage new insights.
  • 39. How it works
  • 40. Three simple steps to an AWS data lake. Collect & centralize (Amazon ingest & storage: Amazon S3, Amazon S3 Glacier, AWS DataSync, AWS Storage Gateway, AWS Snow Family, Amazon Kinesis): remove data silos; aggregate data; better agility; more data => insights. Catalog & transform (AWS Glue: crawl, discover & catalog data; ETL data): know what you have; better data management; quicker time to results; higher quality data. Analytics & insights (Amazon analytics & ML: Amazon Athena, EMR, Amazon Redshift, Amazon SageMaker, Amazon Rekognition, Amazon EC2 + FSx for Lustre): extract value from data; analyze & report on data; apply machine learning; visualize & consume results.
  • 41. Thank you! John Mallory, johmallo@amazon.com