© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T
How to go from zero to data lakes in days
Mehul A. Shah
GM, AWS Glue and AWS Lake Formation
Amazon Web Services
Session ADB202, New York AWS Summit
Srinivas Ravilisetty
IT Analytics Lead
Alcon
Agenda
Trends driving the revolution
What are data lakes?
What’s hard today?
AWS Lake Formation makes data lakes easy. Demo!
Decision making used to revolve around the Enterprise Data Warehouse (in the 90s-00s)
Diagram: OLTP, ERP, CRM, and LOB systems feed the Enterprise Data Warehouse, which feeds Business Intelligence
There is more data than people think
Data grows >10x every 5 years
Data platforms need to live for 15 years
Data platforms need to scale 1,000x
Data no longer fits
Data is more diverse
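The figures on this slide are arithmetically linked: growth of 10x every 5 years, compounded over a 15-year platform lifetime, gives a 1,000x scale requirement. A minimal sketch of that calculation follows; the function name and parameters are illustrative and do not come from the talk.

```python
def growth_factor(years: float, factor_per_period: float = 10.0,
                  period_years: float = 5.0) -> float:
    """Compound growth: data multiplies by `factor_per_period` every `period_years`."""
    return factor_per_period ** (years / period_years)


# A platform expected to live 15 years sees three 5-year periods of 10x growth.
lifetime_growth = growth_factor(15)
print(f"{lifetime_growth:,.0f}x")
```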
There are more people accessing data who want to analyze it in different ways
And there are more rules around data use
Data scientists, analysts, business users, applications
Broader workloads: machine learning, SQL analytics, scientific, real-time, streaming
The cloud is a game changer
Amazon S3: ubiquitous storage allows you to centralize datasets
Want a single locus of control
Many scalable analytics engines available on-demand, pay-as-you-go
Ways to move data into Amazon S3:
On-premises, batch: AWS Direct Connect, AWS Snowball, AWS Snowmobile, AWS Database Migration Service
Real-time, streaming: AWS IoT Core, Amazon Kinesis Data Firehose, Amazon Kinesis Data Streams, Amazon Kinesis Video Streams
Data lake: The new information hub
A centralized repository that enables you to secure, discover, share, and analyze structured and unstructured data at any scale
More data lakes & analytics on AWS than anywhere else
Building data lakes can still take months
Typical steps of building a data lake
1. Set up storage
2. Move data
3. Cleanse, prep, and catalog data
4. Configure and enforce security and compliance policies
5. Make data available for analytics
Ingestion & cleaning | Security | Analytics & ML
Data preparation accounts for ~80% of the work
Chart categories: building training sets; cleaning and organizing data; collecting datasets; mining data for patterns; refining algorithms; other
Sample of steps required
Find sources
Create Amazon Simple Storage Service (Amazon S3) locations
Configure access policies
Map tables to Amazon S3 locations
ETL jobs to load and clean data
Create metadata access policies
Configure access from analytics services
Rinse and repeat for other datasets, users, and end-services
And more:
manage and monitor ETL jobs
update metadata catalog as data changes
update policies across services as users and permissions change
manually maintain cleansing scripts
create audit processes for compliance
…
Manual | Error-prone | Time consuming
AWS Lake Formation: Build a secure data lake in days
Simplifies data preparation: ingest and cleaning
Secure data and efficiently multiplex across engines
Self-serve: discover, share, and collaborate
Diagram: Lake Formation (data import, crawlers, ML-based data prep, Data Catalog, access control) on top of Amazon S3 data lake storage
Demo: Build a data lake with AWS CloudTrail data
1. Set up storage
2. Move data
3. Cleanse, prep, and catalog data
4. Configure and enforce security and compliance policies
5. Make data available for analytics
Ingestion & cleaning | Security | Analytics & ML
Demo Step 1: Register an Amazon S3 path as a data lake location
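Outside the console, this step corresponds to Lake Formation's RegisterResource API. The sketch below only builds the request arguments, so it runs without AWS credentials; the bucket name, the prefix, and the helper function are hypothetical, and using the service-linked role is an assumption rather than something shown in the demo.

```python
def register_location_request(bucket: str, prefix: str = "") -> dict:
    """Build keyword arguments for Lake Formation's RegisterResource call."""
    arn = f"arn:aws:s3:::{bucket}"
    if prefix:
        arn += "/" + prefix.strip("/")
    # Assumption: let Lake Formation use its service-linked role for access.
    return {"ResourceArn": arn, "UseServiceLinkedRole": True}


# Hypothetical bucket and prefix holding the CloudTrail logs used in the demo.
request = register_location_request("example-datalake-bucket", "cloudtrail/")
print(request)

# With AWS credentials configured, the request could be sent like this:
#   import boto3
#   boto3.client("lakeformation").register_resource(**request)
```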
Demo Steps 2 & 3: Load data with a blueprint
Blueprints: Easily load data into your data lake
Prebuilt templates for different data source types (logs, databases)
Utilizes AWS Glue workflows
ETL jobs and crawlers automate data layout, formats, and partitions
Both one-time and incremental data loads
Populates common Data Catalog
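The difference between a one-time and an incremental load can be sketched with a high-water-mark bookmark. This is a conceptual illustration only: the function, the `updated_at` column, and the sample rows are hypothetical, and it does not describe how blueprints or AWS Glue implement incremental loads internally.

```python
from __future__ import annotations

from typing import Optional


def select_rows_to_load(rows: list[dict],
                        bookmark: Optional[str]) -> tuple[list[dict], Optional[str]]:
    """Return rows newer than `bookmark` and the updated bookmark.

    `bookmark` is the largest `updated_at` value already loaded; None means
    a one-time (full) load.
    """
    if bookmark is None:
        selected = list(rows)
    else:
        selected = [r for r in rows if r["updated_at"] > bookmark]
    new_bookmark = max((r["updated_at"] for r in selected), default=bookmark)
    return selected, new_bookmark


source = [
    {"id": 1, "updated_at": "2019-07-01"},
    {"id": 2, "updated_at": "2019-07-05"},
    {"id": 3, "updated_at": "2019-07-09"},
]

full, mark = select_rows_to_load(source, None)     # one-time load: all rows
source.append({"id": 4, "updated_at": "2019-07-11"})
delta, mark = select_rows_to_load(source, mark)    # incremental: only new rows
print(len(full), [r["id"] for r in delta], mark)
```

ISO-formatted date strings compare correctly as plain strings, which is why the bookmark needs no date parsing here.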
Blueprints create AWS Glue workflows
Lake Formation is built on AWS Glue
AWS Glue: ETL jobs, workflows, crawlers, Data Catalog (connections, databases, tables), monitoring
AWS Lake Formation: blueprints; security, search, collaboration
AWS Glue components
Data Catalog (discover): automatic crawling; Apache Hive Metastore compatible; integrated with AWS analytic services
Serverless engine (develop): Apache Spark; Python shell; interactive and batch jobs
Orchestration (deploy): flexible workflows; monitoring and alerting; external integrations
Organize in Data Catalog: Databases and tables
Search and collaborate across multiple users
Text-based search across all metadata
Add attributes like data owners, stewards, and others as table properties
Add data sensitivity level, column definitions, and others as column properties
Screenshot callouts: text-based search and filtering; query data in Amazon Athena
“FindMatches” ML Transform
Record matching
Finding the relationships between multiple datasets, even when those datasets do not share an identifier (or when their identifier is unreliable)
Deduplication
Transforming a dataset that has multiple rows referring to the same actual thing into a dataset where no two rows refer to the same actual thing
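To make the deduplication definition concrete, here is a small sketch that treats two records as the same thing when their shared fields are nearly identical strings. It uses Python's standard `difflib` with a fixed threshold, which is a stand-in chosen for illustration: it is not the FindMatches algorithm, and the sample records and the 0.85 threshold are assumptions.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def similarity(a: dict, b: dict) -> float:
    """Average string similarity over the fields two records share."""
    fields = sorted(a.keys() & b.keys())
    if not fields:
        return 0.0
    scores = [
        SequenceMatcher(None, str(a[f]).lower(), str(b[f]).lower()).ratio()
        for f in fields
    ]
    return sum(scores) / len(scores)


def deduplicate(records: list[dict], threshold: float = 0.85) -> list[dict]:
    """Keep a record only if it is not a near-match of one already kept."""
    kept: list[dict] = []
    for record in records:
        if all(similarity(record, k) < threshold for k in kept):
            kept.append(record)
    return kept


customers = [
    {"name": "Jane Doe", "city": "New York"},
    {"name": "Jane  Doe", "city": "new york"},  # same person, formatting differs
    {"name": "John Smith", "city": "Boston"},
]
unique = deduplicate(customers)
print([c["name"] for c in unique])
```

The pairwise comparison is quadratic in the number of records, so this form only suits small inputs.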
Easily de-duplicate your data with ML transforms
Fuzzy de-duplication – Under the hood
State-of-the-art
Tune via:
Cost/accuracy slider
Positive and negative examples
Precision/recall slider
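The precision/recall trade-off can be shown with a few labeled pairs: a stricter match threshold merges fewer records wrongly (higher precision) but misses more true duplicates (lower recall). The scores, labels, and thresholds below are hypothetical and only illustrate the two metrics, not how the FindMatches slider is implemented.

```python
from __future__ import annotations


def precision_recall(scored_pairs: list[tuple[float, bool]],
                     threshold: float) -> tuple[float, float]:
    """scored_pairs holds (match score, is the pair truly a duplicate?)."""
    predicted = [truth for score, truth in scored_pairs if score >= threshold]
    true_positives = sum(predicted)
    actual_positives = sum(truth for _, truth in scored_pairs)
    precision = true_positives / len(predicted) if predicted else 1.0
    recall = true_positives / actual_positives if actual_positives else 1.0
    return precision, recall


# Hypothetical labeled examples: positive (True) and negative (False) pairs.
pairs = [(0.95, True), (0.90, True), (0.80, False), (0.70, True), (0.40, False)]

strict = precision_recall(pairs, threshold=0.85)  # favors precision
loose = precision_recall(pairs, threshold=0.60)   # favors recall
print(strict, loose)
```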
Fuzzy de-duplication – Innovations
400M+
7.5B+
2.5
Lake Formation: Centralized authorization
Diagram: Lake Formation (Data Catalog, access control) in front of Amazon S3 data lake storage
Security permissions in Lake Formation
Control data access with simple grant and revoke permissions
Specify permissions on tables and columns rather than on buckets and objects
Easily view permissions granted to a particular user
Audit all data access in one place
Diagram: User 1, User 2
Security – Deep dive
Diagram: users (IAM users, roles, and Active Directory) accessing Amazon S3
Security – Take away
No intermediary in data path
End analytics services filter data
Lake Formation securely logs all accesses for auditing
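A toy model of "end analytics services filter data": the engine looks up which columns a principal has been granted and returns only those. The grants table, principal names, and sample row are hypothetical, and this is a conceptual sketch rather than a description of how Athena or Redshift enforce Lake Formation permissions.

```python
from __future__ import annotations

# Hypothetical column-level grants: principal -> table -> allowed columns.
GRANTS = {
    "analyst": {"cloudtrail_events": {"eventtime", "eventname", "awsregion"}},
}


def filter_rows(principal: str, table: str, rows: list[dict]) -> list[dict]:
    """Return only the columns the principal has been granted on `table`."""
    allowed = GRANTS.get(principal, {}).get(table)
    if not allowed:
        raise PermissionError(f"{principal} has no access to {table}")
    return [{col: val for col, val in row.items() if col in allowed} for row in rows]


rows = [{"eventtime": "2019-07-11T10:00:00Z", "eventname": "GetObject",
         "awsregion": "us-east-1", "sourceipaddress": "203.0.113.10"}]
visible = filter_rows("analyst", "cloudtrail_events", rows)
print(visible)
```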
Demo Step 4: Grant permissions to users
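The same grant can be expressed through Lake Formation's GrantPermissions API instead of the console. The sketch below builds a column-level SELECT grant as plain request arguments, so it runs without AWS credentials; the account ID, user, database, table, and column names are all hypothetical.

```python
from __future__ import annotations


def grant_select_request(principal_arn: str, database: str, table: str,
                         columns: list[str]) -> dict:
    """Build keyword arguments for Lake Formation's GrantPermissions call."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": columns,
            }
        },
        "Permissions": ["SELECT"],
    }


# Hypothetical account ID, user, database, and table names.
request = grant_select_request(
    "arn:aws:iam::123456789012:user/analyst",
    "cloudtrail_db", "cloudtrail_events",
    ["eventtime", "eventname", "awsregion"],
)
print(request["Resource"]["TableWithColumns"]["ColumnNames"])

# With AWS credentials configured, the request could be sent like this:
#   import boto3
#   boto3.client("lakeformation").grant_permissions(**request)
```

Passing the same arguments to `revoke_permissions` would withdraw the grant.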
Demo Step 5: Run query in Amazon Athena
Athena integration guarantees a user sees only the tables and columns they have access to
All access is logged and auditable
Demo Step 5: Run query in Amazon Redshift
Redshift integration guarantees a user sees only the tables and columns they have access to
All access is logged and auditable
Demo Step 6: Audit and monitor in real time
See detailed alerts in the console
Download audit logs for further analytics
Data ingest and catalog notifications also published to Amazon CloudWatch Events
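Once audit logs are downloaded, "further analytics" can be as simple as counting data accesses per principal. The sketch assumes the logs are JSON lines shaped like CloudTrail records; the three sample records, the account ID, and the event names chosen here are illustrative, not taken from the demo.

```python
import json
from collections import Counter

# Three hypothetical CloudTrail-style records; real logs carry many more fields.
log_lines = """
{"eventTime": "2019-07-11T10:00:00Z", "eventSource": "lakeformation.amazonaws.com", "eventName": "GetDataAccess", "userIdentity": {"arn": "arn:aws:iam::123456789012:user/analyst"}}
{"eventTime": "2019-07-11T10:05:00Z", "eventSource": "lakeformation.amazonaws.com", "eventName": "GrantPermissions", "userIdentity": {"arn": "arn:aws:iam::123456789012:user/admin"}}
{"eventTime": "2019-07-11T10:06:00Z", "eventSource": "lakeformation.amazonaws.com", "eventName": "GetDataAccess", "userIdentity": {"arn": "arn:aws:iam::123456789012:user/analyst"}}
""".strip().splitlines()

events = [json.loads(line) for line in log_lines]

# Count data-access events per principal, most active first.
accesses_per_principal = Counter(
    e["userIdentity"]["arn"] for e in events if e["eventName"] == "GetDataAccess"
)
print(accesses_per_principal.most_common())
```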
Upgrading AWS Glue Data Catalog to Lake Formation
Lake Formation & AWS Glue use the same Data Catalog
Existing Glue crawlers, jobs, triggers and workflows will not change
Existing access to Glue resources will still be governed by IAM policies
Explicitly upgrade each data location, database & table when ready
Upgrading enables you to take advantage of new permissions model
Diagram: Data Catalog, ETL jobs, access control, crawlers
AWS Lake Formation Pricing
No additional charges: only pay for the underlying services used
Conclusion
AWS Lake Formation makes setting up, securing, and using data lakes simple
Data lakes are the evolution of warehousing
Lots more to come:
Simplify permissions: tag-based access control
Provenance and compliance: lineage
ML-based controls: e.g., PII detection
Alcon
Srinivas Ravilisetty
IT Analytics Lead
Alcon
Alcon Analytics Team
Alcon Analytics Platforms: Tableau, Qlik, Alteryx, etc.
Alcon Data Lake Platform: Data Lake Platform on AWS
Alcon Data Management Platform: EBX5 Platform, Reference Data Management, Data Governance Teams, etc.
Analytics Products: Field Force Reporting, Sales Reporting, etc.
Advanced Analytics Products & Tools: R Studio, DataRobot, Forecasting, etc.
Problem statement & approach
Problem statement:
Significant effort to get to trusted data
Limited data ownership & stewardship
Limited agility & extensibility
Approach:
Global source of trusted data for analytics
Business ownership and management of data
Rapid, continuous delivery of data and insights
Emerging and advanced analytic opportunities
Self-service analytics framework
Data lake journey
Data lakes on Hadoop (6)
Data warehouses (14)
Data marts (30+)
100+ servers
Multiple security models
Data Lake on Amazon S3
Storage and compute separated
Leverage AWS Services
Consistent and simplified security model
Alcon data lake – Built using AWS
Simple user experience – Browse data assets, request access, perform additional data preparation if needed, and analyze using service and tool of choice to draw insights
Enables our users to analyze data more and prepare less
Separate storage and compute
• Storage using Amazon S3
• Compute using AWS Glue, Athena, Redshift, AWS Lambda, and Amazon EMR
Built and released using Agile development methods on AWS
Architecture diagram: ForeSight - Alcon DataLake
Stages: ingest, store, transform, provision, analyse
Sources: on-premise data sources (ERP, manufacturing data, on-premise files), third-party and distributor files, managed markets data, ORA AnalyzOR data, cloud data
Other components: integration, data management, data privacy tool, analytics, visualization
Current security model – Complex & limiting
Current security management is complex because security is enforced through S3 bucket policies
Complexity in providing access to data assets through visualization and analytics tools
Limits on the size of policy files mean the approach does not scale
Column- and row-level security is achieved through complex policy files or data replication, which is not sustainable
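The scaling problem with policy files can be illustrated by generating a bucket policy with one statement per user and measuring its serialized size. The 20 KB limit is an assumption based on Amazon S3 documentation and should be checked against current limits; the bucket name, account ID, user names, and prefix layout are hypothetical and not Alcon's actual setup.

```python
from __future__ import annotations

import json

# Assumption: S3 bucket policies are limited to 20 KB (check current AWS docs).
POLICY_SIZE_LIMIT_BYTES = 20 * 1024


def bucket_policy(bucket: str, users: list[str]) -> dict:
    """One statement per user, each scoped to that user's prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": f"Read{i}",
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::123456789012:user/{user}"},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/curated/{user}/*",
            }
            for i, user in enumerate(users)
        ],
    }


def policy_size(policy: dict) -> int:
    """Size in bytes of the policy when serialized as JSON."""
    return len(json.dumps(policy).encode("utf-8"))


for count in (10, 500):
    users = [f"user{n:04d}" for n in range(count)]
    size = policy_size(bucket_policy("example-datalake-bucket", users))
    print(count, size, "over limit" if size > POLICY_SIZE_LIMIT_BYTES else "ok")
```

Policy size grows linearly with the number of per-user statements, so a fixed size cap bounds how many users or prefixes a single bucket policy can describe.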
Why Lake Formation? Security
AWS Lake Formation simplifies our security model and data cataloging
Reduces our implementation effort:
• Reduced security model by 3x
• Reduced security management by 2x
• Eliminates custom code for security management
Reduces data redundancy
Why Lake Formation? Democratized Access
Expands our AWS Glue catalog metadata with business metadata
• Enables user-friendly search of data using business metadata
Migration is simple and has minimal impact on users
• Existing tools work seamlessly
Thank you!
Mehul A. Shah
glue-pm@amazon.com
Srinivas Ravilisetty
srinivas.ravilisetty@alcon.com

OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

Mais de Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

How to go from zero to data lakes in days - ADB202 - New York AWS Summit

  • 1. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. SUMMIT. How to go from zero to data lakes in days. Mehul A. Shah, GM, AWS Glue and AWS Lake Formation, Amazon Web Services. ADB202. Srinivas Ravilisetty, IT Analytics Lead, Alcon.
  • 2. Agenda: Trends driving the revolution. What are data lakes? What’s hard today? AWS Lake Formation makes data lakes easy. Demo!
  • 3. Decision making used to revolve around the Enterprise Data Warehouse (in the 90s – 00s). Diagram labels: OLTP, ERP, CRM, LOB; Enterprise Data Warehouse; Business Intelligence.
  • 4. There is more data than people think. Data grows >10x every 5 years. Data platforms need to live for 15 years and scale 1,000x. Data no longer fits. Data is more diverse.
  • 5. There are more people accessing data (data scientists, analysts, business users, applications) that want to analyze it in different ways. Broader workloads: machine learning, SQL analytics, scientific, real-time, streaming. And there are more rules around data use.
  • 6. The cloud is a game changer. Amazon S3: ubiquitous storage allows you to centralize datasets. Want a single locus of control. Many scalable analytics engines available on-demand, pay-as-you-go. Paths into Amazon S3, on-premises and batch: AWS Direct Connect, AWS Snowball, AWS Snowmobile, AWS Database Migration Service. Real-time and streaming: AWS IoT Core, Amazon Kinesis Data Firehose, Amazon Kinesis Data Streams, Amazon Kinesis Video Streams.
  • 7. Agenda: Trends driving the revolution. What are data lakes? What’s hard today? AWS Lake Formation makes data lakes easy. Demo!
  • 8. Data lake: The new information hub. A centralized repository that enables you to secure, discover, share, and analyze structured and unstructured data at any scale.
  • 9. More data lakes & analytics on AWS than anywhere else.
  • 10. Agenda: Trends driving the revolution. What are data lakes? What’s hard today? AWS Lake Formation makes data lakes easy. Demo!
  • 11. Building data lakes can still take months.
  • 12. Typical steps of building a data lake: (1) Set up storage. (2) Move data. (3) Cleanse, prep, and catalog data. (4) Configure and enforce security and compliance policies. (5) Make data available for analytics. Phases: Ingestion & cleaning; Security; Analytics & ML.
  • 13. Data preparation accounts for ~80% of the work. Chart categories: building training sets; cleaning and organizing data; collecting datasets; mining data for patterns; refining algorithms; other.
  • 14. Sample of steps required: find sources; create Amazon Simple Storage Service (Amazon S3) locations; configure access policies; map tables to Amazon S3 locations; ETL jobs to load and clean data; create metadata access policies; configure access from analytics services. Rinse and repeat for other datasets, users, and end-services. And more: manage and monitor ETL jobs; update metadata catalog as data changes; update policies across services as users and permissions change; manually maintain cleansing scripts; create audit processes for compliance; … Manual | Error-prone | Time consuming.
  • 15. Agenda: Trends driving the revolution. What are data lakes? What’s hard today? AWS Lake Formation makes data lakes easy. Demo!
  • 16. AWS Lake Formation: Build a secure data lake in days. Simplifies data preparation: ingest and cleaning. Secure data and efficiently multiplex across engines. Self-serve: discover, share, and collaborate. Diagram labels: Amazon S3 data lake storage; Data Catalog; access control; data import; Lake Formation; crawlers; ML-based data prep.
  • 17. Demo: Build a data lake w/AWS CloudTrail data. (1) Set up storage. (2) Move data. (3) Cleanse, prep, and catalog data. (4) Configure and enforce security and compliance policies. (5) Make data available for analytics. Phases: Ingestion & cleaning; Security; Analytics & ML.
  • 18. Demo Step 1: Register an Amazon S3 path as data lake location.
  • 19. Demo Step 2 & 3: Load data w/ blueprint.
  • 20. Blueprints: Easily load data into your data lake. Prebuilt templates for different data source types (logs, DBs). Utilizes AWS Glue workflows. ETL jobs and crawlers automate data layout, formats, and partitions. Both one-time and incremental data loads. Populates common Data Catalog.
  • 21. Blueprints create AWS Glue workflows.
  • 22. Lake Formation is built on AWS Glue. Diagram labels: Blueprints; Glue ETL jobs; Workflow; Glue crawlers; Glue Data Catalog (connections, databases, tables); Monitoring; Security, search, collaboration; AWS Glue; AWS Lake Formation.
  • 23. AWS Glue components. Data Catalog (discover): automatic crawling; Apache Hive Metastore compatible; integrated with AWS analytic services. Serverless engine (develop): Apache Spark; Python shell; interactive and batch jobs. Orchestration (deploy): flexible workflows; monitoring and alerting; external integrations.
  • 24. Organize in Data Catalog: Databases and tables.
  • 25. Search and collaborate across multiple users. Text-based search across all metadata. Add attributes like data owners, stewards, and others as table properties. Add data sensitivity level, column definitions, and others as column properties. Text-based search and filtering. Query data in Amazon Athena.
  • 26. “FindMatches” ML Transform. Record matching: finding the relationships between multiple datasets, even when those datasets do not share an identifier (or when their identifier is unreliable). Deduplication: transforming a dataset that has multiple rows referring to the same actual thing into a dataset where no two rows refer to the same actual thing.
  • 27. Easily de-duplicate your data with ML transforms.
  • 28. Fuzzy de-duplication – Under the hood. State-of-the-art. Tune via: cost/accuracy slider; precision/recall slider; positive and negative examples.
  • 29. Fuzzy de-duplication – Innovations. Figures shown: 400M+, 7.5B+, 2.5.
  • 30. Demo: Build a data lake w/AWS CloudTrail data. (1) Set up storage. (2) Move data. (3) Cleanse, prep, and catalog data. (4) Configure and enforce security and compliance policies. (5) Make data available for analytics. Phases: Ingestion & cleaning; Security; Analytics & ML.
  • 31. Lake Formation: Centralized authorization. Diagram labels: Data Catalog; access control; Lake Formation; Amazon S3 data lake storage.
  • 32. Security permissions in Lake Formation. Control data access with simple grant and revoke permissions. Specify permissions on tables and columns rather than on buckets and objects. Easily view permissions granted to a particular user. Audit all data access in one place. Diagram labels: User 1; User 2.
  • 33. Security – Deep dive. Diagram labels: User; IAM users, roles, and Active Directory; Amazon S3.
  • 34. Security – Take away. No intermediary in data path. End analytics services filter data. Lake Formation securely logs all accesses for auditing.
  • 35. Demo Step 4: Grant permissions to users.
  • 36. Demo: Build a data lake w/AWS CloudTrail data. (1) Set up storage. (2) Move data. (3) Cleanse, prep, and catalog data. (4) Configure and enforce security and compliance policies. (5) Make data available for analytics. Phases: Ingestion & cleaning; Security; Analytics & ML.
  • 37. Demo Step 5: Run query in Amazon Athena. Athena integration guarantees user sees only the tables & columns they have access to. All access is logged and auditable.
  • 38. Demo Step 5: Run query in Amazon Redshift. Redshift integration guarantees user sees only the tables & columns they have access to. All access is logged and auditable.
  • 39. Demo Step 6: Audit and monitor in real time. See detailed alerts in the console. Download audit logs for further analytics. Data ingest and catalog notifications also published to Amazon CloudWatch events.
  • 40. Upgrading AWS Glue Data Catalog to Lake Formation. Lake Formation & AWS Glue use the same Data Catalog. Existing Glue crawlers, jobs, triggers, and workflows will not change. Existing access to Glue resources will still be governed by IAM policies. Explicitly upgrade each data location, database & table when ready. Upgrading enables you to take advantage of the new permissions model. Diagram labels: Data Catalog; ETL jobs; access control; crawlers.
  • 41. AWS Lake Formation Pricing. No additional charges – Only pay for the underlying services used.
  • 42. Conclusion. AWS Lake Formation makes setting up, securing, and using data lakes simple. Data lakes are the evolution of warehousing. Lots more to come: simplify permissions (tag-based access control); provenance and compliance (lineage); ML-based controls (e.g., PII detection).
  • 43. Alcon. Srinivas Ravilisetty, IT Analytics Lead, Alcon.
  • 47. Alcon Analytics Team. Alcon Analytics Platforms: Tableau, Qlik, Alteryx, etc. Alcon Data Lake Platform: Data Lake Platform on AWS. Alcon Data Management Platform: EBX5 Platform, Reference Data Management, Data Governance Teams, etc. Analytics Products: Field Force Reporting, Sales Reporting, etc. Advanced Analytics Products & Tools: R Studio, DataRobot, Forecasting, etc.
  • 48. Problem statement & approach. Problems: significant effort to get to trusted data; limited data ownership & stewardship; limited agility & extensibility. Approach: global source of trusted data for analytics; business ownership and management of data; rapid, continuous delivery of data and insights; emerging and advanced analytic opportunities; self-service analytics framework.
  • 49. Data lake journey. From: data lakes on Hadoop (6); data warehouses (14); data marts (30+); 100+ servers; multiple security models. To: data lake on Amazon S3; storage and compute separated; leverage AWS services; consistent and simplified security model.
  • 50. Alcon data lake – Built using AWS. Simple user experience: browse data assets, request access, perform additional data preparation if needed, and analyze using the service and tool of choice to draw insights. Enables our users to analyze data more and prepare less. Separate storage and compute: storage using Amazon S3; compute using AWS Glue, Athena, Redshift, AWS Lambda, and Amazon EMR. Built and released using Agile development methods on AWS.
  • 51. ForeSight - Alcon DataLake (architecture diagram). Stage labels: Ingest; Store; Transform; Provision; Analyse. Other labels: Integration; Data Management; Analytics; Visualization; Data Privacy Tool. Source labels: Cloud Data; On-premise Data; ERP; On-premise Files; On-premise Data Sources; Third party & Distributor Files; Managed Markets Data; Manufacturing Data; ORA AnalyzOR Data.
  • 52. Current security model – Complex & limiting. Current security management is complex because security is enforced through S3 bucket policies. Complexity in providing access to data assets through visualization and analytics tools. The limit on the size of policy files means the approach does not scale. Column and row-level security is achieved through complex policy files or data replication, which is not sustainable.
  • 53. Why Lake Formation? Security. AWS Lake Formation simplifies our security model and data cataloging. Reduces our effort in implementing: reduced security model by 3x; reduced security management by 2x; eliminates custom code for security management. Reduces data redundancy.
  • 54. Why Lake Formation? Democratized access. Expands our AWS Glue catalog metadata with business metadata: enables user-friendly search of data using business metadata. Migration is simple and has minimal impact on users: existing tools work seamlessly.
  • 55. Thank you! Mehul A. Shah, glue-pm@amazon.com. Srinivas Ravilisetty, srinivas.ravilisetty@alcon.com.
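
The deck runs demo steps 1 and 4 (registering an Amazon S3 path as a data lake location, then granting table and column permissions) in the console. A minimal sketch of how the same two steps might be scripted is shown below. It only builds the request arguments, so it runs without AWS credentials. The bucket ARN, principal ARN, database, table, and column names are hypothetical placeholders, not values from the talk. The boto3 calls named in the comments (`register_resource` and `grant_permissions` on the `lakeformation` client) are the assumed targets for these arguments.

```python
from typing import Any, Dict, List


def register_location_request(bucket_arn: str) -> Dict[str, Any]:
    """Build arguments for registering an S3 path as a data lake location."""
    return {"ResourceArn": bucket_arn, "UseServiceLinkedRole": True}


def grant_select_request(
    principal_arn: str, database: str, table: str, columns: List[str]
) -> Dict[str, Any]:
    """Build arguments for granting SELECT on specific columns of one table."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        },
        "Permissions": ["SELECT"],
    }


if __name__ == "__main__":
    # All identifiers below are hypothetical placeholders.
    location = register_location_request("arn:aws:s3:::example-data-lake-bucket")
    grant = grant_select_request(
        "arn:aws:iam::123456789012:user/example-analyst",
        "example_cloudtrail_db",
        "example_events",
        ["eventtime", "eventname"],
    )
    print(location)
    print(grant)
    # With boto3 installed and credentials configured, these could be sent as:
    #   import boto3
    #   lf = boto3.client("lakeformation")
    #   lf.register_resource(**location)
    #   lf.grant_permissions(**grant)
```

Granting on columns of a table, rather than writing bucket policies, mirrors the point on slide 32 that permissions are specified on tables and columns instead of buckets and objects.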