© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Paul Macey
Specialist Solution Architect, Big Data & Analytics
AWS Public Sector
Accelerated Data Lakes
Deep Dive Webinar
Agenda
Organisational data challenges
Accelerated data lake
Architecture
Onboarding
Demonstration
Wrap up
Questions
Organisational data challenges
• Silos
• Governance
• Scalability
• Security
Accelerated Data Lake
• Security from Day 0
• Data governance & metadata
• Data centralised & scalable
• SQL & BI ready
• Analytical & Data Science foundation
• Repeatable & extensible
Accelerated Data Lake
Available today on GitHub: https://github.com/aws-samples/accelerated-data-lake
Includes:
• Data lake pipeline (CloudFormation)
• Instructions
• Data configuration, security and metadata templates
Delivery:
• Professional services
• AWS partners
Accelerated Data Lake
Flywheel of success
• Start small
• Establish a repeatable workflow
• Deliver benefits
• Improve and iterate
• Repeat
Architecture
Accelerated Data Lake
High Level Data Flow
Process initiation: time based or event driven
Lambda functions:
• Validation
• Apply security
• Attach metadata
• Catalog object
• File movement
• Alerts
Data lake storage (S3 buckets):
• Staging
• Raw
• Curated
• Gold
• Data discovery
• Logs
Data lake metadata: Data Catalog
Enabling analytics and insights:
• Big data, querying, ETL & ML
• Database / BI
• Analytics & insights
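The "Catalog object" function and the Data Catalog can be pictured as turning each processed object into one searchable record. A minimal sketch, assuming the `<env>-<zone>` bucket naming used in the example storage structure (for example `dev-raw`); the record's field names are hypothetical and are not taken from the repository.

```python
from datetime import datetime, timezone

# Storage zones from the high-level data flow; buckets are assumed to be named "<env>-<zone>".
ZONES = ("staging", "raw", "curated", "gold")


def zone_of(bucket: str) -> str:
    """Return the data lake zone encoded in a bucket name such as "dev-raw"."""
    _env, _, zone = bucket.partition("-")
    if zone not in ZONES:
        raise ValueError(f"{bucket!r} is not a staging/raw/curated/gold bucket")
    return zone


def build_catalog_record(bucket, key, size_bytes, tags, metadata, processed_at=None):
    """Combine an object's location, tags and metadata into one catalog record.

    All field names are hypothetical; a real catalog (DynamoDB item or
    Elasticsearch document) would follow the solution's own schema.
    """
    if processed_at is None:
        processed_at = datetime.now(timezone.utc)
    return {
        "location": f"s3://{bucket}/{key}",
        "zone": zone_of(bucket),
        "sizeBytes": size_bytes,
        "tags": dict(tags),
        "metadata": dict(metadata),
        "processedAt": processed_at.isoformat(),
    }
```

Because the record is plain JSON, the same structure could be stored as a DynamoDB item and indexed into Elasticsearch for data discovery.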
Accelerated Data Lake
File Processing Pipeline: S3 Lambda event example
1) A file arrives in the S3 staging bucket.
2) A Lambda function is triggered when the object is created.
3) The Lambda function passes the S3 event payload to an AWS Step Functions state machine.
4) The state machine moves through a repeatable data file onboarding process:
• Get the file specification
• Validate data
• Add security tags to the S3 object
• Add metadata to the S3 object
• Add object metadata to DynamoDB
• Index metadata into Elasticsearch
• Move the file from the staging bucket to the raw bucket
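A minimal sketch of the Lambda-to-Step Functions hand-off and of the state machine's shape, assuming the Python Lambda runtime. `STATE_MACHINE_ARN` is a hypothetical environment variable, the state names and Lambda ARNs are placeholders, and the injectable `start_execution` parameter exists only so the handler can be exercised without AWS (in Lambda it falls back to boto3's `stepfunctions` client). The definition is a straight chain of Amazon States Language Task states; the real solution's definition may add retries, branching and error handling.

```python
import json
import os
from urllib.parse import unquote_plus

# Onboarding steps in the order listed on the slide; the state names are hypothetical.
STEPS = [
    "GetFileSpecification",
    "ValidateData",
    "AddSecurityTags",
    "AddMetadata",
    "StoreMetadataInDynamoDB",
    "IndexMetadataInElasticsearch",
    "MoveFileToRaw",
]


def s3_event_to_inputs(event):
    """Extract one state machine input per object-created record in an S3 event."""
    inputs = []
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        s3 = record["s3"]
        inputs.append({
            "bucket": s3["bucket"]["name"],
            # S3 URL-encodes object keys in event notifications (spaces arrive as '+').
            "key": unquote_plus(s3["object"]["key"]),
            "size": s3["object"].get("size", 0),
        })
    return inputs


def handler(event, context, start_execution=None):
    """Lambda entry point: start one Step Functions execution per new staging object."""
    if start_execution is None:
        import boto3  # bundled with the AWS Lambda Python runtime

        start_execution = boto3.client("stepfunctions").start_execution
    state_machine_arn = os.environ["STATE_MACHINE_ARN"]
    execution_arns = []
    for item in s3_event_to_inputs(event):
        response = start_execution(stateMachineArn=state_machine_arn, input=json.dumps(item))
        execution_arns.append(response["executionArn"])
    return execution_arns


def build_state_machine(step_arns):
    """Chain one Task state per step into an Amazon States Language definition."""
    names = list(step_arns)
    states = {}
    for i, name in enumerate(names):
        state = {"Type": "Task", "Resource": step_arns[name]}
        if i + 1 < len(names):
            state["Next"] = names[i + 1]
        else:
            state["End"] = True
        states[name] = state
    return {"Comment": "Staging file onboarding (sketch)", "StartAt": names[0], "States": states}
```

Serialised with `json.dumps`, the output of `build_state_machine` is the kind of document that goes into the `DefinitionString` property of an `AWS::StepFunctions::StateMachine` CloudFormation resource.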
Onboarding
Accelerated Data Lake
Data onboarding process
1) Create a new file specification entry in a DynamoDB table. In the data lake solution this table is called "Data Sources".
2) Create a folder structure within the S3 staging bucket for the new data type. This keeps everything in S3 organised and also optimises the layout for later use.
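Step 1 amounts to writing one item to DynamoDB. The sketch below converts a plain Python specification into DynamoDB's low-level attribute-value format (`S`, `N`, `BOOL`, `NULL`, `M`, `L`). The attribute names used in any example spec are hypothetical; boto3's `TypeSerializer` offers the same conversion if you prefer a library.

```python
from typing import Any, Dict


def to_attribute_value(value: Any) -> Dict[str, Any]:
    """Convert a Python value to a DynamoDB AttributeValue."""
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return {"BOOL": value}
    if isinstance(value, (int, float)):
        return {"N": str(value)}  # DynamoDB transmits numbers as strings
    if isinstance(value, str):
        return {"S": value}
    if value is None:
        return {"NULL": True}
    if isinstance(value, dict):
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    if isinstance(value, (list, tuple)):
        return {"L": [to_attribute_value(v) for v in value]}
    raise TypeError(f"unsupported type: {type(value).__name__}")


def to_item(spec: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """A DynamoDB item is a map of attribute names to AttributeValues."""
    return {name: to_attribute_value(value) for name, value in spec.items()}
```

The result can be passed as `Item` to the low-level client's `put_item(TableName=..., Item=...)`. DynamoDB table names may contain only letters, digits, underscores, hyphens and dots, so the deployed table's physical name will differ from the display name "Data Sources".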
File Specification Settings
• File settings
• S3 object tags
• Simple metadata
• Extended metadata
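As an illustration of how the "File settings" section could drive the pipeline's validation step, the sketch below checks a CSV file's extension and header row. The four section names follow the slide; every field inside them (`extension`, `delimiter`, `expectedColumns`) and the column names are hypothetical examples, not the repository's schema.

```python
import csv
import io

# Hypothetical file specification: section names mirror the slide,
# the fields inside each section are invented for illustration.
EXAMPLE_SPEC = {
    "fileSettings": {"extension": ".csv", "delimiter": ",", "expectedColumns": ["userid", "timestamp", "location"]},
    "s3ObjectTags": {"Classification": "Internal", "PII": "False"},
    "simpleMetadata": {"Data Owner": "User Interaction Team"},
    "extendedMetadata": {"Data Source": "prod_int_extraction_dw"},
}


def validate_file(key, content, spec):
    """Return a list of validation errors (empty when the file passes)."""
    settings = spec["fileSettings"]
    errors = []
    if not key.lower().endswith(settings["extension"]):
        errors.append(f"expected a {settings['extension']} file, got {key}")
    reader = csv.reader(io.StringIO(content), delimiter=settings["delimiter"])
    header = next(reader, None)  # None when the file has no rows at all
    if header is None:
        errors.append("file is empty")
    elif [column.strip().lower() for column in header] != settings["expectedColumns"]:
        errors.append(f"unexpected header {header}")
    return errors
```

Returning a list of errors rather than raising on the first problem lets an alerting step report everything wrong with a file in one message.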
Accelerated Data Lake
Example S3 storage structure
Folder levels: Source, Env / Schema, Description / Table / View
Example names: CMX, API-Raw, DB_Prod_HR, Staff
Data flow: dev-staging → dev-raw → dev-curated → dev-gold (validated & approved)
Other buckets: dev-validation, dev-data-discovery (sandpit), dev-logging
Example folders:
Wireless/CMX/2019/01
API-Raw/2019/01
Example queries:
select count(userid) from CMX where year = 2019

select CMX.userid
from CMX
inner join "API-Raw" as api
on CMX.userid = api.userid
where CMX.year = 2019
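The `year = 2019` filter assumes the table is partitioned by year. Because the example folders use plain `2019/01` names rather than Hive-style `year=2019/month=01` names, Athena's `MSCK REPAIR TABLE` will not discover them, so partitions have to be added with an explicit location. A sketch, assuming a table partitioned by `year int, month string` and object keys of the form `<prefix>/<yyyy>/<mm>/<file>`:

```python
import re

# Assumed key layout: <prefix>/<yyyy>/<mm>/<file>, e.g. Wireless/CMX/2019/01/data.csv
KEY_PATTERN = re.compile(r"^(?P<prefix>.+)/(?P<year>\d{4})/(?P<month>\d{2})/[^/]+$")


def add_partition_ddl(table: str, bucket: str, key: str) -> str:
    """Build an Athena ALTER TABLE ... ADD PARTITION statement for the object's folder."""
    match = KEY_PATTERN.match(key)
    if match is None:
        raise ValueError(f"{key!r} does not follow <prefix>/<yyyy>/<mm>/<file>")
    year, month = int(match["year"]), match["month"]
    location = f"s3://{bucket}/{match['prefix']}/{match['year']}/{month}/"
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (year = {year}, month = '{month}') "
        f"LOCATION '{location}'"
    )
```

Running this once per new folder (for example from the file movement step) keeps the partition list in step with the data; `IF NOT EXISTS` makes repeated calls for the same folder harmless.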
S3 Object Tags and Metadata
Example S3 objects: Image1001.jpg (JPEG image data), data.csv
S3 object tags (Key: Value)
Classification: Internal
PII: False
Use Case: Interaction Extracts
Team: Analytics
S3 metadata (Key: Value)
Policy: facility_iinternal
MD5: ab3116cded134
Data Owner: User Interaction Team
Data Source: prod_int_extraction_dw
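Tags and metadata are set differently: tags are applied with S3's PutObjectTagging call and can be changed later, while user-defined metadata is sent as `x-amz-meta-*` headers when an object is written and can only be changed by copying the object. The sketch below builds both payloads and enforces the S3 limits (at most 10 tags per object, tag keys up to 128 and values up to 256 characters, about 2 KB of user-defined metadata). Normalising metadata keys such as "Data Owner" to `data-owner` is an assumption of this sketch, made because HTTP header names cannot contain spaces.

```python
import hashlib
import re

MAX_TAGS_PER_OBJECT = 10
MAX_TAG_KEY_CHARS = 128
MAX_TAG_VALUE_CHARS = 256
MAX_USER_METADATA_BYTES = 2 * 1024


def to_tagging(tags):
    """Build the Tagging argument for S3 PutObjectTagging from a plain dict."""
    if len(tags) > MAX_TAGS_PER_OBJECT:
        raise ValueError(f"S3 allows at most {MAX_TAGS_PER_OBJECT} tags per object")
    tag_set = []
    for key, value in tags.items():
        if len(key) > MAX_TAG_KEY_CHARS or len(value) > MAX_TAG_VALUE_CHARS:
            raise ValueError(f"tag {key!r} exceeds S3 length limits")
        tag_set.append({"Key": key, "Value": value})
    return {"TagSet": tag_set}


def to_user_metadata(metadata):
    """Normalise keys to header-safe names ('Data Owner' -> 'data-owner') and check the size limit."""
    normalised = {
        re.sub(r"[^a-z0-9]+", "-", key.strip().lower()).strip("-"): value
        for key, value in metadata.items()
    }
    size = sum(len(k.encode("utf-8")) + len(v.encode("utf-8")) for k, v in normalised.items())
    if size > MAX_USER_METADATA_BYTES:
        raise ValueError(f"user-defined metadata is {size} bytes; S3 allows {MAX_USER_METADATA_BYTES}")
    return normalised


def md5_hex(content: bytes) -> str:
    """MD5 digest of the file contents, suitable for an 'MD5' metadata entry."""
    return hashlib.md5(content).hexdigest()
```

With boto3 these map onto `s3.put_object_tagging(Bucket=..., Key=..., Tagging=to_tagging(tags))` and, for metadata, `s3.copy_object(Bucket=..., Key=..., CopySource={"Bucket": ..., "Key": ...}, Metadata=to_user_metadata(meta), MetadataDirective="REPLACE")`.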
Demonstration
IoT Device Simulator: https://aws.amazon.com/solutions/iot-device-simulator/
Wrap up
• Security from Day 0
• Data governance & metadata
• Data centralised & scalable
• SQL & BI ready
• Analytical & Data Science foundation
• Repeatable & extensible
The accelerated data lake solution:
• Can enable your data
• Supports data security and data governance
• Can grow and scale in harmony with your organisation
• Gives you access to AWS’s analytics, ML and AI ecosystem
References
Amazon S3 security:
https://aws.amazon.com/s3/faqs/#Security
https://docs.aws.amazon.com/AmazonS3/latest/dev/DataDurability.html
AWS Accelerated Data Lake (GitHub): https://github.com/aws-samples/accelerated-data-lake
AWS Accelerated Data Lake blog, part 1: https://aws.amazon.com/blogs/publicsector/from-data-silos-to-data-domains-bringing-common-data-together
AWS Accelerated Data Lake blog, part 2: https://aws.amazon.com/blogs/publicsector/securing-your-data-by-knowing-your-data
Our data lake story: How Woot.com built a serverless data lake on AWS: https://aws.amazon.com/blogs/big-data/our-data-lake-story-how-woot-com-built-a-serverless-data-lake-on-aws

Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Accelerated Data Lakes Webinar

  • 1. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Accelerated Data Lakes Deep Dive Webinar. Paul Macey, Specialist Solution Architect, Big Data & Analytics, AWS Public Sector.
  • 2. Agenda: Organisational data challenges, Accelerated data lake, Architecture, Onboarding, Demonstration, Wrap up, Questions.
  • 3. Organisational data challenges: Silos, Governance, Scalability, Security.
  • 4. Accelerated Data Lake: Security from Day 0, Data governance & metadata, Data centralised & scalable, SQL & BI ready, Analytical & Data Science foundation, Repeatable & extensible.
  • 5. Accelerated Data Lake, available today on GitHub: https://github.com/aws-samples/accelerated-data-lake. Includes: the data lake pipeline (CloudFormation), instructions, and data configuration, security and metadata templates. Delivery: Professional Services or AWS partners.
  • 6. Accelerated Data Lake, the flywheel of success: Start small → Establish a repeatable workflow → Deliver benefits → Improve and iterate → Repeat.
  • 7. Architecture
  • 8. Accelerated Data Lake, high-level data flow. Process initiation: time-based or event-driven. Lambda functions: Validation, Apply security, Attach metadata, Catalog object, File movement, Alerts. Data lake storage (S3 buckets): Staging, Raw, Curated, Gold, Data discovery, Logs. Metadata: Data Catalog. Enabling analytics and insights: Big data, querying, ETL & ML; Database / BI; Analytics & insights.
  • 9. Accelerated Data Lake file processing pipeline, S3 Lambda event example: a file arrives in the S3 Staging bucket; a Lambda function is triggered when the object is created; the Lambda function passes the S3 event payload to an AWS Step Functions state machine; the state machine steps through a repeatable data file onboarding process: get the file specification, validate the data, add security tags to the S3 object, add metadata to the S3 object, add the object metadata to DynamoDB, index the metadata into Elasticsearch, and move the file from the Staging bucket to the Raw bucket.
  • 10. Onboarding
  • 11. Accelerated Data Lake data onboarding process: 1) Create a new file specification entry in a DynamoDB table (in the data lake solution this table is called “Data Sources”). 2) Create a folder structure for the new data type within the S3 staging bucket. This keeps everything in S3 organised and also optimised for later use.
  • 12. File specification settings: File settings, S3 object tags, Simple metadata, Extended metadata.
  • 13. Accelerated Data Lake example S3 storage structure. Folder levels: Source, Env / Schema, Description / Table / View (examples: CMX, API-Raw, DB_Prod_HR, Staff). Buckets: dev-staging, dev-raw, dev-curated, dev-gold, dev-validation, dev-data discovery, sandpit, dev-logging; data flows from staging through raw and curated to gold once validated and approved. Example folders: Wireless/CMX/2019/01 and API-Raw/2019/01. Example queries: select count(userid) from CMX where year=2019; select userid from CMX inner join API-RAW on CMX.userid=API-RAW.userid where year=2019.
  • 14. S3 object tags and metadata. Example objects: Image1001.jpg (JPEG image data) and data.csv. S3 object tags (key: value): Classification: Internal; PII: False; Use Case: Interaction Extracts; Team: Analytics. S3 metadata (key: value): Policy: facility_iinternal; MD5: ab3116cded134; Data Owner: User Interaction Team; Data Source: prod_int_extraction_dw.
  • 15. Demonstration
  • 16. IoT Simulator https://aws.amazon.com/solutions/iot-device-simulator/
  • 17. Demonstration
  • 18. Wrap up: Security from Day 0, Data governance & metadata, Data centralised & scalable, SQL & BI ready, Analytical & Data Science foundation, Repeatable & extensible. The accelerated data lake solution can enable your data to support data security and data governance, to grow and scale in harmony with your organisation, and to be opened up to AWS’s analytics, ML and AI ecosystem.
  • 19. References: Amazon S3 security, https://aws.amazon.com/s3/faqs/#Security and https://docs.aws.amazon.com/AmazonS3/latest/dev/DataDurability.html; AWS Accelerated Data Lake (Git), https://github.com/aws-samples/accelerated-data-lake; AWS Accelerated Data Lake Blog (parts 1 & 2), https://aws.amazon.com/blogs/publicsector/from-data-silos-to-data-domains-bringing-common-data-together and https://aws.amazon.com/blogs/publicsector/securing-your-data-by-knowing-your-data; Our data lake story: How Woot.com built a serverless data lake on AWS, https://aws.amazon.com/blogs/big-data/our-data-lake-story-how-woot-com-built-a-serverless-data-lake-on-aws
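Slide 8 lists the Lambda functions that make up the pipeline, ending with "Alerts". One way to wire such functions together, sketched here under the assumption that each one is deployed as its own Lambda and chained by AWS Step Functions, is to generate the Amazon States Language (ASL) definition from an ordered list so that every task routes failures to an alerting task. The helper name, state names and ARNs are hypothetical; this is not the definition shipped in the accelerated-data-lake repository.

```python
import json
from typing import Dict, List, Tuple

ALERT_STATE = "SendAlert"


def build_pipeline_definition(steps: List[Tuple[str, str]], alert_arn: str) -> Dict:
    """Chain Lambda tasks in order; any task failure is routed to an alert task.

    `steps` is an ordered list of (state_name, lambda_arn) pairs.
    """
    if not steps:
        raise ValueError("at least one step is required")
    states: Dict[str, Dict] = {}
    for i, (name, arn) in enumerate(steps):
        state = {
            "Type": "Task",
            "Resource": arn,
            # Catch every error, keep the original input and add the error under $.error.
            "Catch": [{"ErrorEquals": ["States.ALL"], "ResultPath": "$.error", "Next": ALERT_STATE}],
        }
        if i + 1 < len(steps):
            state["Next"] = steps[i + 1][0]
        else:
            state["End"] = True
        states[name] = state
    # The alert task notifies someone, then the execution is marked as failed.
    states[ALERT_STATE] = {"Type": "Task", "Resource": alert_arn, "Next": "Failed"}
    states["Failed"] = {"Type": "Fail", "Error": "FileProcessingFailed"}
    return {"Comment": "Sketch: staging-to-raw file processing", "StartAt": steps[0][0], "States": states}


if __name__ == "__main__":
    # Hypothetical region, account and function names.
    arn = "arn:aws:lambda:eu-west-2:123456789012:function:{}".format
    names = ["Validation", "ApplySecurity", "AttachMetadata", "CatalogObject", "FileMovement"]
    definition = build_pipeline_definition([(n, arn(n)) for n in names], arn("Alerts"))
    print(json.dumps(definition, indent=2))
```

The resulting JSON string could be passed as the `definition` argument when creating a state machine (for example with boto3's `create_state_machine`). Keeping the step order in a single list is one way to honour the "repeatable & extensible" goal: onboarding a new processing step means adding one tuple.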
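Slide 9 describes a Lambda function that fires when an object is created in the Staging bucket and hands the event to Step Functions. The sketch below assumes a hypothetical environment variable `STATE_MACHINE_ARN` and starts one execution per new object. It forwards a trimmed per-object payload rather than the raw event; that is a choice made for this sketch, and forwarding the event unchanged would work just as well. `boto3` is imported lazily so the parsing logic can be exercised without AWS credentials.

```python
import json
import os
from typing import Any, Dict, List
from urllib.parse import unquote_plus


def extract_objects(event: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Pull bucket, key and size out of an S3 ObjectCreated event notification."""
    objects = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        objects.append({
            "bucket": s3["bucket"]["name"],
            # Keys arrive URL-encoded (a space becomes '+'), so decode them.
            "key": unquote_plus(s3["object"]["key"]),
            "size": s3["object"].get("size", 0),
        })
    return objects


def lambda_handler(event, context, sfn_client=None):
    """Start one Step Functions execution per newly created staging object."""
    if sfn_client is None:
        import boto3  # lazy import: only needed when running inside AWS
        sfn_client = boto3.client("stepfunctions")
    # STATE_MACHINE_ARN is a hypothetical environment variable name.
    state_machine_arn = os.environ["STATE_MACHINE_ARN"]
    execution_arns = []
    for obj in extract_objects(event):
        response = sfn_client.start_execution(
            stateMachineArn=state_machine_arn,
            input=json.dumps(obj),
        )
        execution_arns.append(response["executionArn"])
    return {"executions": execution_arns}
```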
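The two onboarding steps on slide 11 can be sketched as one function, assuming boto3-style clients are passed in. The `prefix` attribute and the table name in the usage comment are hypothetical. The slide calls the table “Data Sources”, but DynamoDB table names cannot contain spaces, so the real name will differ.

```python
from typing import Any, Dict


def register_data_source(table: Any, s3_client: Any, staging_bucket: str, spec: Dict[str, Any]) -> str:
    """Onboard a new data type: store its file specification, then create its staging folder.

    `table` is assumed to be a boto3 DynamoDB Table resource for the data sources table;
    `spec["prefix"]` is a hypothetical attribute naming the staging folder for this data type.
    """
    prefix = spec["prefix"].strip("/")
    if not prefix:
        raise ValueError("spec['prefix'] must name a folder")
    prefix += "/"
    # 1) Store the file specification entry (with the normalised prefix).
    table.put_item(Item={**spec, "prefix": prefix})
    # 2) S3 has no real directories: a zero-byte object whose key ends in "/"
    #    is what the S3 console displays as a folder.
    s3_client.put_object(Bucket=staging_bucket, Key=prefix, Body=b"")
    return prefix


# Example wiring (needs AWS credentials; all names are placeholders):
#   import boto3
#   table = boto3.resource("dynamodb").Table("DataSources")
#   register_data_source(table, boto3.client("s3"), "dev-staging",
#                        {"prefix": "Wireless/CMX", "fileType": "csv"})
```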
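Slide 12 groups a file specification into file settings, S3 object tags, simple metadata and extended metadata. A minimal sketch of that shape as a dataclass follows. The field names and the two file settings used for validation (`allowed_extensions`, `min_size_bytes`) are assumptions for illustration, not the attribute names used in the repository.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class FileSpecification:
    """Hypothetical shape of one file specification, mirroring the four groups on slide 12."""
    data_source: str  # e.g. "prod_int_extraction_dw"
    file_settings: Dict[str, object] = field(default_factory=dict)
    s3_object_tags: Dict[str, str] = field(default_factory=dict)
    simple_metadata: Dict[str, str] = field(default_factory=dict)
    extended_metadata: Dict[str, object] = field(default_factory=dict)

    def validate_file(self, key: str, size_bytes: int) -> List[str]:
        """Return validation errors for an incoming object (an empty list means it passed)."""
        errors = []
        allowed = self.file_settings.get("allowed_extensions", [])
        if allowed and not any(key.lower().endswith(ext) for ext in allowed):
            errors.append(f"unexpected file type: {key}")
        min_size = self.file_settings.get("min_size_bytes", 0)
        if size_bytes < min_size:
            errors.append(f"file smaller than {min_size} bytes")
        return errors

    def to_item(self) -> Dict[str, object]:
        """Flatten for DynamoDB put_item(Item=...); floats would need decimal.Decimal there."""
        return asdict(self)
```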
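Slide 13 shows date-based folders (Wireless/CMX/2019/01) queried with `where year=2019`, and data flowing staging → raw → curated → gold. The sketch below builds such keys and picks the next bucket in that flow. It assumes the `dev-<zone>` naming shown on the slide. The Hive-style `year=2019/month=01` option is an assumption beyond the slide: when such folders are crawled by AWS Glue, the `key=value` form lets the partitions be named `year` and `month` rather than generic names, which is what a `where year=2019` filter relies on.

```python
from datetime import date

ZONES = ["staging", "raw", "curated", "gold"]  # order data flows through (slides 8 and 13)


def object_key(source: str, dataset: str, file_name: str, when: date, hive_style: bool = True) -> str:
    """Build a key such as Wireless/CMX/year=2019/month=01/data.csv."""
    if hive_style:
        # key=value folders carry the partition column names with them.
        partition = f"year={when.year}/month={when.month:02d}"
    else:
        # Plain layout as drawn on the slide: Wireless/CMX/2019/01.
        partition = f"{when.year}/{when.month:02d}"
    return f"{source}/{dataset}/{partition}/{file_name}"


def next_zone_bucket(bucket: str, env: str = "dev") -> str:
    """Map e.g. 'dev-staging' to 'dev-raw' following the staging -> raw -> curated -> gold flow."""
    prefix = f"{env}-"
    if not bucket.startswith(prefix):
        raise ValueError(f"bucket {bucket!r} is not in environment {env!r}")
    zone = bucket[len(prefix):]
    idx = ZONES.index(zone)  # raises ValueError for non-pipeline buckets such as dev-logging
    if idx == len(ZONES) - 1:
        raise ValueError("gold is the final zone")
    return prefix + ZONES[idx + 1]
```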
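Slide 14 distinguishes S3 object tags from S3 metadata. A practical difference: tags can be changed on an existing object, but user metadata can only be set when an object is written or copied. The sketch below therefore sets metadata while copying from the Staging bucket to the next bucket, tags the copy, then deletes the source. This ordering is an assumption of the sketch, not the slide's step order. Key normalisation such as "Data Owner" → "data-owner" is also an assumption. It is needed because user metadata travels as `x-amz-meta-*` HTTP headers, whose names cannot contain spaces.

```python
import hashlib
import re
from typing import Any, Dict, List

# S3 tagging limits: 10 tags per object, keys <= 128 chars, values <= 256 chars.
MAX_TAGS, MAX_KEY_LEN, MAX_VALUE_LEN = 10, 128, 256


def to_tag_set(tags: Dict[str, str]) -> List[Dict[str, str]]:
    """Convert a dict into the TagSet list used by S3 PutObjectTagging."""
    if len(tags) > MAX_TAGS:
        raise ValueError(f"S3 allows at most {MAX_TAGS} tags per object")
    tag_set = []
    for key, value in tags.items():
        if len(key) > MAX_KEY_LEN or len(value) > MAX_VALUE_LEN:
            raise ValueError(f"tag {key!r} exceeds S3 length limits")
        tag_set.append({"Key": key, "Value": value})
    return tag_set


def to_user_metadata(metadata: Dict[str, str]) -> Dict[str, str]:
    """Lower-case keys and replace anything outside [a-z0-9-] with '-' (hypothetical convention)."""
    return {re.sub(r"[^a-z0-9-]+", "-", k.strip().lower()): v for k, v in metadata.items()}


def md5_hex(data: bytes) -> str:
    """Checksum of the file contents, e.g. for an "MD5" metadata entry."""
    return hashlib.md5(data).hexdigest()


def move_with_tags_and_metadata(s3: Any, src_bucket: str, dst_bucket: str, key: str,
                                tags: Dict[str, str], metadata: Dict[str, str]) -> None:
    """Copy to the next bucket with new metadata, tag the copy, then delete the source.

    MetadataDirective="REPLACE" also replaces system metadata such as Content-Type,
    so pass ContentType as well if it matters. One CopyObject call handles up to 5 GB.
    """
    s3.copy_object(Bucket=dst_bucket, Key=key, CopySource={"Bucket": src_bucket, "Key": key},
                   Metadata=to_user_metadata(metadata), MetadataDirective="REPLACE")
    s3.put_object_tagging(Bucket=dst_bucket, Key=key, Tagging={"TagSet": to_tag_set(tags)})
    s3.delete_object(Bucket=src_bucket, Key=key)
```

Because the copy lands in a different bucket, it does not re-trigger a Lambda that only watches the Staging bucket. Copying an object onto itself to change its metadata would emit a new ObjectCreated event in the same bucket.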