SlideShare uma empresa Scribd logo
1 de 45
P U B L I C S E C T O R
S U M M I T
B OGOTA
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
Building Data Lakes &Analytics onAWS
Mauro Assis
Researcher
Earth System Science Center
INPE
Angelo Carvalho
Specialist Solutions Architect - Analytics
Public Sector for Latin America, Canada and Caribbean
AWS
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
Organizations that successfully generate
business value from their data will outperform
their peers. An Aberdeen survey showed that
organizations that implemented a data lake
outperform similar companies by 9% in
organic revenue growth.*
24%
15%
Leaders Followers
Organic revenue growth
*Aberdeen: Angling for Insight in Today’s Data Lake, Michael Lock, SVP Analytics and Business Intelligence
Most Important: DrivingValuefrom Data
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
Traditionally,AnalyticsUsed to Look LikeThis
OLTP ERP CRM LOB
Data warehouse
Business intelligence • Relational data
• TBs–PBs scale
• Schema defined prior to data load
• Operational reporting and ad hoc
• Large initial CAPEX + $10K–$50K/TB/year
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
Data Lakes Extend theTraditionalApproach
Data warehouse
Business intelligence
OLTP ERP CRM LOB
• Relational and non-relational data
• TBs–EBs scale
• Diverse analytical engines
• Low-cost storage & analytics
Devices Web Sensors Social
Data lake
Big data processing,
real-time, machine learning
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
Data Lakes fromAWS
Analytics
• Unmatched durability and availability at EB scale
• Best security, compliance, and audit capabilities
• Object-level controls for fine-grained access
• Fastest performance by retrieving subsets of data
• The greatest variety of ways to bring data in
• 2x as many integrations with partners
• Analyze with broadest set of analytics & machine
learning (ML) services
Machine
learning
Real-time dataOn-premises
Data Lake
on AWS
movementdata movement
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
Managed ML Service
Deep Learning AMIs
Video and Image Recognition
Conversational Interfaces
Deep-Learning Video Camera
Natural Language Processing
Language Translation
Speech Recognition
Text-to-Speech
Interactive Analysis
Hadoop & Spark
Data Warehousing
Full-text search
Real-time analytics
Dashboards & Visualizations
Dedicated Network connection
Secure appliances
Ruggedized Shipping Container
Database migration
Connect Devices to AWS
Real-time Data Streams
Real-time Video Streams
Data Lake
on AWS
Storage | Archival Storage | Data Catalog
AnalyticsMachine learning
Real-time dataOn-premises movementdata movement
Data Lakes,Analytics,and IoTPortfolio fromAWS
Broadest,deepestsetofanalyticservices
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
Data Lakes,Analytics,and IoTPortfolio fromAWS
Broadest,deepestsetofanalyticservices
Amazon SageMaker
AWS Deep Learning AMIs
Amazon Rekognition
Amazon Lex
AWS DeepLens
Amazon Comprehend
Amazon Translate
Amazon Transcribe
Amazon Polly
Amazon Athena
Amazon EMR
Amazon Redshift
Amazon Elasticsearch Service
Amazon Kinesis
Amazon QuickSight
AWS Direct Connect
AWS Snowball
AWS Snowmobile
AWS Database Migration Service
AWS IoT Core
Amazon Kinesis Data Firehose
Amazon Kinesis Data Streams
Amazon Kinesis Video Streams
Data Lake
on AWS
Storage | Archival Storage | Data Catalog
AnalyticsMachine learning
Real-time dataOn-premises movementdata movement
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
Job AuthoringData Catalog Job Execution
Apache Hive Metastore compatible
Integrated with AWS services
Automatic crawling
Discover
Auto-generates ETL code
Python and Apache Spark
Edit, debug, and share
Develop
Serverless execution
Flexible scheduling
Monitoring and alerting
Deploy
AWS Glue
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
Data Lake on Amazon S3 with AWS Glue
On-premises data
Web app data
Amazon RDS
Other databases
Streaming data
Your data
AMAZON
QUICKSIGHT
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
Other Ways of Populating the Catalog
Call the AWS Glue CreateTable API
Create table manually
Run Hive DDL statement
Apache Hive
Metastore
AWS GLUE ETL AWS GLUE
DATA CATALOG
Import from Apache Hive Metastore
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
How Do IDriveValue?
Amazon SageMaker
AWS Deep Learning AMIs
Amazon Rekognition
Amazon Lex
AWS DeepLens
Amazon Comprehend
Amazon Translate
Amazon Transcribe
Amazon Polly
Amazon Athena
Amazon EMR
Amazon Redshift
Amazon Elasticsearch Service
Amazon Kinesis
Amazon QuickSight
AWS Direct Connect
AWS Snowball
AWS Snowmobile
AWS Database Migration Service
AWS IoT Core
Amazon Kinesis Data Firehose
Amazon Kinesis Data Streams
Amazon Kinesis Video Streams
Data Lake
on AWS
Storage | Archival Storage | Data Catalog
AnalyticsMachine learning
Real-time dataOn-premises movementdata movement
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
Amazon Athena
Amazon Athena is an interactive query service
that makes it easy to analyze data in Amazon
S3 using standard SQL.
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
FamiliarTechnologiesUnder theCovers
Used for SQL Queries
In-memory distributed query engine
ANSI-SQL compatible with extensions
(Eg. SELECT * FROM tableName)
Used for DDL functionality
Complex data types
Multitude of formats
Supports data partitioning
(Eg. CREATE TABLE, ALTER TABLE,
MSCK REPAIR)
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
Exploring data with Amazon Athena
On-premises Data
Web app data
Amazon RDS
Other Databases
Streaming data
AMAZON
QUICKSIGHT
AMAZON
SAGEMAKER
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
Hadoop/SparkAnalytics
• Distributed processing
• Diverse analytics
• Batch/Script (Hive/Pig)
• Interactive (Spark, Presto)
• Real-time (Spark)
• Machine Learning (Spark)
• NoSQL (HBase)
• For many use cases
• Log and clickstream analysis
• Machine learning
• Real-time analytics
• Large-scale analytics
• Genomics
• ETL
YARN (Hadoop Resource Manager)
NoSQLMachine
learning
Real-timeInteractiveScriptBatch
Data Lake
on AWS
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
Hadoop/SparkAnalyticsonAWS
YARN (Hadoop Resource Manager)
NoSQLMachine
learning
Real-timeInteractiveScriptBatch
Data Lake
on AWS
Amazon S3
Amazon EMR
Managed Hadoop/Spark
Object Storage
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
EMR – Enterprise-grade Hadoop &Spark
DeploylatestreleasesinHadoopandSparkecosystems
• Nineteen open-source
projects: Apache Hadoop,
Spark, HBase, Presto, and
more
• Updated with the latest
open source frameworks
within 30 days of release
Hadoop
Ganglia
HBase
Hive&
Catalog
Hue
Mahout
Oozie
Phoenix
Pig
Presto
Spark
Tez
Zeppelin
Zookeeper
Flink
Livy
MXNet
Sqoop
Emr-4.0.0
July2015
2.6.0 1.0.0 0.10.0 0.14.0 1.4.1
Emr-4.7.0
June2016
2.7.2 3.7.2 1.2.1 1.0.0 3.7.1 0.12.0 4.2.0 4.7.0 0.14.0 .147 1.6.1 1.4.6 0.8.3 0.5.6 3.4.8
Emr-5.3.0
January2017
2.7.3 3.7.2
1.2.3
+
S3
2.1.1 3.11.0 0.12.2 4.3.0 4.7.0 0.16.0 0.157.1 2.1.0 1.4.6 0.8.4 0.6.2 3.4.9 1.1.4
Emr-5.11.0
December2017
2.7.3 3.7.2
1.3.1
+
S3
2.3.2 4.0.1 0.13.0 4.3.0 4.11.0 0.17.0 .187 2.2.1 1.4.6 0.8.4 0.7.3 3.4.10 1.3.2 0.4.0 0.12.0
EMR releases
AmazonS3 –Source ofTruth,MultipleClusters
Amazon S3
Interactive Spark Cluster
Amazon EMR
Amazon EMR
HDFS
HDFS
EC2 Instance Memory
Intermediates stored on
local disk or HDFSLocal
HDFS
EC2 Instance Memory
Intermediates stored on
local disk or HDFSLocal
Transient ETL Job
Source of Truth
HDFS
HDFS
HDFS
Local Intermediate HDFS/Storage
Local Intermediate HDFS/Storage
External Metadata Management
Amazon S3
Interactive Spark Cluster
Amazon EMR
Amazon EMR
HDFS
Transient ETL Job
Source of Truth
HDFS
Describes Data in S3
MySQL DB
instance
Customershaveoptions
Glue Data
Catalog
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
Reprocess data with Amazon EMR (Spark)
On-premise data
Web app data
Amazon RDS
Other Databases
Streaming data
AMAZON
QUICKSIGHT
AMAZON
SAGEMAKER
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
MachineLearning onYour DataLake
Amazon SageMaker
AWS Deep Learning AMIs
Amazon Rekognition
Amazon Lex
AWS DeepLens
Amazon Comprehend
Amazon Translate
Amazon Transcribe
Amazon Polly
Amazon Athena
Amazon EMR
Amazon Redshift
Amazon Elasticsearch Service
Amazon Kinesis
Amazon QuickSight
AWS Direct Connect
AWS Snowball
AWS Snowmobile
AWS Database Migration Service
AWS IoT Core
Amazon Kinesis Data Firehose
Amazon Kinesis Data Streams
Amazon Kinesis Video Streams
Data Lake
on AWS
Storage | Archival Storage | Data Catalog
AnalyticsMachine learning
Real-time dataOn-premises movementdata movement
Frameworks &
Infrastructure
AWS Deep Learning AMI
GPU
(P3 Instances)
MobileCPU IoT (Greengrass)
Vision:
Amazon Rekognition Image
Amazon Rekognition Video
Speech:
Amazon Polly
Amazon Transcribe
Language:
Amazon Lex
Amazon Translate
Amazon Comprehend
Apache
MXNet
PyTorch
Cognitive
Toolkit
Keras
Caffe2
& Caffe
TensorFlow Gluon
Application
Services
Platform
Services
Amazon Machine
Learning
Mechanical
Turk
Spark &
EMR
Amazon
SageMaker
AWS
DeepLens
ML in the Hands of Every Developer
Amazon SageMaker
1 2 3 4
I I I I
Notebook Instances Algorithms ML Training Service ML Hosting Service
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
Machine Learning with Amazon
SageMaker
On-premises data
Web app data
Amazon RDS
Other databases
Streaming data
AMAZON
QUICKSIGHT
AMAZON
SAGEMAKER
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
Mapping Amazon Biomass using
Amazon Analytics
www.ccst.inpe.br
Mauro Assis
assismauro@hotmail.com
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
EARTH SYSTEM SCIENCE
CENTERSTRATEGIC GOALS
Development and improvement of earth system models,
monitoring networks and socio-political analyzes, aiming at the
construction and analysis of scenarios of environmental
changes and climate projections.
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
The question:
How much does the Amazon forest weigh?
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
The previous question:
Why map Amazon forest biomass?
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
Biomass map process
• 68 million pixels (250 x 250m)
• 4 million km² area
• ~1000 LiDAR flights data
• Each flight: 6.5 billion of data recs
• 10 bands of satellite data for each pixel
• 4 to 6 h/map generation
• 16 CPU/32 gbyte RAM/21 Tb HD
• Random Forest algorithm
• Python H2O
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
LiDAR
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
LiDAR
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
Uncertainty map
• Propagate error from field to random forest extrapolation
• 1000 biomass values normally distributed for each pixel
• A thousand maps to generate…
• … How???
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
The answer: Analytics processing on AWS
• AWS engages a partner (DataRain)
• Two PoCs
• Four EC2 instances Linux 64 cores/256 Gbytes each
• Anaconda/H2O Python environment
• Script with lots of parallel processing
• Divided Amazon area into16 segments
• Two operators to run everything in 40 hours
• We downloaded the 1000 maps and sumarize at INPE
• It tooks about 2 days to generate the final map
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
Architecture
AWS CloudINPE
Researcher
Desktops Amazon
S3
Amazon EC2 Amazon EBS
Internet
Amazon EC2 Amazon EBS
Amazon EC2 Amazon EBS
Amazon EC2 Amazon EBS
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
Main benefits
• Uncertainty map itself
• DataRain (AWS partner) support
• First time we use cloud services at INPE
• Map obtained before the end of the project
• ROI
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
Return of the investment
• LiDAR Fligts costs $2,000 each
• 1000 flights => $2M
• To update the model: 100~150 flights
• 150 flights => $300k
• Cost of map generation: $10,000
• Money saved in the next map update:
$2M – $300k – $10k = $1.69 M
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
Agilityand InnovationAreKey
Amazon SageMaker
AWS Deep Learning AMIs
Amazon Rekognition
Amazon Lex
AWS DeepLens
Amazon Comprehend
Amazon Translate
Amazon Transcribe
Amazon Polly
Amazon Athena
Amazon EMR
Amazon Redshift
Amazon Elasticsearch Service
Amazon Kinesis
Amazon QuickSight
AWS Direct Connect
AWS Snowball
AWS Snowmobile
AWS Database Migration Service
AWS IoT Core
Amazon Kinesis Data Firehose
Amazon Kinesis Data Streams
Amazon Kinesis Video Streams
Data Lake
on AWS
Storage | Archival Storage | Data Catalog
AnalyticsMachine learning
Real-time dataOn-premises movementdata movement
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
Thank you!
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
Mauro Assis
assismauro@hotmail.com
Angelo Carvalho
carvaa@amazon.com
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T

Mais conteúdo relacionado

Mais procurados

Best Practices for Building Your Data Lake on AWS
Best Practices for Building Your Data Lake on AWSBest Practices for Building Your Data Lake on AWS
Best Practices for Building Your Data Lake on AWSAmazon Web Services
 
Secure, Build and Deduplicate Your Data Lake Data with Amazon Lake Formation
Secure, Build and Deduplicate Your Data Lake Data with Amazon Lake FormationSecure, Build and Deduplicate Your Data Lake Data with Amazon Lake Formation
Secure, Build and Deduplicate Your Data Lake Data with Amazon Lake FormationAmazon Web Services
 
Implementing a Data Warehouse on AWS in a Hybrid Environment
Implementing a Data Warehouse on AWS in a Hybrid EnvironmentImplementing a Data Warehouse on AWS in a Hybrid Environment
Implementing a Data Warehouse on AWS in a Hybrid EnvironmentAmazon Web Services
 
ABD218_How Euroleague Basketball Uses IoT Analytics to Engage Fans- ABD218
ABD218_How Euroleague Basketball Uses IoT Analytics to Engage Fans- ABD218ABD218_How Euroleague Basketball Uses IoT Analytics to Engage Fans- ABD218
ABD218_How Euroleague Basketball Uses IoT Analytics to Engage Fans- ABD218Amazon Web Services
 
How to Confidently Unleash Data to Meet the Needs of Your Entire Organization...
How to Confidently Unleash Data to Meet the Needs of Your Entire Organization...How to Confidently Unleash Data to Meet the Needs of Your Entire Organization...
How to Confidently Unleash Data to Meet the Needs of Your Entire Organization...Amazon Web Services
 
How Twilio Scaled Its Data Driven Culture - ABD309 - re:Invent 2017
How Twilio Scaled Its Data Driven Culture - ABD309 - re:Invent 2017How Twilio Scaled Its Data Driven Culture - ABD309 - re:Invent 2017
How Twilio Scaled Its Data Driven Culture - ABD309 - re:Invent 2017Amazon Web Services
 
ABD315_Serverless ETL with AWS Glue
ABD315_Serverless ETL with AWS GlueABD315_Serverless ETL with AWS Glue
ABD315_Serverless ETL with AWS GlueAmazon Web Services
 
Building_a_Modern_Data_Platform_in_the_Cloud.pdf
Building_a_Modern_Data_Platform_in_the_Cloud.pdfBuilding_a_Modern_Data_Platform_in_the_Cloud.pdf
Building_a_Modern_Data_Platform_in_the_Cloud.pdfAmazon Web Services
 
A Look Under the Hood – How Amazon.com Uses AWS Services for Analytics at Mas...
A Look Under the Hood – How Amazon.com Uses AWS Services for Analytics at Mas...A Look Under the Hood – How Amazon.com Uses AWS Services for Analytics at Mas...
A Look Under the Hood – How Amazon.com Uses AWS Services for Analytics at Mas...Amazon Web Services
 
Data preparation and transformation - Spin your straw into gold - Tel Aviv Su...
Data preparation and transformation - Spin your straw into gold - Tel Aviv Su...Data preparation and transformation - Spin your straw into gold - Tel Aviv Su...
Data preparation and transformation - Spin your straw into gold - Tel Aviv Su...Amazon Web Services
 
Building a Modern Data Platform on AWS
Building a Modern Data Platform on AWSBuilding a Modern Data Platform on AWS
Building a Modern Data Platform on AWSAmazon Web Services
 
Building a modern data platform in AWS
Building a modern data platform in AWSBuilding a modern data platform in AWS
Building a modern data platform in AWSAmazon Web Services
 
Visualization with Amazon QuickSight
Visualization with Amazon QuickSightVisualization with Amazon QuickSight
Visualization with Amazon QuickSightAmazon Web Services
 
Build Data Lakes & Analytics on AWS: Patterns & Best Practices
Build Data Lakes & Analytics on AWS: Patterns & Best PracticesBuild Data Lakes & Analytics on AWS: Patterns & Best Practices
Build Data Lakes & Analytics on AWS: Patterns & Best PracticesAmazon Web Services
 
ABD208_Cox Automotive Empowered to Scale with Splunk Cloud & AWS and Explores...
ABD208_Cox Automotive Empowered to Scale with Splunk Cloud & AWS and Explores...ABD208_Cox Automotive Empowered to Scale with Splunk Cloud & AWS and Explores...
ABD208_Cox Automotive Empowered to Scale with Splunk Cloud & AWS and Explores...Amazon Web Services
 

Mais procurados (20)

Best Practices for Building Your Data Lake on AWS
Best Practices for Building Your Data Lake on AWSBest Practices for Building Your Data Lake on AWS
Best Practices for Building Your Data Lake on AWS
 
Secure, Build and Deduplicate Your Data Lake Data with Amazon Lake Formation
Secure, Build and Deduplicate Your Data Lake Data with Amazon Lake FormationSecure, Build and Deduplicate Your Data Lake Data with Amazon Lake Formation
Secure, Build and Deduplicate Your Data Lake Data with Amazon Lake Formation
 
Preparing Data for the Lake
Preparing Data for the LakePreparing Data for the Lake
Preparing Data for the Lake
 
Implementing a Data Warehouse on AWS in a Hybrid Environment
Implementing a Data Warehouse on AWS in a Hybrid EnvironmentImplementing a Data Warehouse on AWS in a Hybrid Environment
Implementing a Data Warehouse on AWS in a Hybrid Environment
 
ABD218_How Euroleague Basketball Uses IoT Analytics to Engage Fans- ABD218
ABD218_How Euroleague Basketball Uses IoT Analytics to Engage Fans- ABD218ABD218_How Euroleague Basketball Uses IoT Analytics to Engage Fans- ABD218
ABD218_How Euroleague Basketball Uses IoT Analytics to Engage Fans- ABD218
 
How to Confidently Unleash Data to Meet the Needs of Your Entire Organization...
How to Confidently Unleash Data to Meet the Needs of Your Entire Organization...How to Confidently Unleash Data to Meet the Needs of Your Entire Organization...
How to Confidently Unleash Data to Meet the Needs of Your Entire Organization...
 
Graph and Amazon Neptune
Graph and Amazon NeptuneGraph and Amazon Neptune
Graph and Amazon Neptune
 
How Twilio Scaled Its Data Driven Culture - ABD309 - re:Invent 2017
How Twilio Scaled Its Data Driven Culture - ABD309 - re:Invent 2017How Twilio Scaled Its Data Driven Culture - ABD309 - re:Invent 2017
How Twilio Scaled Its Data Driven Culture - ABD309 - re:Invent 2017
 
Preparing Data for the Lake
Preparing Data for the LakePreparing Data for the Lake
Preparing Data for the Lake
 
ABD315_Serverless ETL with AWS Glue
ABD315_Serverless ETL with AWS GlueABD315_Serverless ETL with AWS Glue
ABD315_Serverless ETL with AWS Glue
 
Building_a_Modern_Data_Platform_in_the_Cloud.pdf
Building_a_Modern_Data_Platform_in_the_Cloud.pdfBuilding_a_Modern_Data_Platform_in_the_Cloud.pdf
Building_a_Modern_Data_Platform_in_the_Cloud.pdf
 
A Look Under the Hood – How Amazon.com Uses AWS Services for Analytics at Mas...
A Look Under the Hood – How Amazon.com Uses AWS Services for Analytics at Mas...A Look Under the Hood – How Amazon.com Uses AWS Services for Analytics at Mas...
A Look Under the Hood – How Amazon.com Uses AWS Services for Analytics at Mas...
 
Data preparation and transformation - Spin your straw into gold - Tel Aviv Su...
Data preparation and transformation - Spin your straw into gold - Tel Aviv Su...Data preparation and transformation - Spin your straw into gold - Tel Aviv Su...
Data preparation and transformation - Spin your straw into gold - Tel Aviv Su...
 
Building Data Lakes with AWS
Building Data Lakes with AWSBuilding Data Lakes with AWS
Building Data Lakes with AWS
 
Building a Modern Data Platform on AWS
Building a Modern Data Platform on AWSBuilding a Modern Data Platform on AWS
Building a Modern Data Platform on AWS
 
Preparing Data for the Lake
Preparing Data for the LakePreparing Data for the Lake
Preparing Data for the Lake
 
Building a modern data platform in AWS
Building a modern data platform in AWSBuilding a modern data platform in AWS
Building a modern data platform in AWS
 
Visualization with Amazon QuickSight
Visualization with Amazon QuickSightVisualization with Amazon QuickSight
Visualization with Amazon QuickSight
 
Build Data Lakes & Analytics on AWS: Patterns & Best Practices
Build Data Lakes & Analytics on AWS: Patterns & Best PracticesBuild Data Lakes & Analytics on AWS: Patterns & Best Practices
Build Data Lakes & Analytics on AWS: Patterns & Best Practices
 
ABD208_Cox Automotive Empowered to Scale with Splunk Cloud & AWS and Explores...
ABD208_Cox Automotive Empowered to Scale with Splunk Cloud & AWS and Explores...ABD208_Cox Automotive Empowered to Scale with Splunk Cloud & AWS and Explores...
ABD208_Cox Automotive Empowered to Scale with Splunk Cloud & AWS and Explores...
 

Semelhante a Building Data Lakes & Analytics on AWS

Building a Modern Data Platform on AWS. Public Sector Summit Brussels 2019
Building a Modern Data Platform on AWS. Public Sector Summit Brussels 2019Building a Modern Data Platform on AWS. Public Sector Summit Brussels 2019
Building a Modern Data Platform on AWS. Public Sector Summit Brussels 2019javier ramirez
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSAmazon Web Services
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSAmazon Web Services
 
AWS 2019 Taipei Summit - Building Serverless Analytics Platform on AWS
AWS 2019 Taipei Summit - Building Serverless Analytics Platform on AWSAWS 2019 Taipei Summit - Building Serverless Analytics Platform on AWS
AWS 2019 Taipei Summit - Building Serverless Analytics Platform on AWSSteven Hsieh
 
Value of Data Beyond Analytics by Darin Briskman
 Value of Data Beyond Analytics by Darin Briskman Value of Data Beyond Analytics by Darin Briskman
Value of Data Beyond Analytics by Darin BriskmanSameer Kenkare
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSAmazon Web Services
 
Building Data Lakes for Analytics on AWS
Building Data Lakes for Analytics on AWSBuilding Data Lakes for Analytics on AWS
Building Data Lakes for Analytics on AWSAmazon Web Services
 
Leveraging Cloud Analytics to Support Data-Driven Decisions
Leveraging Cloud Analytics to Support Data-Driven DecisionsLeveraging Cloud Analytics to Support Data-Driven Decisions
Leveraging Cloud Analytics to Support Data-Driven DecisionsAmazon Web Services
 
ABD318_Architecting a data lake with Amazon S3, Amazon Kinesis, AWS Glue and ...
ABD318_Architecting a data lake with Amazon S3, Amazon Kinesis, AWS Glue and ...ABD318_Architecting a data lake with Amazon S3, Amazon Kinesis, AWS Glue and ...
ABD318_Architecting a data lake with Amazon S3, Amazon Kinesis, AWS Glue and ...Amazon Web Services
 
Building a modern data platform on AWS. Utrecht AWS Dev Day
Building a modern data platform on AWS. Utrecht AWS Dev DayBuilding a modern data platform on AWS. Utrecht AWS Dev Day
Building a modern data platform on AWS. Utrecht AWS Dev Dayjavier ramirez
 
Building a Modern Data Platform in the Cloud
Building a Modern Data Platform in the CloudBuilding a Modern Data Platform in the Cloud
Building a Modern Data Platform in the CloudAmazon Web Services
 
From Strategy to Reality: Better Decisions With Data
From Strategy to Reality: Better Decisions With DataFrom Strategy to Reality: Better Decisions With Data
From Strategy to Reality: Better Decisions With DataAmazon Web Services
 
Automate Business Insights on AWS - Simple, Fast, and Secure Analytics Platforms
Automate Business Insights on AWS - Simple, Fast, and Secure Analytics PlatformsAutomate Business Insights on AWS - Simple, Fast, and Secure Analytics Platforms
Automate Business Insights on AWS - Simple, Fast, and Secure Analytics PlatformsAmazon Web Services
 
Everything You Need to Know About Big Data: From Architectural Principles to ...
Everything You Need to Know About Big Data: From Architectural Principles to ...Everything You Need to Know About Big Data: From Architectural Principles to ...
Everything You Need to Know About Big Data: From Architectural Principles to ...Amazon Web Services
 
Data Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaData Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaAmazon Web Services
 
Data Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaData Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaAmazon Web Services
 
Multi-Source, Multi-Speed Analytics on AWS Webinar
Multi-Source, Multi-Speed Analytics on AWS WebinarMulti-Source, Multi-Speed Analytics on AWS Webinar
Multi-Source, Multi-Speed Analytics on AWS WebinarAmazon Web Services
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSAmazon Web Services
 

Semelhante a Building Data Lakes & Analytics on AWS (20)

Building a Modern Data Platform on AWS. Public Sector Summit Brussels 2019
Building a Modern Data Platform on AWS. Public Sector Summit Brussels 2019Building a Modern Data Platform on AWS. Public Sector Summit Brussels 2019
Building a Modern Data Platform on AWS. Public Sector Summit Brussels 2019
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWS
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWS
 
AWS 2019 Taipei Summit - Building Serverless Analytics Platform on AWS
AWS 2019 Taipei Summit - Building Serverless Analytics Platform on AWSAWS 2019 Taipei Summit - Building Serverless Analytics Platform on AWS
AWS 2019 Taipei Summit - Building Serverless Analytics Platform on AWS
 
Value of Data Beyond Analytics by Darin Briskman
 Value of Data Beyond Analytics by Darin Briskman Value of Data Beyond Analytics by Darin Briskman
Value of Data Beyond Analytics by Darin Briskman
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWS
 
Building Data Lakes for Analytics on AWS
Building Data Lakes for Analytics on AWSBuilding Data Lakes for Analytics on AWS
Building Data Lakes for Analytics on AWS
 
Data_Analytics_and_AI_ML
Data_Analytics_and_AI_MLData_Analytics_and_AI_ML
Data_Analytics_and_AI_ML
 
Leveraging Cloud Analytics to Support Data-Driven Decisions
Leveraging Cloud Analytics to Support Data-Driven DecisionsLeveraging Cloud Analytics to Support Data-Driven Decisions
Leveraging Cloud Analytics to Support Data-Driven Decisions
 
ABD318_Architecting a data lake with Amazon S3, Amazon Kinesis, AWS Glue and ...
ABD318_Architecting a data lake with Amazon S3, Amazon Kinesis, AWS Glue and ...ABD318_Architecting a data lake with Amazon S3, Amazon Kinesis, AWS Glue and ...
ABD318_Architecting a data lake with Amazon S3, Amazon Kinesis, AWS Glue and ...
 
Building a modern data platform on AWS. Utrecht AWS Dev Day
Building a modern data platform on AWS. Utrecht AWS Dev DayBuilding a modern data platform on AWS. Utrecht AWS Dev Day
Building a modern data platform on AWS. Utrecht AWS Dev Day
 
Building a Modern Data Platform in the Cloud
Building a Modern Data Platform in the CloudBuilding a Modern Data Platform in the Cloud
Building a Modern Data Platform in the Cloud
 
From Strategy to Reality: Better Decisions With Data
From Strategy to Reality: Better Decisions With DataFrom Strategy to Reality: Better Decisions With Data
From Strategy to Reality: Better Decisions With Data
 
Automate Business Insights on AWS - Simple, Fast, and Secure Analytics Platforms
Automate Business Insights on AWS - Simple, Fast, and Secure Analytics PlatformsAutomate Business Insights on AWS - Simple, Fast, and Secure Analytics Platforms
Automate Business Insights on AWS - Simple, Fast, and Secure Analytics Platforms
 
Everything You Need to Know About Big Data: From Architectural Principles to ...
Everything You Need to Know About Big Data: From Architectural Principles to ...Everything You Need to Know About Big Data: From Architectural Principles to ...
Everything You Need to Know About Big Data: From Architectural Principles to ...
 
Data Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaData Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & Athena
 
Data Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaData Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & Athena
 
Multi-Source, Multi-Speed Analytics on AWS Webinar
Multi-Source, Multi-Speed Analytics on AWS WebinarMulti-Source, Multi-Speed Analytics on AWS Webinar
Multi-Source, Multi-Speed Analytics on AWS Webinar
 
Construindo data lakes e analytics com AWS
Construindo data lakes e analytics com AWSConstruindo data lakes e analytics com AWS
Construindo data lakes e analytics com AWS
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWS
 

Mais de AWS Summits

AWS Summit Singapore 2019 | The Smart Way to Build an AI & ML Strategy for Yo...
AWS Summit Singapore 2019 | The Smart Way to Build an AI & ML Strategy for Yo...AWS Summit Singapore 2019 | The Smart Way to Build an AI & ML Strategy for Yo...
AWS Summit Singapore 2019 | The Smart Way to Build an AI & ML Strategy for Yo...AWS Summits
 
AWS Summit Singapore 2019 | Bridging Start-ups and Enterprises
AWS Summit Singapore 2019 | Bridging Start-ups and EnterprisesAWS Summit Singapore 2019 | Bridging Start-ups and Enterprises
AWS Summit Singapore 2019 | Bridging Start-ups and EnterprisesAWS Summits
 
AWS Summit Singapore 2019 | Hiring a Global Rock Star Team: Tips and Tricks
AWS Summit Singapore 2019 | Hiring a Global Rock Star Team: Tips and TricksAWS Summit Singapore 2019 | Hiring a Global Rock Star Team: Tips and Tricks
AWS Summit Singapore 2019 | Hiring a Global Rock Star Team: Tips and TricksAWS Summits
 
AWS Summit Singapore 2019 | Five Common Technical Challenges for Startups
AWS Summit Singapore 2019 | Five Common Technical Challenges for StartupsAWS Summit Singapore 2019 | Five Common Technical Challenges for Startups
AWS Summit Singapore 2019 | Five Common Technical Challenges for StartupsAWS Summits
 
AWS Summit Singapore 2019 | A Founder's Journey to Exit
AWS Summit Singapore 2019 | A Founder's Journey to ExitAWS Summit Singapore 2019 | A Founder's Journey to Exit
AWS Summit Singapore 2019 | A Founder's Journey to ExitAWS Summits
 
AWS Summit Singapore 2019 | Realising Business Value with AWS Analytics Services
AWS Summit Singapore 2019 | Realising Business Value with AWS Analytics ServicesAWS Summit Singapore 2019 | Realising Business Value with AWS Analytics Services
AWS Summit Singapore 2019 | Realising Business Value with AWS Analytics ServicesAWS Summits
 
AWS Summit Singapore 2019 | Snowflake: Your Data. No Limits
AWS Summit Singapore 2019 | Snowflake: Your Data. No LimitsAWS Summit Singapore 2019 | Snowflake: Your Data. No Limits
AWS Summit Singapore 2019 | Snowflake: Your Data. No LimitsAWS Summits
 
AWS Summit Singapore 2019 | Amazon Digital User Engagement Solutions
AWS Summit Singapore 2019 | Amazon Digital User Engagement SolutionsAWS Summit Singapore 2019 | Amazon Digital User Engagement Solutions
AWS Summit Singapore 2019 | Amazon Digital User Engagement SolutionsAWS Summits
 
AWS Summit Singapore 2019 | Driving Business Outcomes with Data Lake on AWS
AWS Summit Singapore 2019 | Driving Business Outcomes with Data Lake on AWSAWS Summit Singapore 2019 | Driving Business Outcomes with Data Lake on AWS
AWS Summit Singapore 2019 | Driving Business Outcomes with Data Lake on AWSAWS Summits
 
AWS Summit Singapore 2019 | Big Data Analytics Architectural Patterns and Bes...
AWS Summit Singapore 2019 | Big Data Analytics Architectural Patterns and Bes...AWS Summit Singapore 2019 | Big Data Analytics Architectural Patterns and Bes...
AWS Summit Singapore 2019 | Big Data Analytics Architectural Patterns and Bes...AWS Summits
 
AWS Summit Singapore 2019 | Microsoft DevOps on AWS
AWS Summit Singapore 2019 | Microsoft DevOps on AWSAWS Summit Singapore 2019 | Microsoft DevOps on AWS
AWS Summit Singapore 2019 | Microsoft DevOps on AWSAWS Summits
 
AWS Summit Singapore 2019 | The Serverless Lifecycle: Development and Operati...
AWS Summit Singapore 2019 | The Serverless Lifecycle: Development and Operati...AWS Summit Singapore 2019 | The Serverless Lifecycle: Development and Operati...
AWS Summit Singapore 2019 | The Serverless Lifecycle: Development and Operati...AWS Summits
 
AWS Summit Singapore 2019 | Accelerating Enterprise Cloud Transformation by M...
AWS Summit Singapore 2019 | Accelerating Enterprise Cloud Transformation by M...AWS Summit Singapore 2019 | Accelerating Enterprise Cloud Transformation by M...
AWS Summit Singapore 2019 | Accelerating Enterprise Cloud Transformation by M...AWS Summits
 
AWS Summit Singapore 2019 | Operating Microservices at Hyperscale
AWS Summit Singapore 2019 | Operating Microservices at HyperscaleAWS Summit Singapore 2019 | Operating Microservices at Hyperscale
AWS Summit Singapore 2019 | Operating Microservices at HyperscaleAWS Summits
 
AWS Summit Singapore 2019 | Autoscaling Your Kubernetes Workloads
AWS Summit Singapore 2019 | Autoscaling Your Kubernetes WorkloadsAWS Summit Singapore 2019 | Autoscaling Your Kubernetes Workloads
AWS Summit Singapore 2019 | Autoscaling Your Kubernetes WorkloadsAWS Summits
 
AWS Summit Singapore 2019 | Realising Business Value
AWS Summit Singapore 2019 | Realising Business ValueAWS Summit Singapore 2019 | Realising Business Value
AWS Summit Singapore 2019 | Realising Business ValueAWS Summits
 
AWS Summit Singapore 2019 | Latest Trends for Cloud-Native Application Develo...
AWS Summit Singapore 2019 | Latest Trends for Cloud-Native Application Develo...AWS Summit Singapore 2019 | Latest Trends for Cloud-Native Application Develo...
AWS Summit Singapore 2019 | Latest Trends for Cloud-Native Application Develo...AWS Summits
 
AWS Summit Singapore 2019 | Transformation Towards a Digital Native Enterprise
AWS Summit Singapore 2019 | Transformation Towards a Digital Native EnterpriseAWS Summit Singapore 2019 | Transformation Towards a Digital Native Enterprise
AWS Summit Singapore 2019 | Transformation Towards a Digital Native EnterpriseAWS Summits
 
AWS Summit Singapore 2019 | Pragmatic Container Security
AWS Summit Singapore 2019 | Pragmatic Container SecurityAWS Summit Singapore 2019 | Pragmatic Container Security
AWS Summit Singapore 2019 | Pragmatic Container SecurityAWS Summits
 
AWS Summit Singapore 2019 | Enterprise Migration Journey Roadmap
AWS Summit Singapore 2019 | Enterprise Migration Journey RoadmapAWS Summit Singapore 2019 | Enterprise Migration Journey Roadmap
AWS Summit Singapore 2019 | Enterprise Migration Journey RoadmapAWS Summits
 

Mais de AWS Summits (20)

AWS Summit Singapore 2019 | The Smart Way to Build an AI & ML Strategy for Yo...
AWS Summit Singapore 2019 | The Smart Way to Build an AI & ML Strategy for Yo...AWS Summit Singapore 2019 | The Smart Way to Build an AI & ML Strategy for Yo...
AWS Summit Singapore 2019 | The Smart Way to Build an AI & ML Strategy for Yo...
 
AWS Summit Singapore 2019 | Bridging Start-ups and Enterprises
AWS Summit Singapore 2019 | Bridging Start-ups and EnterprisesAWS Summit Singapore 2019 | Bridging Start-ups and Enterprises
AWS Summit Singapore 2019 | Bridging Start-ups and Enterprises
 
AWS Summit Singapore 2019 | Hiring a Global Rock Star Team: Tips and Tricks
AWS Summit Singapore 2019 | Hiring a Global Rock Star Team: Tips and TricksAWS Summit Singapore 2019 | Hiring a Global Rock Star Team: Tips and Tricks
AWS Summit Singapore 2019 | Hiring a Global Rock Star Team: Tips and Tricks
 
AWS Summit Singapore 2019 | Five Common Technical Challenges for Startups
AWS Summit Singapore 2019 | Five Common Technical Challenges for StartupsAWS Summit Singapore 2019 | Five Common Technical Challenges for Startups
AWS Summit Singapore 2019 | Five Common Technical Challenges for Startups
 
AWS Summit Singapore 2019 | A Founder's Journey to Exit
AWS Summit Singapore 2019 | A Founder's Journey to ExitAWS Summit Singapore 2019 | A Founder's Journey to Exit
AWS Summit Singapore 2019 | A Founder's Journey to Exit
 
AWS Summit Singapore 2019 | Realising Business Value with AWS Analytics Services
AWS Summit Singapore 2019 | Realising Business Value with AWS Analytics ServicesAWS Summit Singapore 2019 | Realising Business Value with AWS Analytics Services
AWS Summit Singapore 2019 | Realising Business Value with AWS Analytics Services
 
AWS Summit Singapore 2019 | Snowflake: Your Data. No Limits
AWS Summit Singapore 2019 | Snowflake: Your Data. No LimitsAWS Summit Singapore 2019 | Snowflake: Your Data. No Limits
AWS Summit Singapore 2019 | Snowflake: Your Data. No Limits
 
AWS Summit Singapore 2019 | Amazon Digital User Engagement Solutions
AWS Summit Singapore 2019 | Amazon Digital User Engagement SolutionsAWS Summit Singapore 2019 | Amazon Digital User Engagement Solutions
AWS Summit Singapore 2019 | Amazon Digital User Engagement Solutions
 
AWS Summit Singapore 2019 | Driving Business Outcomes with Data Lake on AWS
AWS Summit Singapore 2019 | Driving Business Outcomes with Data Lake on AWSAWS Summit Singapore 2019 | Driving Business Outcomes with Data Lake on AWS
AWS Summit Singapore 2019 | Driving Business Outcomes with Data Lake on AWS
 
AWS Summit Singapore 2019 | Big Data Analytics Architectural Patterns and Bes...
AWS Summit Singapore 2019 | Big Data Analytics Architectural Patterns and Bes...AWS Summit Singapore 2019 | Big Data Analytics Architectural Patterns and Bes...
AWS Summit Singapore 2019 | Big Data Analytics Architectural Patterns and Bes...
 
AWS Summit Singapore 2019 | Microsoft DevOps on AWS
AWS Summit Singapore 2019 | Microsoft DevOps on AWSAWS Summit Singapore 2019 | Microsoft DevOps on AWS
AWS Summit Singapore 2019 | Microsoft DevOps on AWS
 
AWS Summit Singapore 2019 | The Serverless Lifecycle: Development and Operati...
AWS Summit Singapore 2019 | The Serverless Lifecycle: Development and Operati...AWS Summit Singapore 2019 | The Serverless Lifecycle: Development and Operati...
AWS Summit Singapore 2019 | The Serverless Lifecycle: Development and Operati...
 
AWS Summit Singapore 2019 | Accelerating Enterprise Cloud Transformation by M...
AWS Summit Singapore 2019 | Accelerating Enterprise Cloud Transformation by M...AWS Summit Singapore 2019 | Accelerating Enterprise Cloud Transformation by M...
AWS Summit Singapore 2019 | Accelerating Enterprise Cloud Transformation by M...
 
AWS Summit Singapore 2019 | Operating Microservices at Hyperscale
AWS Summit Singapore 2019 | Operating Microservices at HyperscaleAWS Summit Singapore 2019 | Operating Microservices at Hyperscale
AWS Summit Singapore 2019 | Operating Microservices at Hyperscale
 
AWS Summit Singapore 2019 | Autoscaling Your Kubernetes Workloads
AWS Summit Singapore 2019 | Autoscaling Your Kubernetes WorkloadsAWS Summit Singapore 2019 | Autoscaling Your Kubernetes Workloads
AWS Summit Singapore 2019 | Autoscaling Your Kubernetes Workloads
 
AWS Summit Singapore 2019 | Realising Business Value
AWS Summit Singapore 2019 | Realising Business ValueAWS Summit Singapore 2019 | Realising Business Value
AWS Summit Singapore 2019 | Realising Business Value
 
AWS Summit Singapore 2019 | Latest Trends for Cloud-Native Application Develo...
AWS Summit Singapore 2019 | Latest Trends for Cloud-Native Application Develo...AWS Summit Singapore 2019 | Latest Trends for Cloud-Native Application Develo...
AWS Summit Singapore 2019 | Latest Trends for Cloud-Native Application Develo...
 
AWS Summit Singapore 2019 | Transformation Towards a Digital Native Enterprise
AWS Summit Singapore 2019 | Transformation Towards a Digital Native EnterpriseAWS Summit Singapore 2019 | Transformation Towards a Digital Native Enterprise
AWS Summit Singapore 2019 | Transformation Towards a Digital Native Enterprise
 
AWS Summit Singapore 2019 | Pragmatic Container Security
AWS Summit Singapore 2019 | Pragmatic Container SecurityAWS Summit Singapore 2019 | Pragmatic Container Security
AWS Summit Singapore 2019 | Pragmatic Container Security
 
AWS Summit Singapore 2019 | Enterprise Migration Journey Roadmap
AWS Summit Singapore 2019 | Enterprise Migration Journey RoadmapAWS Summit Singapore 2019 | Enterprise Migration Journey Roadmap
AWS Summit Singapore 2019 | Enterprise Migration Journey Roadmap
 

Building Data Lakes & Analytics on AWS

  • 1. P U B L I C S E C T O R S U M M I T B OGOTA
  • 2. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T Building Data Lakes &Analytics onAWS Mauro Assis Researcher Earth System Science Center INPE Angelo Carvalho Specialist Solutions Architect - Analytics Public Sector for Latin America, Canada and Caribbean AWS
  • 3. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T Organizations that successfully generate business value from their data will outperform their peers. An Aberdeen survey showed that organizations that implemented a data lake outperform similar companies by 9% in organic revenue growth.* 24% 15% Leaders Followers Organic revenue growth *Aberdeen: Angling for Insight in Today’s Data Lake, Michael Lock, SVP Analytics and Business Intelligence Most Important: DrivingValuefrom Data
  • 4. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T Traditionally,AnalyticsUsed to Look LikeThis OLTP ERP CRM LOB Data warehouse Business intelligence • Relational data • TBs–PBs scale • Schema defined prior to data load • Operational reporting and ad hoc • Large initial CAPEX + $10K–$50K/TB/year
  • 5. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T Data Lakes Extend theTraditionalApproach Data warehouse Business intelligence OLTP ERP CRM LOB • Relational and non-relational data • TBs–EBs scale • Diverse analytical engines • Low-cost storage & analytics Devices Web Sensors Social Data lake Big data processing, real-time, machine learning
  • 6. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T Data Lakes fromAWS Analytics • Unmatched durability and availability at EB scale • Best security, compliance, and audit capabilities • Object-level controls for fine-grained access • Fastest performance by retrieving subsets of data • The greatest variety of ways to bring data in • 2x as many integrations with partners • Analyze with broadest set of analytics & machine learning (ML) services Machine learning Real-time dataOn-premises Data Lake on AWS movementdata movement
  • 7. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T Managed ML Service Deep Learning AMIs Video and Image Recognition Conversational Interfaces Deep-Learning Video Camera Natural Language Processing Language Translation Speech Recognition Text-to-Speech Interactive Analysis Hadoop & Spark Data Warehousing Full-text search Real-time analytics Dashboards & Visualizations Dedicated Network connection Secure appliances Ruggedized Shipping Container Database migration Connect Devices to AWS Real-time Data Streams Real-time Video Streams Data Lake on AWS Storage | Archival Storage | Data Catalog AnalyticsMachine learning Real-time dataOn-premises movementdata movement Data Lakes,Analytics,and IoTPortfolio fromAWS Broadest,deepestsetofanalyticservices
  • 8. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T Data Lakes,Analytics,and IoTPortfolio fromAWS Broadest,deepestsetofanalyticservices Amazon SageMaker AWS Deep Learning AMIs Amazon Rekognition Amazon Lex AWS DeepLens Amazon Comprehend Amazon Translate Amazon Transcribe Amazon Polly Amazon Athena Amazon EMR Amazon Redshift Amazon Elasticsearch Service Amazon Kinesis Amazon QuickSight AWS Direct Connect AWS Snowball AWS Snowmobile AWS Database Migration Service AWS IoT Core Amazon Kinesis Data Firehose Amazon Kinesis Data Streams Amazon Kinesis Video Streams Data Lake on AWS Storage | Archival Storage | Data Catalog AnalyticsMachine learning Real-time dataOn-premises movementdata movement
  • 9. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T Job AuthoringData Catalog Job Execution Apache Hive Metastore compatible Integrated with AWS services Automatic crawling Discover Auto-generates ETL code Python and Apache Spark Edit, debug, and share Develop Serverless execution Flexible scheduling Monitoring and alerting Deploy AWS Glue
  • 10. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T Data Lake on Amazon S3 with AWS Glue On-premises data Web app data Amazon RDS Other databases Streaming data Your data AMAZON QUICKSIGHT
  • 11. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T Other Ways of Populating the Catalog Call the AWS Glue CreateTable API Create table manually Run Hive DDL statement Apache Hive Metastore AWS GLUE ETL AWS GLUE DATA CATALOG Import from Apache Hive Metastore
  • 12. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T How Do IDriveValue? Amazon SageMaker AWS Deep Learning AMIs Amazon Rekognition Amazon Lex AWS DeepLens Amazon Comprehend Amazon Translate Amazon Transcribe Amazon Polly Amazon Athena Amazon EMR Amazon Redshift Amazon Elasticsearch Service Amazon Kinesis Amazon QuickSight AWS Direct Connect AWS Snowball AWS Snowmobile AWS Database Migration Service AWS IoT Core Amazon Kinesis Data Firehose Amazon Kinesis Data Streams Amazon Kinesis Video Streams Data Lake on AWS Storage | Archival Storage | Data Catalog AnalyticsMachine learning Real-time dataOn-premises movementdata movement
  • 13. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T Amazon Athena Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL.
  • 14. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T FamiliarTechnologiesUnder theCovers Used for SQL Queries In-memory distributed query engine ANSI-SQL compatible with extensions (Eg. SELECT * FROM tableName) Used for DDL functionality Complex data types Multitude of formats Supports data partitioning (Eg. CREATE TABLE, ALTER TABLE, MSCK REPAIR)
  • 15. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T Exploring data with Amazon Athena On-premises Data Web app data Amazon RDS Other Databases Streaming data AMAZON QUICKSIGHT AMAZON SAGEMAKER
  • 16. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T Hadoop/SparkAnalytics • Distributed processing • Diverse analytics • Batch/Script (Hive/Pig) • Interactive (Spark, Presto) • Real-time (Spark) • Machine Learning (Spark) • NoSQL (HBase) • For many use cases • Log and clickstream analysis • Machine learning • Real-time analytics • Large-scale analytics • Genomics • ETL YARN (Hadoop Resource Manager) NoSQLMachine learning Real-timeInteractiveScriptBatch Data Lake on AWS
  • 17. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T Hadoop/SparkAnalyticsonAWS YARN (Hadoop Resource Manager) NoSQLMachine learning Real-timeInteractiveScriptBatch Data Lake on AWS Amazon S3 Amazon EMR Managed Hadoop/Spark Object Storage
  • 18. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T EMR – Enterprise-grade Hadoop &Spark DeploylatestreleasesinHadoopandSparkecosystems • Nineteen open-source projects: Apache Hadoop, Spark, HBase, Presto, and more • Updated with the latest open source frameworks within 30 days of release Hadoop Ganglia HBase Hive& Catalog Hue Mahout Oozie Phoenix Pig Presto Spark Tez Zeppelin Zookeeper Flink Livy MXNet Sqoop Emr-4.0.0 July2015 2.6.0 1.0.0 0.10.0 0.14.0 1.4.1 Emr-4.7.0 June2016 2.7.2 3.7.2 1.2.1 1.0.0 3.7.1 0.12.0 4.2.0 4.7.0 0.14.0 .147 1.6.1 1.4.6 0.8.3 0.5.6 3.4.8 Emr-5.3.0 January2017 2.7.3 3.7.2 1.2.3 + S3 2.1.1 3.11.0 0.12.2 4.3.0 4.7.0 0.16.0 0.157.1 2.1.0 1.4.6 0.8.4 0.6.2 3.4.9 1.1.4 Emr-5.11.0 December2017 2.7.3 3.7.2 1.3.1 + S3 2.3.2 4.0.1 0.13.0 4.3.0 4.11.0 0.17.0 .187 2.2.1 1.4.6 0.8.4 0.7.3 3.4.10 1.3.2 0.4.0 0.12.0 EMR releases
  • 19. AmazonS3 –Source ofTruth,MultipleClusters Amazon S3 Interactive Spark Cluster Amazon EMR Amazon EMR HDFS HDFS EC2 Instance Memory Intermediates stored on local disk or HDFSLocal HDFS EC2 Instance Memory Intermediates stored on local disk or HDFSLocal Transient ETL Job Source of Truth HDFS HDFS HDFS Local Intermediate HDFS/Storage Local Intermediate HDFS/Storage
  • 20. External Metadata Management Amazon S3 Interactive Spark Cluster Amazon EMR Amazon EMR HDFS Transient ETL Job Source of Truth HDFS Describes Data in S3 MySQL DB instance Customershaveoptions Glue Data Catalog
  • 21. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T Reprocess data with Amazon EMR (Spark) On-premise data Web app data Amazon RDS Other Databases Streaming data AMAZON QUICKSIGHT AMAZON SAGEMAKER
  • 22. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T MachineLearning onYour DataLake Amazon SageMaker AWS Deep Learning AMIs Amazon Rekognition Amazon Lex AWS DeepLens Amazon Comprehend Amazon Translate Amazon Transcribe Amazon Polly Amazon Athena Amazon EMR Amazon Redshift Amazon Elasticsearch Service Amazon Kinesis Amazon QuickSight AWS Direct Connect AWS Snowball AWS Snowmobile AWS Database Migration Service AWS IoT Core Amazon Kinesis Data Firehose Amazon Kinesis Data Streams Amazon Kinesis Video Streams Data Lake on AWS Storage | Archival Storage | Data Catalog AnalyticsMachine learning Real-time dataOn-premises movementdata movement
  • 23. Frameworks & Infrastructure AWS Deep Learning AMI GPU (P3 Instances) MobileCPU IoT (Greengrass) Vision: Amazon Rekognition Image Amazon Rekognition Video Speech: Amazon Polly Amazon Transcribe Language: Amazon Lex Amazon Translate Amazon Comprehend Apache MXNet PyTorch Cognitive Toolkit Keras Caffe2 & Caffe TensorFlow Gluon Application Services Platform Services Amazon Machine Learning Mechanical Turk Spark & EMR Amazon SageMaker AWS DeepLens ML in the Hands of Every Developer
  • 24. Amazon SageMaker 1 2 3 4 I I I I Notebook Instances Algorithms ML Training Service ML Hosting Service
  • 25. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T Machine Learning with Amazon SageMaker On-premises data Web app data Amazon RDS Other databases Streaming data AMAZON QUICKSIGHT AMAZON SAGEMAKER
  • 26. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T
  • 27. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T Mapping Amazon Biomass using Amazon Analytics www.ccst.inpe.br Mauro Assis assismauro@hotmail.com
  • 28. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T EARTH SYSTEM SCIENCE CENTERSTRATEGIC GOALS Development and improvement of earth system models, monitoring networks and socio-political analyzes, aiming at the construction and analysis of scenarios of environmental changes and climate projections.
  • 29. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T The question: How much does the Amazon forest weigh?
  • 30. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T The previous question: Why map Amazon forest biomass?
  • 31. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T Biomass map process • 68 million pixels (250 x 250m) • 4 million km² area • ~1000 LiDAR flights data • Each flight: 6.5 billion of data recs • 10 bands of satellite data for each pixel • 4 to 6 h/map generation • 16 CPU/32 gbyte RAM/21 Tb HD • Random Forest algorithm • Python H2O
  • 32. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T LiDAR
  • 33. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T LiDAR
  • 34. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T
  • 35. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T
  • 36. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T Uncertainty map • Propagate error from field to random forest extrapolation • 1000 biomass values normally distributed for each pixel • A thousand maps to generate… • … How???
  • 37. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T The answer: Analytics processing on AWS • AWS engages a partner (DataRain) • Two PoCs • Four EC2 instances Linux 64 cores/256 Gbytes each • Anaconda/H2O Python environment • Script with lots of parallel processing • Divided Amazon area into16 segments • Two operators to run everything in 40 hours • We downloaded the 1000 maps and sumarize at INPE • It tooks about 2 days to generate the final map
  • 38. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T Architecture AWS CloudINPE Researcher Desktops Amazon S3 Amazon EC2 Amazon EBS Internet Amazon EC2 Amazon EBS Amazon EC2 Amazon EBS Amazon EC2 Amazon EBS
  • 39. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T
  • 40. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T Main benefits • Uncertainty map itself • DataRain (AWS partner) support • First time we use cloud services at INPE • Map obtained before the end of the project • ROI
  • 41. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T Return of the investment • LiDAR Fligts costs $2,000 each • 1000 flights => $2M • To update the model: 100~150 flights • 150 flights => $300k • Cost of map generation: $10,000 • Money saved in the next map update: $2M – $300k – $10k = $1.69 M
  • 42. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T
  • 43. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T Agilityand InnovationAreKey Amazon SageMaker AWS Deep Learning AMIs Amazon Rekognition Amazon Lex AWS DeepLens Amazon Comprehend Amazon Translate Amazon Transcribe Amazon Polly Amazon Athena Amazon EMR Amazon Redshift Amazon Elasticsearch Service Amazon Kinesis Amazon QuickSight AWS Direct Connect AWS Snowball AWS Snowmobile AWS Database Migration Service AWS IoT Core Amazon Kinesis Data Firehose Amazon Kinesis Data Streams Amazon Kinesis Video Streams Data Lake on AWS Storage | Archival Storage | Data Catalog AnalyticsMachine learning Real-time dataOn-premises movementdata movement
  • 44. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T Thank you! © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T Mauro Assis assismauro@hotmail.com Angelo Carvalho carvaa@amazon.com
  • 45. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T