SlideShare uma empresa Scribd logo
1 de 18
Baixar para ler offline
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Building Advanced Workflows with AWS Glue
Santosh Chandrachood
SDM
AWS Glue
A N T 3 7 2
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Agenda
Overview
Building Blocks
Building a usecase
Event driven workflows
Workflow considerations
Monitoring and Tuning
Bring your own workflow engine
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Breakout repeats
Monday, Nov 26
ANT 372 – [CT] Building Advanced Workflows with AWS Glue
10:00 – 11:00 | Aria East, Plaza Level, Orovada 3
Tuesday, Nov 27
ANT 333 – [BS] Building Advanced Workflows with AWS Glue
2:30 – 3:30 | Mirage, Grand Ballroom D, Table 4
Wednesday, Nov 28
ANT 381 – [BS] Building Advanced Workflows with AWS Glue
5:30 – 6:30 | Aria West, Level 3, Starvine 10, Table 5
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Fully-managed, serverless extract-transform-load (ETL) service
for developers, built by developers
1000s of customers and jobs
AWS Glue
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
OrchestrationData Catalog ETL Engine
Automatic crawling
Apache Hive Metastore compatible
Integrated with AWS analytic services
Discover
Flexible scheduling
Monitoring and alerting
External integrations
Deploy
Apache Spark core
Python and Scala
Auto-generates ETL code
Develop
AWS Glue Components
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Apache Spark is a distributed data processing engine for complex analytics.
AWS Glue builds on the Apache Spark to offer ETL specific functionality.
Spark Core: RDDs
Spark DataFrames Glue DynamicFrames
SparkSQL AWS Glue ETL
Review: Apache Spark and AWS Glue ETL
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Building Blocks
Crawlers Jobs TriggersEntities
Schedule ExternalEventsDependencies
Conditions TimeoutRetriesControl
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Building a usecase
raw
optimize optimized
SLA
reporting
New
variable mins variable mins
Goal: compose jobs in DAG through dependencies
In-practice: time-based workflows
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
New features
External
Conditions
Control
Publish crawler and job notifications into CloudWatch events
CloudWatch events to control downstream workflows
‘ANY’ and ‘AND’ operators in Trigger conditions
Additional job states ’failed’, ‘stopped’, or ‘timeout’
Configure job timeout
Job delay notifications
On-demand cancel
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Example event-driven workflow
raw
optimize optimized
SLA
reporting
New
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Workflow considerations
• Incremental data processing
• Job bookmarks to keep state
• Job parameters to select new datasets
• Job size
• Unique versus One job per logical units of work
• Multiple small jobs or one big job
• Job parameters
• Initial, Global, In-between jobs
• Use Amazon S3 to pass parameters
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Workflow considerations
• Data processing unit
• Number of DPUs
• Adjusting DPUs
• SLA
• Job delays notifications
• Timeouts
• Error handling
• Retry logic
• Integration with 3rd party
• Job re-run
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Workflow monitoring—Performance
• How is your dataset
partitioned?
• How is your application
divided into jobs and
stages?
• Data is divided into
partitions that are
processed concurrently
Driver
Executors
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Workflow monitoring—Metrics
• Job metrics
• CPU
• Memory
• Network
• Executors, stages
• Data movement
• Use data points to
adjust job parameters
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Bring your own workflow engine
Triggers
Top-level AWS
Glue job
CloudWatch
Events
Jobs
Lambda
States
SNS
Config
STEP
Thank you!
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Santosh Chandrachood
glue-feedback@amazon.com
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Mais conteúdo relacionado

Mais procurados

Object Storage: Amazon S3 and Amazon Glacier
Object Storage: Amazon S3 and Amazon GlacierObject Storage: Amazon S3 and Amazon Glacier
Object Storage: Amazon S3 and Amazon GlacierAmazon Web Services
 
Machine Learning & Amazon SageMaker
Machine Learning & Amazon SageMakerMachine Learning & Amazon SageMaker
Machine Learning & Amazon SageMakerAmazon Web Services
 
Speed up data preparation for ML pipelines on AWS
Speed up data preparation for ML pipelines on AWSSpeed up data preparation for ML pipelines on AWS
Speed up data preparation for ML pipelines on AWSData Science Milan
 
The Zen of DataOps – AWS Lake Formation and the Data Supply Chain Pipeline
The Zen of DataOps – AWS Lake Formation and the Data Supply Chain PipelineThe Zen of DataOps – AWS Lake Formation and the Data Supply Chain Pipeline
The Zen of DataOps – AWS Lake Formation and the Data Supply Chain PipelineAmazon Web Services
 
Databases - Choosing the right Database on AWS
Databases - Choosing the right Database on AWSDatabases - Choosing the right Database on AWS
Databases - Choosing the right Database on AWSAmazon Web Services
 
Visualization with Amazon QuickSight
Visualization with Amazon QuickSightVisualization with Amazon QuickSight
Visualization with Amazon QuickSightAmazon Web Services
 
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...Amazon Web Services
 
Amazon Aurora and AWS Database Migration Service
Amazon Aurora and AWS Database Migration ServiceAmazon Aurora and AWS Database Migration Service
Amazon Aurora and AWS Database Migration ServiceAmazon Web Services
 
Amazon GuardDuty: Intelligent Threat Detection and Continuous Monitoring to P...
Amazon GuardDuty: Intelligent Threat Detection and Continuous Monitoring to P...Amazon GuardDuty: Intelligent Threat Detection and Continuous Monitoring to P...
Amazon GuardDuty: Intelligent Threat Detection and Continuous Monitoring to P...Amazon Web Services
 
ABD318_Architecting a data lake with Amazon S3, Amazon Kinesis, AWS Glue and ...
ABD318_Architecting a data lake with Amazon S3, Amazon Kinesis, AWS Glue and ...ABD318_Architecting a data lake with Amazon S3, Amazon Kinesis, AWS Glue and ...
ABD318_Architecting a data lake with Amazon S3, Amazon Kinesis, AWS Glue and ...Amazon Web Services
 
Building a Modern Data Platform on AWS
Building a Modern Data Platform on AWSBuilding a Modern Data Platform on AWS
Building a Modern Data Platform on AWSAmazon Web Services
 
(STG401) Amazon S3 Deep Dive & Best Practices
(STG401) Amazon S3 Deep Dive & Best Practices(STG401) Amazon S3 Deep Dive & Best Practices
(STG401) Amazon S3 Deep Dive & Best PracticesAmazon Web Services
 
Module 2 - Datalake
Module 2 - DatalakeModule 2 - Datalake
Module 2 - DatalakeLam Le
 

Mais procurados (20)

Object Storage: Amazon S3 and Amazon Glacier
Object Storage: Amazon S3 and Amazon GlacierObject Storage: Amazon S3 and Amazon Glacier
Object Storage: Amazon S3 and Amazon Glacier
 
Machine Learning & Amazon SageMaker
Machine Learning & Amazon SageMakerMachine Learning & Amazon SageMaker
Machine Learning & Amazon SageMaker
 
Speed up data preparation for ML pipelines on AWS
Speed up data preparation for ML pipelines on AWSSpeed up data preparation for ML pipelines on AWS
Speed up data preparation for ML pipelines on AWS
 
The Zen of DataOps – AWS Lake Formation and the Data Supply Chain Pipeline
The Zen of DataOps – AWS Lake Formation and the Data Supply Chain PipelineThe Zen of DataOps – AWS Lake Formation and the Data Supply Chain Pipeline
The Zen of DataOps – AWS Lake Formation and the Data Supply Chain Pipeline
 
Databases - Choosing the right Database on AWS
Databases - Choosing the right Database on AWSDatabases - Choosing the right Database on AWS
Databases - Choosing the right Database on AWS
 
Fundamentals of AWS Security
Fundamentals of AWS SecurityFundamentals of AWS Security
Fundamentals of AWS Security
 
Implementing a Data Lake
Implementing a Data LakeImplementing a Data Lake
Implementing a Data Lake
 
Visualization with Amazon QuickSight
Visualization with Amazon QuickSightVisualization with Amazon QuickSight
Visualization with Amazon QuickSight
 
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
 
Introduction to AWS Glue
Introduction to AWS GlueIntroduction to AWS Glue
Introduction to AWS Glue
 
Introduction to Amazon Aurora
Introduction to Amazon AuroraIntroduction to Amazon Aurora
Introduction to Amazon Aurora
 
Amazon Aurora and AWS Database Migration Service
Amazon Aurora and AWS Database Migration ServiceAmazon Aurora and AWS Database Migration Service
Amazon Aurora and AWS Database Migration Service
 
Amazon GuardDuty: Intelligent Threat Detection and Continuous Monitoring to P...
Amazon GuardDuty: Intelligent Threat Detection and Continuous Monitoring to P...Amazon GuardDuty: Intelligent Threat Detection and Continuous Monitoring to P...
Amazon GuardDuty: Intelligent Threat Detection and Continuous Monitoring to P...
 
ABD318_Architecting a data lake with Amazon S3, Amazon Kinesis, AWS Glue and ...
ABD318_Architecting a data lake with Amazon S3, Amazon Kinesis, AWS Glue and ...ABD318_Architecting a data lake with Amazon S3, Amazon Kinesis, AWS Glue and ...
ABD318_Architecting a data lake with Amazon S3, Amazon Kinesis, AWS Glue and ...
 
Building a Modern Data Platform on AWS
Building a Modern Data Platform on AWSBuilding a Modern Data Platform on AWS
Building a Modern Data Platform on AWS
 
Amazon S3 Masterclass
Amazon S3 MasterclassAmazon S3 Masterclass
Amazon S3 Masterclass
 
Building a Data Lake on AWS
Building a Data Lake on AWSBuilding a Data Lake on AWS
Building a Data Lake on AWS
 
Security Architectures on AWS
Security Architectures on AWSSecurity Architectures on AWS
Security Architectures on AWS
 
(STG401) Amazon S3 Deep Dive & Best Practices
(STG401) Amazon S3 Deep Dive & Best Practices(STG401) Amazon S3 Deep Dive & Best Practices
(STG401) Amazon S3 Deep Dive & Best Practices
 
Module 2 - Datalake
Module 2 - DatalakeModule 2 - Datalake
Module 2 - Datalake
 

Semelhante a Building Advanced Workflows with AWS Glue (ANT333) - AWS re:Invent 2018

Build End-to-End IT Lifecycle Management on AWS with ServiceNow (ENT330) - AW...
Build End-to-End IT Lifecycle Management on AWS with ServiceNow (ENT330) - AW...Build End-to-End IT Lifecycle Management on AWS with ServiceNow (ENT330) - AW...
Build End-to-End IT Lifecycle Management on AWS with ServiceNow (ENT330) - AW...Amazon Web Services
 
Analyze Amazon CloudFront and Lambda@Edge Logs to Improve Customer Experience...
Analyze Amazon CloudFront and Lambda@Edge Logs to Improve Customer Experience...Analyze Amazon CloudFront and Lambda@Edge Logs to Improve Customer Experience...
Analyze Amazon CloudFront and Lambda@Edge Logs to Improve Customer Experience...Amazon Web Services
 
Distributed Solar Systems at EDF Renewables and AWS IoT: A Natural Fit (PUT30...
Distributed Solar Systems at EDF Renewables and AWS IoT: A Natural Fit (PUT30...Distributed Solar Systems at EDF Renewables and AWS IoT: A Natural Fit (PUT30...
Distributed Solar Systems at EDF Renewables and AWS IoT: A Natural Fit (PUT30...Amazon Web Services
 
Serverless State Management & Orchestration for Modern Apps (API302) - AWS re...
Serverless State Management & Orchestration for Modern Apps (API302) - AWS re...Serverless State Management & Orchestration for Modern Apps (API302) - AWS re...
Serverless State Management & Orchestration for Modern Apps (API302) - AWS re...Amazon Web Services
 
Drive Self-Service & Standardization in the First 100 Days of Your Cloud Migr...
Drive Self-Service & Standardization in the First 100 Days of Your Cloud Migr...Drive Self-Service & Standardization in the First 100 Days of Your Cloud Migr...
Drive Self-Service & Standardization in the First 100 Days of Your Cloud Migr...Amazon Web Services
 
How Dow Jones Uses AWS to Enable Innovation and New Engineering Work (CTD316)...
How Dow Jones Uses AWS to Enable Innovation and New Engineering Work (CTD316)...How Dow Jones Uses AWS to Enable Innovation and New Engineering Work (CTD316)...
How Dow Jones Uses AWS to Enable Innovation and New Engineering Work (CTD316)...Amazon Web Services
 
Securing Machine Learning Deployments for the Enterprise (SEC369-R1) - AWS re...
Securing Machine Learning Deployments for the Enterprise (SEC369-R1) - AWS re...Securing Machine Learning Deployments for the Enterprise (SEC369-R1) - AWS re...
Securing Machine Learning Deployments for the Enterprise (SEC369-R1) - AWS re...Amazon Web Services
 
Data Transformation Patterns in AWS - AWS Online Tech Talks
Data Transformation Patterns in AWS - AWS Online Tech TalksData Transformation Patterns in AWS - AWS Online Tech Talks
Data Transformation Patterns in AWS - AWS Online Tech TalksAmazon Web Services
 
Coordinating Microservices with AWS Step Functions.pdf
Coordinating Microservices with AWS Step Functions.pdfCoordinating Microservices with AWS Step Functions.pdf
Coordinating Microservices with AWS Step Functions.pdfAmazon Web Services
 
Data Design and Modeling for Microservices I AWS Dev Day 2018
Data Design and Modeling for Microservices I AWS Dev Day 2018Data Design and Modeling for Microservices I AWS Dev Day 2018
Data Design and Modeling for Microservices I AWS Dev Day 2018AWS Germany
 
Accelerate and Modernise Microsoft Workload Migrations on AWS
Accelerate and Modernise Microsoft Workload Migrations on AWSAccelerate and Modernise Microsoft Workload Migrations on AWS
Accelerate and Modernise Microsoft Workload Migrations on AWSAmazon Web Services
 
Amazon Athena: What's New and How SendGrid Innovates (ANT324) - AWS re:Invent...
Amazon Athena: What's New and How SendGrid Innovates (ANT324) - AWS re:Invent...Amazon Athena: What's New and How SendGrid Innovates (ANT324) - AWS re:Invent...
Amazon Athena: What's New and How SendGrid Innovates (ANT324) - AWS re:Invent...Amazon Web Services
 
SaaS Reference Architectures: Review of Real-World Patterns & Strategies (GPS...
SaaS Reference Architectures: Review of Real-World Patterns & Strategies (GPS...SaaS Reference Architectures: Review of Real-World Patterns & Strategies (GPS...
SaaS Reference Architectures: Review of Real-World Patterns & Strategies (GPS...Amazon Web Services
 
Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...
Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...
Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...Amazon Web Services
 
AWS Cloud Experience CA: AWS y su marco de valor en la nube
AWS Cloud Experience CA: AWS y su marco de valor en la nubeAWS Cloud Experience CA: AWS y su marco de valor en la nube
AWS Cloud Experience CA: AWS y su marco de valor en la nubeAmazon Web Services LATAM
 
Accelerate Oracle to Aurora PostgreSQL Migration (GPSTEC313) - AWS re:Invent ...
Accelerate Oracle to Aurora PostgreSQL Migration (GPSTEC313) - AWS re:Invent ...Accelerate Oracle to Aurora PostgreSQL Migration (GPSTEC313) - AWS re:Invent ...
Accelerate Oracle to Aurora PostgreSQL Migration (GPSTEC313) - AWS re:Invent ...Amazon Web Services
 
How GumGum Migrated from Cassandra to Amazon DynamoDB (DAT345) - AWS re:Inven...
How GumGum Migrated from Cassandra to Amazon DynamoDB (DAT345) - AWS re:Inven...How GumGum Migrated from Cassandra to Amazon DynamoDB (DAT345) - AWS re:Inven...
How GumGum Migrated from Cassandra to Amazon DynamoDB (DAT345) - AWS re:Inven...Amazon Web Services
 
AWSomeBuilder3-v12-clean.pdf
AWSomeBuilder3-v12-clean.pdfAWSomeBuilder3-v12-clean.pdf
AWSomeBuilder3-v12-clean.pdfSal Marcus
 
Gaining Better Observability of Your VMs with Amazon CloudWatch - AWS Online ...
Gaining Better Observability of Your VMs with Amazon CloudWatch - AWS Online ...Gaining Better Observability of Your VMs with Amazon CloudWatch - AWS Online ...
Gaining Better Observability of Your VMs with Amazon CloudWatch - AWS Online ...Amazon Web Services
 

Semelhante a Building Advanced Workflows with AWS Glue (ANT333) - AWS re:Invent 2018 (20)

Build End-to-End IT Lifecycle Management on AWS with ServiceNow (ENT330) - AW...
Build End-to-End IT Lifecycle Management on AWS with ServiceNow (ENT330) - AW...Build End-to-End IT Lifecycle Management on AWS with ServiceNow (ENT330) - AW...
Build End-to-End IT Lifecycle Management on AWS with ServiceNow (ENT330) - AW...
 
Analyze Amazon CloudFront and Lambda@Edge Logs to Improve Customer Experience...
Analyze Amazon CloudFront and Lambda@Edge Logs to Improve Customer Experience...Analyze Amazon CloudFront and Lambda@Edge Logs to Improve Customer Experience...
Analyze Amazon CloudFront and Lambda@Edge Logs to Improve Customer Experience...
 
Distributed Solar Systems at EDF Renewables and AWS IoT: A Natural Fit (PUT30...
Distributed Solar Systems at EDF Renewables and AWS IoT: A Natural Fit (PUT30...Distributed Solar Systems at EDF Renewables and AWS IoT: A Natural Fit (PUT30...
Distributed Solar Systems at EDF Renewables and AWS IoT: A Natural Fit (PUT30...
 
Migrating database to cloud
Migrating database to cloudMigrating database to cloud
Migrating database to cloud
 
Serverless State Management & Orchestration for Modern Apps (API302) - AWS re...
Serverless State Management & Orchestration for Modern Apps (API302) - AWS re...Serverless State Management & Orchestration for Modern Apps (API302) - AWS re...
Serverless State Management & Orchestration for Modern Apps (API302) - AWS re...
 
Drive Self-Service & Standardization in the First 100 Days of Your Cloud Migr...
Drive Self-Service & Standardization in the First 100 Days of Your Cloud Migr...Drive Self-Service & Standardization in the First 100 Days of Your Cloud Migr...
Drive Self-Service & Standardization in the First 100 Days of Your Cloud Migr...
 
How Dow Jones Uses AWS to Enable Innovation and New Engineering Work (CTD316)...
How Dow Jones Uses AWS to Enable Innovation and New Engineering Work (CTD316)...How Dow Jones Uses AWS to Enable Innovation and New Engineering Work (CTD316)...
How Dow Jones Uses AWS to Enable Innovation and New Engineering Work (CTD316)...
 
Securing Machine Learning Deployments for the Enterprise (SEC369-R1) - AWS re...
Securing Machine Learning Deployments for the Enterprise (SEC369-R1) - AWS re...Securing Machine Learning Deployments for the Enterprise (SEC369-R1) - AWS re...
Securing Machine Learning Deployments for the Enterprise (SEC369-R1) - AWS re...
 
Data Transformation Patterns in AWS - AWS Online Tech Talks
Data Transformation Patterns in AWS - AWS Online Tech TalksData Transformation Patterns in AWS - AWS Online Tech Talks
Data Transformation Patterns in AWS - AWS Online Tech Talks
 
Coordinating Microservices with AWS Step Functions.pdf
Coordinating Microservices with AWS Step Functions.pdfCoordinating Microservices with AWS Step Functions.pdf
Coordinating Microservices with AWS Step Functions.pdf
 
Data Design and Modeling for Microservices I AWS Dev Day 2018
Data Design and Modeling for Microservices I AWS Dev Day 2018Data Design and Modeling for Microservices I AWS Dev Day 2018
Data Design and Modeling for Microservices I AWS Dev Day 2018
 
Accelerate and Modernise Microsoft Workload Migrations on AWS
Accelerate and Modernise Microsoft Workload Migrations on AWSAccelerate and Modernise Microsoft Workload Migrations on AWS
Accelerate and Modernise Microsoft Workload Migrations on AWS
 
Amazon Athena: What's New and How SendGrid Innovates (ANT324) - AWS re:Invent...
Amazon Athena: What's New and How SendGrid Innovates (ANT324) - AWS re:Invent...Amazon Athena: What's New and How SendGrid Innovates (ANT324) - AWS re:Invent...
Amazon Athena: What's New and How SendGrid Innovates (ANT324) - AWS re:Invent...
 
SaaS Reference Architectures: Review of Real-World Patterns & Strategies (GPS...
SaaS Reference Architectures: Review of Real-World Patterns & Strategies (GPS...SaaS Reference Architectures: Review of Real-World Patterns & Strategies (GPS...
SaaS Reference Architectures: Review of Real-World Patterns & Strategies (GPS...
 
Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...
Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...
Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...
 
AWS Cloud Experience CA: AWS y su marco de valor en la nube
AWS Cloud Experience CA: AWS y su marco de valor en la nubeAWS Cloud Experience CA: AWS y su marco de valor en la nube
AWS Cloud Experience CA: AWS y su marco de valor en la nube
 
Accelerate Oracle to Aurora PostgreSQL Migration (GPSTEC313) - AWS re:Invent ...
Accelerate Oracle to Aurora PostgreSQL Migration (GPSTEC313) - AWS re:Invent ...Accelerate Oracle to Aurora PostgreSQL Migration (GPSTEC313) - AWS re:Invent ...
Accelerate Oracle to Aurora PostgreSQL Migration (GPSTEC313) - AWS re:Invent ...
 
How GumGum Migrated from Cassandra to Amazon DynamoDB (DAT345) - AWS re:Inven...
How GumGum Migrated from Cassandra to Amazon DynamoDB (DAT345) - AWS re:Inven...How GumGum Migrated from Cassandra to Amazon DynamoDB (DAT345) - AWS re:Inven...
How GumGum Migrated from Cassandra to Amazon DynamoDB (DAT345) - AWS re:Inven...
 
AWSomeBuilder3-v12-clean.pdf
AWSomeBuilder3-v12-clean.pdfAWSomeBuilder3-v12-clean.pdf
AWSomeBuilder3-v12-clean.pdf
 
Gaining Better Observability of Your VMs with Amazon CloudWatch - AWS Online ...
Gaining Better Observability of Your VMs with Amazon CloudWatch - AWS Online ...Gaining Better Observability of Your VMs with Amazon CloudWatch - AWS Online ...
Gaining Better Observability of Your VMs with Amazon CloudWatch - AWS Online ...
 

Mais de Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

Mais de Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Building Advanced Workflows with AWS Glue (ANT333) - AWS re:Invent 2018

  • 1.
  • 2. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Building Advanced Workflows with AWS Glue Santosh Chandrachood SDM AWS Glue A N T 3 7 2
  • 3. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Agenda Overview Building Blocks Building a usecase Event driven workflows Workflow considerations Monitoring and Tuning Bring your own workflow engine
  • 4. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Breakout repeats Monday, Nov 26 ANT 372 – [CT] Building Advanced Workflows with AWS Glue 10:00 – 11:00 | Aria East, Plaza Level, Orovada 3 Tuesday, Nov 27 ANT 333 – [BS] Building Advanced Workflows with AWS Glue 2:30 – 3:30 | Mirage, Grand Ballroom D, Table 4 Wednesday, Nov 28 ANT 381 – [BS] Building Advanced Workflows with AWS Glue 5:30 – 6:30 | Aria West, Level 3, Starvine 10, Table 5
  • 5. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Fully-managed, serverless extract-transform-load (ETL) service for developers, built by developers 1000s of customers and jobs AWS Glue
  • 6. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. OrchestrationData Catalog ETL Engine Automatic crawling Apache Hive Metastore compatible Integrated with AWS analytic services Discover Flexible scheduling Monitoring and alerting External integrations Deploy Apache Spark core Python and Scala Auto-generates ETL code Develop AWS Glue Components
  • 7. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Apache Spark is a distributed data processing engine for complex analytics. AWS Glue builds on the Apache Spark to offer ETL specific functionality. Spark Core: RDDs Spark DataFrames Glue DynamicFrames SparkSQL AWS Glue ETL Review: Apache Spark and AWS Glue ETL
  • 8. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Building Blocks Crawlers Jobs TriggersEntities Schedule ExternalEventsDependencies Conditions TimeoutRetriesControl
  • 9. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Building a usecase raw optimize optimized SLA reporting New variable mins variable mins Goal: compose jobs in DAG through dependencies In-practice: time-based workflows
  • 10. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. New features External Conditions Control Publish crawler and job notifications into CloudWatch events CloudWatch events to control downstream workflows ‘ANY’ and ‘AND’ operators in Trigger conditions Additional job states ’failed’, ‘stopped’, or ‘timeout’ Configure job timeout Job delay notifications On-demand cancel
  • 11. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Example event-driven workflow raw optimize optimized SLA reporting New
  • 12. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Workflow considerations • Incremental data processing • Job bookmarks to keep state • Job parameters to select new datasets • Job size • Unique versus One job per logical units of work • Multiple small jobs or one big job • Job parameters • Initial, Global, In-between jobs • Use Amazon S3 to pass parameters
  • 13. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Workflow considerations • Data processing unit • Number of DPUs • Adjusting DPUs • SLA • Job delays notifications • Timeouts • Error handling • Retry logic • Integration with 3rd party • Job re-run
  • 14. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Workflow monitoring—Performance • How is your dataset partitioned? • How is your application divided into jobs and stages? • Data is divided into partitions that are processed concurrently Driver Executors
  • 15. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Workflow monitoring—Metrics • Job metrics • CPU • Memory • Network • Executors, stages • Data movement • Use data points to adjust job parameters
  • 16. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Bring your own workflow engine Triggers Top-level AWS Glue job CloudWatch Events Jobs Lambda States SNS Config STEP
  • 17. Thank you! © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Santosh Chandrachood glue-feedback@amazon.com
  • 18. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.