Serverless Extract-transform-load (ETL) on AWS Webinar

Previously, ETL meant using proprietary products with commercial databases and users with specialist skills. Learn how to create ETL data pipelines that can securely consume data at scale while using open source technologies and languages to enable your organisation, team, and data.

Speaker: Paul Macey, Big Data Specialist, AWS

  1. Serverless ETL on AWS: Deep dive webinar. Paul Macey, Specialist Solution Architect, Big Data and Analytics, AWS Public Sector. November 2019. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  2. Agenda: Serverless ETL (SETL) vs Traditional ETL; Establishing a repeatable data workflow; SETL components & pipeline; Use Cases; SETL with Amazon Athena (Demo); SETL & data lake integration (Demo); Data security & governance; Wrap up.
  3. Outcomes for this session: Understand how to use serverless technologies to perform ETL; Learn how SETL can be integrated into an existing data pipeline or data lake.
  4. Serverless vs Traditional ETL: Operational Excellence; Security; Reliability; Performance Efficiency; Cost Optimization.
  5. Establishing a repeatable data workflow: Data sources; Transport; Process & Transform; Persist & Store; Secure and Deliver; Operate & Monitor. Diagram labels: Data Lake, SETL.
  6. SETL Components: Process initiator; ETL Engine; Workflow coordination; Storage.
  7. SETL Components (AWS Lambda): Process initiator: Amazon EventBridge, AWS Lambda event. ETL Engine: AWS Lambda. Workflow coordination (optional): AWS Step Functions. Storage: Amazon S3, AWS Database Service.
  8. SETL Components (AWS Lambda): Process initiator: Amazon EventBridge, AWS Lambda event. ETL Engine: AWS Lambda. Workflow coordination (optional): AWS Step Functions. Storage: Amazon S3, AWS Database Service. ETL using open source libraries and AWS Lambda: Arrays and matrices (Numpy); Data manipulation (Pandas); Machine Learning (Scikit); Natural Language Processing (NLTK); Geospatial (Geopandas).
  9. SETL Components (Amazon Athena): Process initiator: Amazon EventBridge, AWS Lambda event. ETL Engine: Amazon Athena. Workflow coordination (optional): AWS Step Functions. Storage: Amazon S3, AWS Database Service. ETL using Amazon Athena (SQL based): Geospatial; Windowing; JSON parsing; Lambda Expressions and Functions.
  10. Pop quiz: https://prestodb.io/
  11. Use Cases: Data masking / hashing; Aggregation; Reporting; Timeseries; Data prep for DS/ML; Row by row ML.
  12. SETL Pricing. AWS Lambda: Requests: first 1M free, $0.20 per 1M thereafter. Duration: the first 400,000 GB-seconds per month, up to 3.2M seconds of compute time, are free; $0.0000166667 for every GB-second used thereafter. The price depends on the amount of memory you allocate to your function. AWS Athena: S3: standard S3 rates for storage, requests, and data transfer. Athena: $5.00 per TB of data scanned.
  13. Demo data flow: Amazon S3 Data Lake; Amazon Athena; Data Catalogue (AWS Glue).
  14. SETL with Amazon Athena: Demo.
  15. SETL & data lake integration (Athena Slingshot), using Amazon Athena: Start Small; Establish a Repeatable Workflow; Deliver benefits; Improve and Iterate; Repeat.
  16. SETL & data lake integration (Athena Slingshot): Amazon Athena workgroups; Amazon S3 query destinations; Saved tables & queries.
  17. Amazon Athena slingshot data flow: Data Catalogue (AWS Glue); Staging (Amazon S3); SETL (Amazon Athena); Curated (Amazon S3); Gold (Amazon S3); Data Catalogue (AWS Glue).
  18. SETL & data lake integration (Athena Slingshot): Demo.
  19. Using SETL in the real world.
  20. Data security & governance: Data access control & security; Data usage controls; Encrypted output.
  21. Wrap up: Understand how to use serverless technologies to perform SETL (Components; Pipelines). Learn how SETL can be integrated into an existing data pipeline or data lake (Athena slingshot; Data security and governance).
  22. AWS Glue References: Restrict access to your AWS Glue data catalog (https://aws.amazon.com/blogs/big-data/restrict-access-to-your-aws-glue-data-catalog-with-resource-level-iam-permissions-and-resource-based-policies/); Fine grained access to Glue resources (https://docs.aws.amazon.com/athena/latest/ug/fine-grained-access-to-glue-resources.html).
  23. AWS Athena References: Amazon Athena JDBC & ODBC connectivity (https://docs.aws.amazon.com/athena/latest/ug/athena-bi-tools-jdbc-odbc.html); Athena workgroup policies (https://docs.aws.amazon.com/athena/latest/ug/workgroups-iam-policy.html); Athena workgroup policy examples (https://docs.aws.amazon.com/athena/latest/ug/example-policies-workgroup.html); Presto functions in Athena (https://docs.aws.amazon.com/athena/latest/ug/presto-functions.html); Working with Query Results and Output Files (https://docs.aws.amazon.com/athena/latest/ug/querying.html).
  24. AWS Pricing: AWS Pricing Calculator (https://calculator.aws); AWS Glue (https://aws.amazon.com/glue/pricing/); Amazon Athena (https://aws.amazon.com/athena/pricing/); Amazon Athena JDBC & ODBC connectivity (https://docs.aws.amazon.com/athena/latest/ug/athena-bi-tools-jdbc-odbc.html); AWS Lambda (https://aws.amazon.com/lambda/pricing/); Amazon EventBridge (https://aws.amazon.com/eventbridge/pricing/).
  25. AWS References: AWS Data Flywheel (https://pages.awscloud.com/apac-data-flywheel.html and https://resources.awscloud.com/aws-data-analytics-machinelearning/data-flywheel-e-book); AWS Lake Formation (https://aws.amazon.com/blogs/aws/aws-lake-formation-now-generally-available/); Well Architected Framework (https://aws.amazon.com/architecture/well-architected/).
  26. Python Library References: https://numpy.org/ ; https://pandas.pydata.org/ ; https://scikit-learn.org/stable/index.html ; http://geopandas.org/index.html ; https://www.nltk.org/
  27. Accelerated Data Lake: Available today @ GitHub (https://github.com/aws-samples/accelerated-data-lake). Includes: Data lake pipeline (CloudFormation); Instructions; Data configuration, security and metadata templates. Delivery: Professional services; AWS partners.
  28. Thank you.
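
The pricing figures on slide 12 can be turned into a rough monthly estimate. The sketch below is only an illustration under stated assumptions: the rates are the 2019 numbers quoted on the slide (current rates may differ), the function names are hypothetical, and S3 storage, request, and data transfer charges are left out.

```python
# Rates as quoted on slide 12 (2019); treat them as assumptions, not current prices.
FREE_REQUESTS = 1_000_000
PRICE_PER_MILLION_REQUESTS = 0.20
FREE_GB_SECONDS = 400_000
PRICE_PER_GB_SECOND = 0.0000166667
ATHENA_PRICE_PER_TB_SCANNED = 5.00


def lambda_monthly_cost(invocations, avg_duration_s, memory_mb):
    """Estimate monthly AWS Lambda cost in USD (hypothetical helper)."""
    # GB-seconds = number of invocations x duration x memory allocated in GB.
    gb_seconds = invocations * avg_duration_s * (memory_mb / 1024)
    billable_requests = max(0, invocations - FREE_REQUESTS)
    billable_gb_seconds = max(0.0, gb_seconds - FREE_GB_SECONDS)
    request_cost = billable_requests / 1_000_000 * PRICE_PER_MILLION_REQUESTS
    duration_cost = billable_gb_seconds * PRICE_PER_GB_SECOND
    return request_cost + duration_cost


def athena_scan_cost(tb_scanned):
    """Estimate Athena query cost in USD from the volume of data scanned."""
    return tb_scanned * ATHENA_PRICE_PER_TB_SCANNED


if __name__ == "__main__":
    # 3M invocations of 1 second each at 1024 MB, plus 2.5 TB scanned in Athena.
    print(round(lambda_monthly_cost(3_000_000, 1.0, 1024), 2))
    print(athena_scan_cost(2.5))
```

The "up to 3.2M seconds" on the slide corresponds to the smallest memory setting: 400,000 GB-seconds divided by 0.125 GB (128 MB) is 3.2M seconds, so one million 3.2-second invocations at 128 MB stay inside both free allowances.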

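Slides 8 and 11 pair AWS Lambda as the ETL engine with data masking / hashing as a use case. The following is a minimal sketch of that idea, not the speaker's demo code. It uses only the Python standard library; the function names, the salted SHA-256 scheme, and the column names are assumptions made for illustration. Reading and writing the object in Amazon S3 is left out so the transform can run anywhere.

```python
import csv
import hashlib
import io
from urllib.parse import unquote_plus


def hash_value(value, salt):
    """Return a salted SHA-256 hex digest of a single field value."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


def mask_csv(csv_text, columns_to_hash, salt):
    """Hash the named columns of a CSV document and return the new CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        for column in columns_to_hash:
            row[column] = hash_value(row[column], salt)
        writer.writerow(row)
    return out.getvalue()


def s3_objects_from_event(event):
    """Extract (bucket, key) pairs from an S3 event notification payload.

    Object keys arrive URL-encoded in S3 events, so they are decoded here.
    """
    return [
        (record["s3"]["bucket"]["name"], unquote_plus(record["s3"]["object"]["key"]))
        for record in event.get("Records", [])
    ]


if __name__ == "__main__":
    sample = "id,email\n1,a@example.com\n"
    print(mask_csv(sample, ["email"], salt="demo-salt"))
```

In a real function the handler would fetch each object listed by `s3_objects_from_event`, pass its contents through `mask_csv`, and write the result to a separate bucket or prefix. A salted hash keeps values joinable across datasets but is pseudonymisation rather than anonymisation.
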
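
The slingshot data flow on slide 17 moves data from a staging location to a curated one using Amazon Athena, and slides 16 and 20 mention workgroups, S3 query destinations, and encrypted output. One way to sketch that step is a CREATE TABLE AS SELECT (CTAS) statement submitted to a workgroup. The helpers below only build the statement and the request parameters; every table, database, workgroup, and bucket name is hypothetical, and this is an assumption about how such a step could look rather than the code shown in the demo.

```python
def build_ctas_query(target_table, select_sql, s3_location):
    """Build an Athena CTAS statement that writes Parquet to the given S3 path.

    Inputs are interpolated directly, so they must come from trusted configuration.
    """
    return (
        f"CREATE TABLE {target_table} "
        f"WITH (format = 'PARQUET', external_location = '{s3_location}') "
        f"AS {select_sql}"
    )


def build_query_request(query, database, workgroup, output_location):
    """Build keyword arguments for the Athena StartQueryExecution API."""
    return {
        "QueryString": query,
        "QueryExecutionContext": {"Database": database},
        "WorkGroup": workgroup,
        "ResultConfiguration": {
            "OutputLocation": output_location,
            # Request server-side encryption (SSE-S3) for the query results.
            "EncryptionConfiguration": {"EncryptionOption": "SSE_S3"},
        },
    }


if __name__ == "__main__":
    # All names below are placeholders for illustration.
    query = build_ctas_query(
        "curated.daily_totals",
        "SELECT day, SUM(amount) AS total FROM staging.sales GROUP BY day",
        "s3://example-curated-bucket/daily_totals/",
    )
    request = build_query_request(
        query, "staging", "setl-workgroup", "s3://example-query-results/"
    )
    print(request["QueryString"])
```

With boto3 installed and credentials configured, the request could be submitted as `boto3.client("athena").start_query_execution(**request)`. Keeping the builders free of AWS calls makes the SQL and the encryption setting easy to review and unit test.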