SlideShare uma empresa Scribd logo
1 de 54
Baixar para ler offline
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Daniel Bento, Solutions Architect
Working Within the Data Lake
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Working Within the Data Lake
1. Optimizing for Cost and Performance
2. Cataloging Data Schemas with AWS Glue
3. Transforming Data with AWS Glue
4. Querying the Data Lake with Amazon Athena
5. Visualizing the Data Lake with Amazon QuickSight
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS Data lake and Bigdata Services
Catalog & Search Access & User Interfaces
Data Ingestion
Analytics & Serving
S3
Amazon
DynamoDB
Amazon Elasticsearch
Service
AWS
AppSync
Amazon
API Gateway
Amazon
Cognito
AWS
KMS
AWS
CloudTrail
Manage & Secure
AWS
IAM
Amazon
CloudWatch
AWS
Snowball
AWS Storage
Gateway
Amazon
Kinesis Data
Firehose
AWS Direct
Connect
AWS Database
Migration
Service
Amazon
Athena
Amazon
EMR
AWS
Glue
Amazon
Redshift
Amazon
DynamoDB
Amazon
QuickSight
Amazon
Kinesis
Amazon
Elasticsearch
Service
Amazon
Neptune
Amazon
RDS
Central Storage
Scalable, secure, cost-
effective
AWS
Glue
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Managed Services
Pay for what you use, not
for what you run
Partitioning
Pay for data your query needs,
not to scan all of your data
Compression
Pay for what you store,
not for what you process
Optimizing for Cost and Performance
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Partitioning
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
select * from datalake where
dt=20170515 and country=US
datalake
├── 20170515T1423-GB-01.tar.gz
├── 20170515T1423-GB-02.tar.gz
├── 20170515T1500-US-01.tar.gz
├── 20170516T1500-US-01.tar.gz
├── 20170516T1600-GB-01.tar.gz
└── 20170516T1600-GB-02.tar.gz
datalake
├── dt=20170515
│ ├── country=GB
│ │ ├── 20170515T1423-GB-01.tar.gz
│ │ └── 20170515T1423-GB-02.tar.gz
│ └── country=US
│ └── 20170515T1500-US-01.tar.gz
└── dt=20170516
├── country=GB
│ ├── 20170516T1600-GB-01.tar.gz
│ └── 20170516T1600-GB-02.tar.gz
└── country=US
└── 20170516T1500-US-01.tar.gz
Partitioning
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
select count(*) from datalake where
dt=‘20170515’
select count(*) from datalake where
dt >= ‘20170515’ and dt < ‘20170516’
Non-Partitioned Partitioned Non-Partitioned Partitioned
Run Time 9.71 sec 2.16 sec 10.41 sec 2.73 sec
Data
Scanned
74.1 GB 29.06 MB 74.1 GB 871.39 MB
Cost $0.36 $0.0001 $0.36 $0.004
Results 77% faster, 99% cheaper 73% faster, 98% cheaper
Partitioning - Advantages
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Compression
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
• Compressing your data can speed up your queries significantly
• Splittable formats enable parallel processing across nodes
Algorith
m
Splittable Compressio
n Ratio
Algorithm
Speed
Good For
Gzip
(DEFLAT
E)
No High Medium Raw Storage
bzip2 Yes Very High Slow Very Large Files
LZO Yes Low Fast Slow Analytics
Snappy Yes and No
*
Low Very Fast Slow & Fast
Analytics
* Depends on if the source format is splittable and can output each record into a Snappy Block
Compression
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Compression - Example
Snappy Compression with Parquet File Format
Format Size on S3 Run Time Data
Scanned
Cost
Text 1.15 TB 3m 56s 1.15 TB $5.75
Parquet 130 GB 6.78s 2.51 GB $0.013
Result 87% less 34x faster 99% less 99.7%
savings
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Compression – File Counts
Fewer, larger files are better than many, smaller files (when splittable)
• Faster Listing Operations
• Fewer Requests to Amazon S3
• Less Metadata to Manage
• Faster Query Performance
Query # files Run Time
select count(*) from
datalake
5000 files 8.4 sec
select count(*) from
datalake
1 file 2.31 sec
Result 72% Faster
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Cataloging and Transforming
Data
Working with AWS Glue
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Transforming Data
Over 90% of ETL jobs in the cloud are hand-coded
Which is good …
• Flexible
• Powerful
• Unit Tests
• CI/CD
• Developer Tools …
… but also bad!
• Brittle
• Error-Prone
• Laborious
• Sources Change
• Schemas Change
• Volume Changes
• EVERYTHING KEEPS CHANGING !!!
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Discover
Develop
Deploy
Automatically discover and categorize your data making it immediately
searchable and queryable across data sources
Generate code to clean, enrich, and reliably move data between various
data sources
Run your jobs on a serverless, fully managed, scale-out environment.
No compute resources to provision or manage.
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS Glue: Components
 Hive Metastore compatible with enhanced functionality
 Crawlers automatically extracts metadata and creates tables
 Integrated with Amazon Athena, Amazon Redshift Spectrum, Amazon
EMR
 Run jobs on a serverless Spark platform
 Provides flexible scheduling
 Handles dependency resolution, monitoring and alerting
 Auto-generates ETL code
 Build on open frameworks – Python/Scala and Apache Spark
 Developer-centric – editing, debugging, sharingJob Authoring
Data Catalog
Job Execution
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS Glue: Components
 Hive Metastore compatible with enhanced functionality
 Crawlers automatically extracts metadata and creates tables
 Integrated with Amazon Athena, Amazon Redshift Spectrum, Amazon
EMR
 Run jobs on a serverless Spark platform
 Provides flexible scheduling
 Handles dependency resolution, monitoring and alerting
 Auto-generates ETL code
 Build on open frameworks – Python/Scala and Apache Spark
 Developer-centric – editing, debugging, sharingJob Authoring
Data Catalog
Job Execution
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Glue: Data Catalog
Features Include
 Search metadata for data
discovery
 Connections to S3, RDS,
Redshift, JDBC, and DynamoDB
 Classification for identifying and
parsing files
 Versioning of table metadata as
schemas
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Glue Data
Catalog
Glue: Data Catalog – Queryable by Many Services
Glue ETL
Amazon Athena
Redshift Spectrum
EMR
(Hadoop/Spark)
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Glue: Data Catalog - Crawlers
Features Include:
• Built-in classifiers
• Detect file type
• Extract schema
• Identify partitions
• On-Demand or Scheduled
Execution
• Build-you-own classifiers
• Grok for ease of use
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Crawlers: Classifiers
IAM Role
Glue Crawler
Data Lakes
Data Warehouse
Databases
Amazon RDS
Amazon Redshift
Amazon S3
JDBC Connection
Object Connection
Built-In Classifiers
MySQL
MariaDB
PostreSQL
Aurora
Redshift
Avro
Parquet
ORC
JSON & BJSON
Logs
(Apache, Linux, MS, Ruby, Redis, and many others)
Delimited
(comma, pipe, tab, semicolon)
Compressed Formats
(ZIP, BZIP, GZIP, LZ4, Snappy)
Create additional Custom Classifiers
with Grok!
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS Glue Data Catalog
Bring in metadata from a variety of data sources (Amazon S3, Amazon Redshift, etc.) into a single
categorized list that is searchable
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Data Catalog: Table details
Table schema
Table properties
Data statistics
Nested fields
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Data Catalog: Version control
List of table versionsCompare schema versions
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
file 1 file N… file 1 file N…
date=10 date=15…
month=Nov
S3 bucket hierarchy Table definition
Estimate schema similarity among files at each level to
handle semi-structured logs, schema evolution…
sim=.99 sim=.95
sim=.93
month
date
col 1
col 2
str
str
int
float
Column Type
Glue: Crawlers – Detecting Schema and Partitions
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Data Catalog: Automatic partition detection
Automatically register available partitions
Table
partitions
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS Glue: Components
 Hive Metastore compatible with enhanced functionality
 Crawlers automatically extracts metadata and creates tables
 Integrated with Amazon Athena, Amazon Redshift Spectrum, Amazon
EMR
 Run jobs on a serverless Spark platform
 Provides flexible scheduling
 Handles dependency resolution, monitoring and alerting
 Auto-generates ETL code
 Build on open frameworks – Python/Scala and Apache Spark
 Developer-centric – editing, debugging, sharingJob Authoring
Data Catalog
Job Execution
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS Glue: Job Authoring - Pick a Source
Choose from Manually-Defined or Crawler-Generated Sources:
• S3 Bucket
• RDBMS
• Redshift
• DynamoDB
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS Glue: Job Authoring – Pick a Target
Write output results into an existing table.
Crawler-Defined or Manually-Created:
• S3 Bucket
• RDBMS
• Redshift
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS Glue: Job Authoring – Code Generation
Existing columns
in target
Can extend/add
new columns to
target
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
1. Customize the mappings
2. Glue generates transformation graph and either Python or Scala code
3. Specify trigger condition
AWS Glue: Job Authoring – Code Generation
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
 Human-readable, editable, and portable PySpark or Scala code
 Flexible: Glue’s ETL library simplifies manipulating complex, semi-structured data
 Customizable: Use native PySpark / Scala, import custom libraries, and/or leverage Glue’s
libraries
 Collaborative: share code snippets via GitHub, reuse code across jobs
Job authoring: ETL code
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS Glue: Components
 Hive Metastore compatible with enhanced functionality
 Crawlers automatically extracts metadata and creates tables
 Integrated with Amazon Athena, Amazon Redshift Spectrum, Amazon
EMR
 Run jobs on a serverless Spark platform
 Provides flexible scheduling
 Handles dependency resolution, monitoring and alerting
 Auto-generates ETL code
 Build on open frameworks – Python/Scala and Apache Spark
 Developer-centric – editing, debugging, sharingJob Authoring
Data Catalog
Job Execution
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Compose jobs globally with event-
based dependencies
 Easy to reuse and leverage work across
organization boundaries
Multiple triggering mechanisms
 Schedule-based: e.g., time of day
 Event-based: e.g., job completion
 On-demand: e.g., AWS Lambda
Logs and alerts are available in
Amazon CloudWatch
Marketing: Ad-spend by
customer segment
Event Based
Lambda Trigger
Sales: Revenue by
customer segment
Schedule
Data
based
Central: ROI by
customer
segment
Weekly
sales
Data
based
AWS Glue Orchestration
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
• Track ETL job progress and inspect logs directly from the console.
• Logs are written to CloudWatch for simple access.
• Errors are automatically extracted and presented in the Error Logs for
easy troubleshooting of jobs.
AWS Glue: Job Execution – Progress and History
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS Glue: Overall Flow
1. Crawl your raw
data
2. Create your desired targets
3. Generate and prep your ETL
4. Execute your job
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Querying the Data Lake
Working with Amazon Athena
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
An interactive query service that makes it easy to analyze
data directly from Amazon S3 using Standard SQL
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
What does it look like?
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Athena is Serverless
• No Infrastructure or
administration
• Zero Spin up time
• Transparent upgrades
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Use ANSI SQL
• Support for complex joins,
nested queries & window
functions
• Support for complex data types
(arrays, structs)
• Support for partitioning of data
by any key
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Familiar Technologies Under the Covers
Used for SQL Queries
In-memory distributed query engine
ANSI-SQL compatible with extensions
Used for DDL functionality
Complex data types
Multitude of formats
Supports data partitioning
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Amazon Athena is Cost Effective
• Pay per query
• $5 per TB scanned from S3
• DDL Queries and failed queries are free
• Save by using compression, columnar formats, partitions
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Serverless data lake & analytics with AWS
Crawlers scan your data sets and populate the Glue Data Catalog
The Glue Data Catalog serves as a central metadata repository
Once catalogued in Glue, your data is immediately available for analytics
1
2
3
AMAZON
REDSHIFT
SPECTRUM
AMAZON
EMR
AMAZON
ATHENA
AWS GLUE
DATA CATALOG
AWS GLUE
CRAWLER
AMAZON S3 QUICKSIGHT
1 2
3
https://github.com/aws-samples/serverless-data-analytics
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Visualizing the Data Lake
Working with Amazon QuickSight
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
QuickSight Overview
Scalable
QuickSight automatically scales with your usage and
activity, with no need for additional infrastructure.
From 10 users to 10,000 QuickSight seamlessly grows
with you
No Servers to Manage
QuickSight is a fully managed cloud application,
meaning there's no upfront cost, software to deploy,
capacity planning, maintenance, upgrades, or
migrations
Fully integrated
QuickSight integrates with your data sources and
other AWS services like Redshift, S3, Athena, Aurora,
RDS, IAM, CloudTrail, Cloud Directory and more –
providing you with everything you need to build an
end-to-end BI solution.
Pay For What You Use
Provide read-only access to interactive dashboards and
pay only when your users access them with pay-per-
session pricing. There are no upfront costs, no annual
commitments, and no charges for inactive users.
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Connect to your data, wherever it is
QuickSight is natively integrated with AWS data sources, as well as on-premise and hosted
databases and third party business applications
On-premises
Securely connect to on-premise
databases and flat files like
Excel and CSV
In the cloud
Connect to hosted database, big
data formats, and secure VPCs
Applications
Connect directly to third
party business applications
• Salesforce
• Square
• Adobe Analytics
• Jira
• ServiceNow
• Twitter
• Github
• Redshift
• RDS
• S3
• Athena
• Aurora
• Teradata
• MySQL
• Presto
• Spark
• SQL Server
• Postgre SQL
• MariaDB
• Snowflake
• IoT Analytics
• Excel
• CSV
• Teradata
• MySQL
• SQL Server
• PostgreSQL
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Preview data
Rename, remove fields, change data types
Create new calculated fields
Filter rows
Issue direct query or ingest to SPICE
Visually join tables from the same relational DB
Push down custom SQL queries
Data Prep
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
SPICE
QuickSight is powered by SPICE, a super-fast calculation engine that delivers
performance and scale, regardless of how many users are active.
SPICEYour Data Source
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
User Types / User Roles
Admin
Manage Users
Manage SPICE Capacity
Manage VPC Connections
Manage Account Settings
Author
Create Data Sets
Create Analyses
Create Dashboards
Reader
Consume Dashboards
QS Admin
Sometimes separate from Business Users,
sometimes the same
Usually has AWS Console access
Business User
Anyone
Can be internal or external users
(customers/partners3rd parties)
Analyst
Sometimes in IT, sometimes Business Users
‘Data Analyst’
‘Data Engineer’
‘BI Engineer’
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Data governance
Create managed datasets that give power users and authors the flexibility to
perform self-serve analytics on data that you control.
Create datasets that:
• Can be shared with any user
• Automatically refresh
• Have row level security
• Users cannot modify
• Dynamically update
with changes
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Sample Embedded Dashboards
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Anomaly Detection
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Working Within the Data Lake
1. Optimizing for Cost and Performance
2. Cataloging Data Schemas with AWS Glue
3. Transforming Data with AWS Glue
4. Querying the Data Lake with Amazon Athena
5. Visualizing the Data Lake with Amazon QuickSight

Mais conteúdo relacionado

Mais procurados

Building Data Lake on AWS | AWS Floor28
Building Data Lake on AWS | AWS Floor28Building Data Lake on AWS | AWS Floor28
Building Data Lake on AWS | AWS Floor28Amazon Web Services
 
Migrating Your Data Warehouse to Amazon Redshift (DAT337) - AWS re:Invent 2018
Migrating Your Data Warehouse to Amazon Redshift (DAT337) - AWS re:Invent 2018Migrating Your Data Warehouse to Amazon Redshift (DAT337) - AWS re:Invent 2018
Migrating Your Data Warehouse to Amazon Redshift (DAT337) - AWS re:Invent 2018Amazon Web Services
 
Building Serverless ETL Pipelines with AWS Glue - AWS Summit Sydney 2018
Building Serverless ETL Pipelines with AWS Glue - AWS Summit Sydney 2018Building Serverless ETL Pipelines with AWS Glue - AWS Summit Sydney 2018
Building Serverless ETL Pipelines with AWS Glue - AWS Summit Sydney 2018Amazon Web Services
 
Building a Modern Data Warehouse - Deep Dive on Amazon Redshift
Building a Modern Data Warehouse - Deep Dive on Amazon RedshiftBuilding a Modern Data Warehouse - Deep Dive on Amazon Redshift
Building a Modern Data Warehouse - Deep Dive on Amazon RedshiftAmazon Web Services
 
Citrix Moves Data to Amazon Redshift Fast with Matillion ETL
 Citrix Moves Data to Amazon Redshift Fast with Matillion ETL Citrix Moves Data to Amazon Redshift Fast with Matillion ETL
Citrix Moves Data to Amazon Redshift Fast with Matillion ETLAmazon Web Services
 
Aurora Serverless: Scalable, Cost-Effective Application Deployment (DAT336) -...
Aurora Serverless: Scalable, Cost-Effective Application Deployment (DAT336) -...Aurora Serverless: Scalable, Cost-Effective Application Deployment (DAT336) -...
Aurora Serverless: Scalable, Cost-Effective Application Deployment (DAT336) -...Amazon Web Services
 
The AWS Big Data Platform – Overview
The AWS Big Data Platform – OverviewThe AWS Big Data Platform – Overview
The AWS Big Data Platform – OverviewAmazon Web Services
 
Build Data Engineering Platforms with Amazon EMR (ANT204) - AWS re:Invent 2018
Build Data Engineering Platforms with Amazon EMR (ANT204) - AWS re:Invent 2018Build Data Engineering Platforms with Amazon EMR (ANT204) - AWS re:Invent 2018
Build Data Engineering Platforms with Amazon EMR (ANT204) - AWS re:Invent 2018Amazon Web Services
 
Architecting a Serverless Data Lake on AWS
Architecting a Serverless Data Lake on AWSArchitecting a Serverless Data Lake on AWS
Architecting a Serverless Data Lake on AWSAmazon Web Services
 
Visualization with Amazon QuickSight
Visualization with Amazon QuickSightVisualization with Amazon QuickSight
Visualization with Amazon QuickSightAmazon Web Services
 
How to Build a Data Lake in Amazon S3 & Amazon Glacier - AWS Online Tech Talks
How to Build a Data Lake in Amazon S3 & Amazon Glacier - AWS Online Tech TalksHow to Build a Data Lake in Amazon S3 & Amazon Glacier - AWS Online Tech Talks
How to Build a Data Lake in Amazon S3 & Amazon Glacier - AWS Online Tech TalksAmazon Web Services
 
Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftAmazon Web Services
 
Migrating Your NoSQL Database to Amazon DynamoDB (DAT314) - AWS re:Invent 2018
Migrating Your NoSQL Database to Amazon DynamoDB (DAT314) - AWS re:Invent 2018Migrating Your NoSQL Database to Amazon DynamoDB (DAT314) - AWS re:Invent 2018
Migrating Your NoSQL Database to Amazon DynamoDB (DAT314) - AWS re:Invent 2018Amazon Web Services
 
Building Serverless Analytics Pipelines with AWS Glue - AWS Summit Sydney 2019
Building Serverless Analytics Pipelines with AWS Glue - AWS Summit Sydney 2019Building Serverless Analytics Pipelines with AWS Glue - AWS Summit Sydney 2019
Building Serverless Analytics Pipelines with AWS Glue - AWS Summit Sydney 2019Amazon Web Services
 
Best practices for optimizing your EC2 costs with Spot Instances | AWS Floor28
Best practices for optimizing your EC2 costs with Spot Instances | AWS Floor28Best practices for optimizing your EC2 costs with Spot Instances | AWS Floor28
Best practices for optimizing your EC2 costs with Spot Instances | AWS Floor28Amazon Web Services
 
Oracle to Amazon Aurora Migration, Step by Step - AWS Online Tech Talks
Oracle to Amazon Aurora Migration, Step by Step - AWS Online Tech TalksOracle to Amazon Aurora Migration, Step by Step - AWS Online Tech Talks
Oracle to Amazon Aurora Migration, Step by Step - AWS Online Tech TalksAmazon Web Services
 
Builders Day' - Databases on AWS: The Right Tool for The Right Job
Builders Day' - Databases on AWS: The Right Tool for The Right JobBuilders Day' - Databases on AWS: The Right Tool for The Right Job
Builders Day' - Databases on AWS: The Right Tool for The Right JobAmazon Web Services LATAM
 
Deep Dive on PostgreSQL Databases on Amazon RDS (DAT324) - AWS re:Invent 2018
Deep Dive on PostgreSQL Databases on Amazon RDS (DAT324) - AWS re:Invent 2018Deep Dive on PostgreSQL Databases on Amazon RDS (DAT324) - AWS re:Invent 2018
Deep Dive on PostgreSQL Databases on Amazon RDS (DAT324) - AWS re:Invent 2018Amazon Web Services
 
AWS DeepLens Workshop_Build Computer Vision Applications
AWS DeepLens Workshop_Build Computer Vision Applications AWS DeepLens Workshop_Build Computer Vision Applications
AWS DeepLens Workshop_Build Computer Vision Applications Amazon Web Services
 

Mais procurados (20)

Building Data Lake on AWS | AWS Floor28
Building Data Lake on AWS | AWS Floor28Building Data Lake on AWS | AWS Floor28
Building Data Lake on AWS | AWS Floor28
 
Migrating Your Data Warehouse to Amazon Redshift (DAT337) - AWS re:Invent 2018
Migrating Your Data Warehouse to Amazon Redshift (DAT337) - AWS re:Invent 2018Migrating Your Data Warehouse to Amazon Redshift (DAT337) - AWS re:Invent 2018
Migrating Your Data Warehouse to Amazon Redshift (DAT337) - AWS re:Invent 2018
 
Loading Data into Redshift
Loading Data into RedshiftLoading Data into Redshift
Loading Data into Redshift
 
Building Serverless ETL Pipelines with AWS Glue - AWS Summit Sydney 2018
Building Serverless ETL Pipelines with AWS Glue - AWS Summit Sydney 2018Building Serverless ETL Pipelines with AWS Glue - AWS Summit Sydney 2018
Building Serverless ETL Pipelines with AWS Glue - AWS Summit Sydney 2018
 
Building a Modern Data Warehouse - Deep Dive on Amazon Redshift
Building a Modern Data Warehouse - Deep Dive on Amazon RedshiftBuilding a Modern Data Warehouse - Deep Dive on Amazon Redshift
Building a Modern Data Warehouse - Deep Dive on Amazon Redshift
 
Citrix Moves Data to Amazon Redshift Fast with Matillion ETL
 Citrix Moves Data to Amazon Redshift Fast with Matillion ETL Citrix Moves Data to Amazon Redshift Fast with Matillion ETL
Citrix Moves Data to Amazon Redshift Fast with Matillion ETL
 
Aurora Serverless: Scalable, Cost-Effective Application Deployment (DAT336) -...
Aurora Serverless: Scalable, Cost-Effective Application Deployment (DAT336) -...Aurora Serverless: Scalable, Cost-Effective Application Deployment (DAT336) -...
Aurora Serverless: Scalable, Cost-Effective Application Deployment (DAT336) -...
 
The AWS Big Data Platform – Overview
The AWS Big Data Platform – OverviewThe AWS Big Data Platform – Overview
The AWS Big Data Platform – Overview
 
Build Data Engineering Platforms with Amazon EMR (ANT204) - AWS re:Invent 2018
Build Data Engineering Platforms with Amazon EMR (ANT204) - AWS re:Invent 2018Build Data Engineering Platforms with Amazon EMR (ANT204) - AWS re:Invent 2018
Build Data Engineering Platforms with Amazon EMR (ANT204) - AWS re:Invent 2018
 
Architecting a Serverless Data Lake on AWS
Architecting a Serverless Data Lake on AWSArchitecting a Serverless Data Lake on AWS
Architecting a Serverless Data Lake on AWS
 
Visualization with Amazon QuickSight
Visualization with Amazon QuickSightVisualization with Amazon QuickSight
Visualization with Amazon QuickSight
 
How to Build a Data Lake in Amazon S3 & Amazon Glacier - AWS Online Tech Talks
How to Build a Data Lake in Amazon S3 & Amazon Glacier - AWS Online Tech TalksHow to Build a Data Lake in Amazon S3 & Amazon Glacier - AWS Online Tech Talks
How to Build a Data Lake in Amazon S3 & Amazon Glacier - AWS Online Tech Talks
 
Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon Redshift
 
Migrating Your NoSQL Database to Amazon DynamoDB (DAT314) - AWS re:Invent 2018
Migrating Your NoSQL Database to Amazon DynamoDB (DAT314) - AWS re:Invent 2018Migrating Your NoSQL Database to Amazon DynamoDB (DAT314) - AWS re:Invent 2018
Migrating Your NoSQL Database to Amazon DynamoDB (DAT314) - AWS re:Invent 2018
 
Building Serverless Analytics Pipelines with AWS Glue - AWS Summit Sydney 2019
Building Serverless Analytics Pipelines with AWS Glue - AWS Summit Sydney 2019Building Serverless Analytics Pipelines with AWS Glue - AWS Summit Sydney 2019
Building Serverless Analytics Pipelines with AWS Glue - AWS Summit Sydney 2019
 
Best practices for optimizing your EC2 costs with Spot Instances | AWS Floor28
Best practices for optimizing your EC2 costs with Spot Instances | AWS Floor28Best practices for optimizing your EC2 costs with Spot Instances | AWS Floor28
Best practices for optimizing your EC2 costs with Spot Instances | AWS Floor28
 
Oracle to Amazon Aurora Migration, Step by Step - AWS Online Tech Talks
Oracle to Amazon Aurora Migration, Step by Step - AWS Online Tech TalksOracle to Amazon Aurora Migration, Step by Step - AWS Online Tech Talks
Oracle to Amazon Aurora Migration, Step by Step - AWS Online Tech Talks
 
Builders Day' - Databases on AWS: The Right Tool for The Right Job
Builders Day' - Databases on AWS: The Right Tool for The Right JobBuilders Day' - Databases on AWS: The Right Tool for The Right Job
Builders Day' - Databases on AWS: The Right Tool for The Right Job
 
Deep Dive on PostgreSQL Databases on Amazon RDS (DAT324) - AWS re:Invent 2018
Deep Dive on PostgreSQL Databases on Amazon RDS (DAT324) - AWS re:Invent 2018Deep Dive on PostgreSQL Databases on Amazon RDS (DAT324) - AWS re:Invent 2018
Deep Dive on PostgreSQL Databases on Amazon RDS (DAT324) - AWS re:Invent 2018
 
AWS DeepLens Workshop_Build Computer Vision Applications
AWS DeepLens Workshop_Build Computer Vision Applications AWS DeepLens Workshop_Build Computer Vision Applications
AWS DeepLens Workshop_Build Computer Vision Applications
 

Semelhante a Immersion Day - Como gerenciar seu catálogo de dados e processo de transformação

Data Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaData Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaAmazon Web Services
 
Data Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaData Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaAmazon Web Services
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSAmazon Web Services
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSAmazon Web Services
 
From raw data to business insights. A modern data lake
From raw data to business insights. A modern data lakeFrom raw data to business insights. A modern data lake
From raw data to business insights. A modern data lakejavier ramirez
 
AWS Data Lake: data analysis @ scale
AWS Data Lake: data analysis @ scaleAWS Data Lake: data analysis @ scale
AWS Data Lake: data analysis @ scaleAmazon Web Services
 
Building_a_Modern_Data_Platform_in_the_Cloud.pdf
Building_a_Modern_Data_Platform_in_the_Cloud.pdfBuilding_a_Modern_Data_Platform_in_the_Cloud.pdf
Building_a_Modern_Data_Platform_in_the_Cloud.pdfAmazon Web Services
 
Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018
Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018
Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018Amazon Web Services
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSAmazon Web Services
 
Value of Data Beyond Analytics by Darin Briskman
 Value of Data Beyond Analytics by Darin Briskman Value of Data Beyond Analytics by Darin Briskman
Value of Data Beyond Analytics by Darin BriskmanSameer Kenkare
 
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...AWS Riyadh User Group
 
Implementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdfImplementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdfAmazon Web Services
 
Analyze your Data Lake, Fast @ Any Scale - AWS Online Tech Talks
Analyze your Data Lake, Fast @ Any Scale - AWS Online Tech TalksAnalyze your Data Lake, Fast @ Any Scale - AWS Online Tech Talks
Analyze your Data Lake, Fast @ Any Scale - AWS Online Tech TalksAmazon Web Services
 
Building a modern data platform in the cloud. AWS DevDay Nordics
Building a modern data platform in the cloud. AWS DevDay NordicsBuilding a modern data platform in the cloud. AWS DevDay Nordics
Building a modern data platform in the cloud. AWS DevDay Nordicsjavier ramirez
 
Building-a-Modern-Data-Platform-in-the-Cloud.pdf
Building-a-Modern-Data-Platform-in-the-Cloud.pdfBuilding-a-Modern-Data-Platform-in-the-Cloud.pdf
Building-a-Modern-Data-Platform-in-the-Cloud.pdfAmazon Web Services
 
在 AWS 上構建無服務器分析
在 AWS 上構建無服務器分析在 AWS 上構建無服務器分析
在 AWS 上構建無服務器分析Amazon Web Services
 
Build Data Lakes and Analytics on AWS: Patterns & Best Practices
Build Data Lakes and Analytics on AWS: Patterns & Best PracticesBuild Data Lakes and Analytics on AWS: Patterns & Best Practices
Build Data Lakes and Analytics on AWS: Patterns & Best PracticesAmazon Web Services
 

Semelhante a Immersion Day - Como gerenciar seu catálogo de dados e processo de transformação (20)

Data Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaData Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & Athena
 
Data Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaData Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & Athena
 
Construindo data lakes e analytics com AWS
Construindo data lakes e analytics com AWSConstruindo data lakes e analytics com AWS
Construindo data lakes e analytics com AWS
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWS
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWS
 
From raw data to business insights. A modern data lake
From raw data to business insights. A modern data lakeFrom raw data to business insights. A modern data lake
From raw data to business insights. A modern data lake
 
AWS Data Lake: data analysis @ scale
AWS Data Lake: data analysis @ scaleAWS Data Lake: data analysis @ scale
AWS Data Lake: data analysis @ scale
 
Building_a_Modern_Data_Platform_in_the_Cloud.pdf
Building_a_Modern_Data_Platform_in_the_Cloud.pdfBuilding_a_Modern_Data_Platform_in_the_Cloud.pdf
Building_a_Modern_Data_Platform_in_the_Cloud.pdf
 
Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018
Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018
Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWS
 
Value of Data Beyond Analytics by Darin Briskman
 Value of Data Beyond Analytics by Darin Briskman Value of Data Beyond Analytics by Darin Briskman
Value of Data Beyond Analytics by Darin Briskman
 
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
 
Implementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdfImplementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdf
 
Analyze your Data Lake, Fast @ Any Scale - AWS Online Tech Talks
Analyze your Data Lake, Fast @ Any Scale - AWS Online Tech TalksAnalyze your Data Lake, Fast @ Any Scale - AWS Online Tech Talks
Analyze your Data Lake, Fast @ Any Scale - AWS Online Tech Talks
 
Building a modern data platform in the cloud. AWS DevDay Nordics
Building a modern data platform in the cloud. AWS DevDay NordicsBuilding a modern data platform in the cloud. AWS DevDay Nordics
Building a modern data platform in the cloud. AWS DevDay Nordics
 
Building-a-Modern-Data-Platform-in-the-Cloud.pdf
Building-a-Modern-Data-Platform-in-the-Cloud.pdfBuilding-a-Modern-Data-Platform-in-the-Cloud.pdf
Building-a-Modern-Data-Platform-in-the-Cloud.pdf
 
Data Warehouses and Data Lakes
Data Warehouses and Data LakesData Warehouses and Data Lakes
Data Warehouses and Data Lakes
 
在 AWS 上構建無服務器分析
在 AWS 上構建無服務器分析在 AWS 上構建無服務器分析
在 AWS 上構建無服務器分析
 
Build Data Lakes and Analytics on AWS: Patterns & Best Practices
Build Data Lakes and Analytics on AWS: Patterns & Best PracticesBuild Data Lakes and Analytics on AWS: Patterns & Best Practices
Build Data Lakes and Analytics on AWS: Patterns & Best Practices
 
Data Warehouses and Data Lakes
Data Warehouses and Data LakesData Warehouses and Data Lakes
Data Warehouses and Data Lakes
 

Mais de Amazon Web Services LATAM

AWS para terceiro setor - Sessão 1 - Introdução à nuvem
AWS para terceiro setor - Sessão 1 - Introdução à nuvemAWS para terceiro setor - Sessão 1 - Introdução à nuvem
AWS para terceiro setor - Sessão 1 - Introdução à nuvemAmazon Web Services LATAM
 
AWS para terceiro setor - Sessão 2 - Armazenamento e Backup
AWS para terceiro setor - Sessão 2 - Armazenamento e BackupAWS para terceiro setor - Sessão 2 - Armazenamento e Backup
AWS para terceiro setor - Sessão 2 - Armazenamento e BackupAmazon Web Services LATAM
 
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.Amazon Web Services LATAM
 
AWS para terceiro setor - Sessão 1 - Introdução à nuvem
AWS para terceiro setor - Sessão 1 - Introdução à nuvemAWS para terceiro setor - Sessão 1 - Introdução à nuvem
AWS para terceiro setor - Sessão 1 - Introdução à nuvemAmazon Web Services LATAM
 
AWS para terceiro setor - Sessão 2 - Armazenamento e Backup
AWS para terceiro setor - Sessão 2 - Armazenamento e BackupAWS para terceiro setor - Sessão 2 - Armazenamento e Backup
AWS para terceiro setor - Sessão 2 - Armazenamento e BackupAmazon Web Services LATAM
 
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.Amazon Web Services LATAM
 
Automatice el proceso de entrega con CI/CD en AWS
Automatice el proceso de entrega con CI/CD en AWSAutomatice el proceso de entrega con CI/CD en AWS
Automatice el proceso de entrega con CI/CD en AWSAmazon Web Services LATAM
 
Automatize seu processo de entrega de software com CI/CD na AWS
Automatize seu processo de entrega de software com CI/CD na AWSAutomatize seu processo de entrega de software com CI/CD na AWS
Automatize seu processo de entrega de software com CI/CD na AWSAmazon Web Services LATAM
 
Ransomware: como recuperar os seus dados na nuvem AWS
Ransomware: como recuperar os seus dados na nuvem AWSRansomware: como recuperar os seus dados na nuvem AWS
Ransomware: como recuperar os seus dados na nuvem AWSAmazon Web Services LATAM
 
Ransomware: cómo recuperar sus datos en la nube de AWS
Ransomware: cómo recuperar sus datos en la nube de AWSRansomware: cómo recuperar sus datos en la nube de AWS
Ransomware: cómo recuperar sus datos en la nube de AWSAmazon Web Services LATAM
 
Aprenda a migrar y transferir datos al usar la nube de AWS
Aprenda a migrar y transferir datos al usar la nube de AWSAprenda a migrar y transferir datos al usar la nube de AWS
Aprenda a migrar y transferir datos al usar la nube de AWSAmazon Web Services LATAM
 
Aprenda como migrar e transferir dados ao utilizar a nuvem da AWS
Aprenda como migrar e transferir dados ao utilizar a nuvem da AWSAprenda como migrar e transferir dados ao utilizar a nuvem da AWS
Aprenda como migrar e transferir dados ao utilizar a nuvem da AWSAmazon Web Services LATAM
 
Cómo mover a un almacenamiento de archivos administrados
Cómo mover a un almacenamiento de archivos administradosCómo mover a un almacenamiento de archivos administrados
Cómo mover a un almacenamiento de archivos administradosAmazon Web Services LATAM
 
Os benefícios de migrar seus workloads de Big Data para a AWS
Os benefícios de migrar seus workloads de Big Data para a AWSOs benefícios de migrar seus workloads de Big Data para a AWS
Os benefícios de migrar seus workloads de Big Data para a AWSAmazon Web Services LATAM
 

Mais de Amazon Web Services LATAM (20)

AWS para terceiro setor - Sessão 1 - Introdução à nuvem
AWS para terceiro setor - Sessão 1 - Introdução à nuvemAWS para terceiro setor - Sessão 1 - Introdução à nuvem
AWS para terceiro setor - Sessão 1 - Introdução à nuvem
 
AWS para terceiro setor - Sessão 2 - Armazenamento e Backup
AWS para terceiro setor - Sessão 2 - Armazenamento e BackupAWS para terceiro setor - Sessão 2 - Armazenamento e Backup
AWS para terceiro setor - Sessão 2 - Armazenamento e Backup
 
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
 
AWS para terceiro setor - Sessão 1 - Introdução à nuvem
AWS para terceiro setor - Sessão 1 - Introdução à nuvemAWS para terceiro setor - Sessão 1 - Introdução à nuvem
AWS para terceiro setor - Sessão 1 - Introdução à nuvem
 
AWS para terceiro setor - Sessão 2 - Armazenamento e Backup
AWS para terceiro setor - Sessão 2 - Armazenamento e BackupAWS para terceiro setor - Sessão 2 - Armazenamento e Backup
AWS para terceiro setor - Sessão 2 - Armazenamento e Backup
 
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
 
Automatice el proceso de entrega con CI/CD en AWS
Automatice el proceso de entrega con CI/CD en AWSAutomatice el proceso de entrega con CI/CD en AWS
Automatice el proceso de entrega con CI/CD en AWS
 
Automatize seu processo de entrega de software com CI/CD na AWS
Automatize seu processo de entrega de software com CI/CD na AWSAutomatize seu processo de entrega de software com CI/CD na AWS
Automatize seu processo de entrega de software com CI/CD na AWS
 
Cómo empezar con Amazon EKS
Cómo empezar con Amazon EKSCómo empezar con Amazon EKS
Cómo empezar con Amazon EKS
 
Como começar com Amazon EKS
Como começar com Amazon EKSComo começar com Amazon EKS
Como começar com Amazon EKS
 
Ransomware: como recuperar os seus dados na nuvem AWS
Ransomware: como recuperar os seus dados na nuvem AWSRansomware: como recuperar os seus dados na nuvem AWS
Ransomware: como recuperar os seus dados na nuvem AWS
 
Ransomware: cómo recuperar sus datos en la nube de AWS
Ransomware: cómo recuperar sus datos en la nube de AWSRansomware: cómo recuperar sus datos en la nube de AWS
Ransomware: cómo recuperar sus datos en la nube de AWS
 
Ransomware: Estratégias de Mitigação
Ransomware: Estratégias de MitigaçãoRansomware: Estratégias de Mitigação
Ransomware: Estratégias de Mitigação
 
Ransomware: Estratégias de Mitigación
Ransomware: Estratégias de MitigaciónRansomware: Estratégias de Mitigación
Ransomware: Estratégias de Mitigación
 
Aprenda a migrar y transferir datos al usar la nube de AWS
Aprenda a migrar y transferir datos al usar la nube de AWSAprenda a migrar y transferir datos al usar la nube de AWS
Aprenda a migrar y transferir datos al usar la nube de AWS
 
Aprenda como migrar e transferir dados ao utilizar a nuvem da AWS
Aprenda como migrar e transferir dados ao utilizar a nuvem da AWSAprenda como migrar e transferir dados ao utilizar a nuvem da AWS
Aprenda como migrar e transferir dados ao utilizar a nuvem da AWS
 
Cómo mover a un almacenamiento de archivos administrados
Cómo mover a un almacenamiento de archivos administradosCómo mover a un almacenamiento de archivos administrados
Cómo mover a un almacenamiento de archivos administrados
 
Simplifique su BI con AWS
Simplifique su BI con AWSSimplifique su BI con AWS
Simplifique su BI con AWS
 
Simplifique o seu BI com a AWS
Simplifique o seu BI com a AWSSimplifique o seu BI com a AWS
Simplifique o seu BI com a AWS
 
Os benefícios de migrar seus workloads de Big Data para a AWS
Os benefícios de migrar seus workloads de Big Data para a AWSOs benefícios de migrar seus workloads de Big Data para a AWS
Os benefícios de migrar seus workloads de Big Data para a AWS
 

Último

Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 

Último (20)

Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 

Immersion Day - Como gerenciar seu catálogo de dados e processo de transformação

  • 1. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Daniel Bento, Solutions Architect Working Within the Data Lake
  • 2. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Working Within the Data Lake 1. Optimizing for Cost and Performance 2. Cataloging Data Schemas with AWS Glue 3. Transforming Data with AWS Glue 4. Querying the Data Lake with Amazon Athena 5. Visualizing the Data Lake with Amazon QuickSight
  • 3. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AWS Data lake and Bigdata Services Catalog & Search Access & User Interfaces Data Ingestion Analytics & Serving S3 Amazon DynamoDB Amazon Elasticsearch Service AWS AppSync Amazon API Gateway Amazon Cognito AWS KMS AWS CloudTrail Manage & Secure AWS IAM Amazon CloudWatch AWS Snowball AWS Storage Gateway Amazon Kinesis Data Firehose AWS Direct Connect AWS Database Migration Service Amazon Athena Amazon EMR AWS Glue Amazon Redshift Amazon DynamoDB Amazon QuickSight Amazon Kinesis Amazon Elasticsearch Service Amazon Neptune Amazon RDS Central Storage Scalable, secure, cost- effective AWS Glue
  • 4. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Managed Services Pay for what you use, not for what you run Partitioning Pay for data your query needs, not to scan all of your data Compression Pay for what you store, not for what you process Optimizing for Cost and Performance
  • 5. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Partitioning
  • 6. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. select * from datalake where dt=20170515 and country=US datalake ├── 20170515T1423-GB-01.tar.gz ├── 20170515T1423-GB-02.tar.gz ├── 20170515T1500-US-01.tar.gz ├── 20170516T1500-US-01.tar.gz ├── 20170516T1600-GB-01.tar.gz └── 20170516T1600-GB-02.tar.gz datalake ├── dt=20170515 │ ├── country=GB │ │ ├── 20170515T1423-GB-01.tar.gz │ │ └── 20170515T1423-GB-02.tar.gz │ └── country=US │ └── 20170515T1500-US-01.tar.gz └── dt=20170516 ├── country=GB │ ├── 20170516T1600-GB-01.tar.gz │ └── 20170516T1600-GB-02.tar.gz └── country=US └── 20170516T1500-US-01.tar.gz Partitioning
  • 7. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. select count(*) from datalake where dt=‘20170515’ select count(*) from datalake where dt >= ‘20170515’ and dt < ‘20170516’ Non-Partitioned Partitioned Non-Partitioned Partitioned Run Time 9.71 sec 2.16 sec 10.41 sec 2.73 sec Data Scanned 74.1 GB 29.06 MB 74.1 GB 871.39 MB Cost $0.36 $0.0001 $0.36 $0.004 Results 77% faster, 99% cheaper 73% faster, 98% cheaper Partitioning - Advantages
  • 8. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Compression
  • 9. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. • Compressing your data can speed up your queries significantly • Splittable formats enable parallel processing across nodes Algorith m Splittable Compressio n Ratio Algorithm Speed Good For Gzip (DEFLAT E) No High Medium Raw Storage bzip2 Yes Very High Slow Very Large Files LZO Yes Low Fast Slow Analytics Snappy Yes and No * Low Very Fast Slow & Fast Analytics * Depends on if the source format is splittable and can output each record into a Snappy Block Compression
  • 10. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Compression - Example Snappy Compression with Parquet File Format Format Size on S3 Run Time Data Scanned Cost Text 1.15 TB 3m 56s 1.15 TB $5.75 Parquet 130 GB 6.78s 2.51 GB $0.013 Result 87% less 34x faster 99% less 99.7% savings
  • 11. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Compression – File Counts Fewer, larger files are better than many, smaller files (when splittable) • Faster Listing Operations • Fewer Requests to Amazon S3 • Less Metadata to Manage • Faster Query Performance Query # files Run Time select count(*) from datalake 5000 files 8.4 sec select count(*) from datalake 1 file 2.31 sec Result 72% Faster
  • 12. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Cataloging and Transforming Data Working with AWS Glue
  • 13. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Transforming Data Over 90% of ETL jobs in the cloud are hand-coded Which is good … • Flexible • Powerful • Unit Tests • CI/CD • Developer Tools … … but also bad! • Brittle • Error-Prone • Laborious • Sources Change • Schemas Change • Volume Changes • EVERYTHING KEEPS CHANGING !!!
  • 14. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Discover Develop Deploy Automatically discover and categorize your data making it immediately searchable and queryable across data sources Generate code to clean, enrich, and reliably move data between various data sources Run your jobs on a serverless, fully managed, scale-out environment. No compute resources to provision or manage.
  • 15. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AWS Glue: Components  Hive Metastore compatible with enhanced functionality  Crawlers automatically extracts metadata and creates tables  Integrated with Amazon Athena, Amazon Redshift Spectrum, Amazon EMR  Run jobs on a serverless Spark platform  Provides flexible scheduling  Handles dependency resolution, monitoring and alerting  Auto-generates ETL code  Build on open frameworks – Python/Scala and Apache Spark  Developer-centric – editing, debugging, sharingJob Authoring Data Catalog Job Execution
  • 16. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AWS Glue: Components  Hive Metastore compatible with enhanced functionality  Crawlers automatically extracts metadata and creates tables  Integrated with Amazon Athena, Amazon Redshift Spectrum, Amazon EMR  Run jobs on a serverless Spark platform  Provides flexible scheduling  Handles dependency resolution, monitoring and alerting  Auto-generates ETL code  Build on open frameworks – Python/Scala and Apache Spark  Developer-centric – editing, debugging, sharingJob Authoring Data Catalog Job Execution
  • 17. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Glue: Data Catalog Features Include  Search metadata for data discovery  Connections to S3, RDS, Redshift, JDBC, and DynamoDB  Classification for identifying and parsing files  Versioning of table metadata as schemas
  • 18. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Glue Data Catalog Glue: Data Catalog – Queryable by Many Services Glue ETL Amazon Athena Redshift Spectrum EMR (Hadoop/Spark)
  • 19. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Glue: Data Catalog - Crawlers Features Include: • Built-in classifiers • Detect file type • Extract schema • Identify partitions • On-Demand or Scheduled Execution • Build-you-own classifiers • Grok for ease of use
  • 20. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Crawlers: Classifiers IAM Role Glue Crawler Data Lakes Data Warehouse Databases Amazon RDS Amazon Redshift Amazon S3 JDBC Connection Object Connection Built-In Classifiers MySQL MariaDB PostreSQL Aurora Redshift Avro Parquet ORC JSON & BJSON Logs (Apache, Linux, MS, Ruby, Redis, and many others) Delimited (comma, pipe, tab, semicolon) Compressed Formats (ZIP, BZIP, GZIP, LZ4, Snappy) Create additional Custom Classifiers with Grok!
  • 21. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AWS Glue Data Catalog Bring in metadata from a variety of data sources (Amazon S3, Amazon Redshift, etc.) into a single categorized list that is searchable
  • 22. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Data Catalog: Table details Table schema Table properties Data statistics Nested fields
  • 23. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Data Catalog: Version control List of table versionsCompare schema versions
  • 24. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. file 1 file N… file 1 file N… date=10 date=15… month=Nov S3 bucket hierarchy Table definition Estimate schema similarity among files at each level to handle semi-structured logs, schema evolution… sim=.99 sim=.95 sim=.93 month date col 1 col 2 str str int float Column Type Glue: Crawlers – Detecting Schema and Partitions
  • 25. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Data Catalog: Automatic partition detection Automatically register available partitions Table partitions
  • 26. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AWS Glue: Components  Hive Metastore compatible with enhanced functionality  Crawlers automatically extracts metadata and creates tables  Integrated with Amazon Athena, Amazon Redshift Spectrum, Amazon EMR  Run jobs on a serverless Spark platform  Provides flexible scheduling  Handles dependency resolution, monitoring and alerting  Auto-generates ETL code  Build on open frameworks – Python/Scala and Apache Spark  Developer-centric – editing, debugging, sharingJob Authoring Data Catalog Job Execution
  • 27. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AWS Glue: Job Authoring - Pick a Source Choose from Manually-Defined or Crawler-Generated Sources: • S3 Bucket • RDBMS • Redshift • DynamoDB
  • 28. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AWS Glue: Job Authoring – Pick a Target Write output results into an existing table. Crawler-Defined or Manually-Created: • S3 Bucket • RDBMS • Redshift
  • 29. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AWS Glue: Job Authoring – Code Generation Existing columns in target Can extend/add new columns to target
  • 30. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. 1. Customize the mappings 2. Glue generates transformation graph and either Python or Scala code 3. Specify trigger condition AWS Glue: Job Authoring – Code Generation
  • 31. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.  Human-readable, editable, and portable PySpark or Scala code  Flexible: Glue’s ETL library simplifies manipulating complex, semi-structured data  Customizable: Use native PySpark / Scala, import custom libraries, and/or leverage Glue’s libraries  Collaborative: share code snippets via GitHub, reuse code across jobs Job authoring: ETL code
  • 32. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AWS Glue: Components  Hive Metastore compatible with enhanced functionality  Crawlers automatically extracts metadata and creates tables  Integrated with Amazon Athena, Amazon Redshift Spectrum, Amazon EMR  Run jobs on a serverless Spark platform  Provides flexible scheduling  Handles dependency resolution, monitoring and alerting  Auto-generates ETL code  Build on open frameworks – Python/Scala and Apache Spark  Developer-centric – editing, debugging, sharingJob Authoring Data Catalog Job Execution
  • 33. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Compose jobs globally with event- based dependencies  Easy to reuse and leverage work across organization boundaries Multiple triggering mechanisms  Schedule-based: e.g., time of day  Event-based: e.g., job completion  On-demand: e.g., AWS Lambda Logs and alerts are available in Amazon CloudWatch Marketing: Ad-spend by customer segment Event Based Lambda Trigger Sales: Revenue by customer segment Schedule Data based Central: ROI by customer segment Weekly sales Data based AWS Glue Orchestration
  • 34. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. • Track ETL job progress and inspect logs directly from the console. • Logs are written to CloudWatch for simple access. • Errors are automatically extracted and presented in the Error Logs for easy troubleshooting of jobs. AWS Glue: Job Execution – Progress and History
  • 35. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AWS Glue: Overall Flow 1. Crawl your raw data 2. Create your desired targets 3. Generate and prep your ETL 4. Execute your job
  • 36. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Querying the Data Lake Working with Amazon Athena
  • 37. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. An interactive query service that makes it easy to analyze data directly from Amazon S3 using Standard SQL
  • 38. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. What does it look like?
  • 39. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Athena is Serverless • No Infrastructure or administration • Zero Spin up time • Transparent upgrades
  • 40. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Use ANSI SQL • Support for complex joins, nested queries & window functions • Support for complex data types (arrays, structs) • Support for partitioning of data by any key
  • 41. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Familiar Technologies Under the Covers Used for SQL Queries In-memory distributed query engine ANSI-SQL compatible with extensions Used for DDL functionality Complex data types Multitude of formats Supports data partitioning
  • 42. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Athena is Cost Effective • Pay per query • $5 per TB scanned from S3 • DDL Queries and failed queries are free • Save by using compression, columnar formats, partitions
  • 43. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Serverless data lake & analytics with AWS Crawlers scan your data sets and populate the Glue Data Catalog The Glue Data Catalog serves as a central metadata repository Once catalogued in Glue, your data is immediately available for analytics 1 2 3 AMAZON REDSHIFT SPECTRUM AMAZON EMR AMAZON ATHENA AWS GLUE DATA CATALOG AWS GLUE CRAWLER AMAZON S3 QUICKSIGHT 1 2 3 https://github.com/aws-samples/serverless-data-analytics
  • 44. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Visualizing the Data Lake Working with Amazon QuickSight
  • 45. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. QuickSight Overview Scalable QuickSight automatically scales with your usage and activity, with no need for additional infrastructure. From 10 users to 10,000 QuickSight seamlessly grows with you No Servers to Manage QuickSight is a fully managed cloud application, meaning there's no upfront cost, software to deploy, capacity planning, maintenance, upgrades, or migrations Fully integrated QuickSight integrates with your data sources and other AWS services like Redshift, S3, Athena, Aurora, RDS, IAM, CloudTrail, Cloud Directory and more – providing you with everything you need to build an end-to-end BI solution. Pay For What You Use Provide read-only access to interactive dashboards and pay only when your users access them with pay-per- session pricing. There are no upfront costs, no annual commitments, and no charges for inactive users.
  • 46. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Connect to your data, wherever it is QuickSight is natively integrated with AWS data sources, as well as on-premise and hosted databases and third party business applications On-premises Securely connect to on-premise databases and flat files like Excel and CSV In the cloud Connect to hosted database, big data formats, and secure VPCs Applications Connect directly to third party business applications • Salesforce • Square • Adobe Analytics • Jira • ServiceNow • Twitter • Github • Redshift • RDS • S3 • Athena • Aurora • Teradata • MySQL • Presto • Spark • SQL Server • Postgre SQL • MariaDB • Snowflake • IoT Analytics • Excel • CSV • Teradata • MySQL • SQL Server • PostgreSQL
  • 47. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Preview data Rename, remove fields, change data types Create new calculated fields Filter rows Issue direct query or ingest to SPICE Visually join tables from the same relational DB Push down custom SQL queries Data Prep
  • 48. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. SPICE QuickSight is powered by SPICE, a super-fast calculation engine that delivers performance and scale, regardless of how many users are active. SPICEYour Data Source
  • 49. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. User Types / User Roles Admin Manage Users Manage SPICE Capacity Manage VPC Connections Manage Account Settings Author Create Data Sets Create Analyses Create Dashboards Reader Consume Dashboards QS Admin Sometimes separate from Business Users, sometimes the same Usually has AWS Console access Business User Anyone Can be internal or external users (customers/partners3rd parties) Analyst Sometimes in IT, sometimes Business Users ‘Data Analyst’ ‘Data Engineer’ ‘BI Engineer’
  • 50. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Data governance Create managed datasets that give power users and authors the flexibility to perform self-serve analytics on data that you control. Create datasets that: • Can be shared with any user • Automatically refresh • Have row level security • Users cannot modify • Dynamically update with changes
  • 51. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 52. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Sample Embedded Dashboards
  • 53. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Anomaly Detection
  • 54. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Working Within the Data Lake 1. Optimizing for Cost and Performance 2. Cataloging Data Schemas with AWS Glue 3. Transforming Data with AWS Glue 4. Querying the Data Lake with Amazon Athena 5. Visualizing the Data Lake with Amazon QuickSight