© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Eden Perry
Partner Solutions Architect, AWS
Rafi Ton
CEO, Nuviad
Success Has Many Query Engines
Agenda
• AWS Big Data Platform Overview
• AWS Glue Data Catalog
• Amazon Athena
• Amazon Redshift Spectrum
• Customer Story – NUVIAD
• The Right Tool for The Right Job
AWS Data Services Overview
[Diagram: AWS Big Data Services, arranged by stage: Ingest/Collect, Store, Analyze/Process, Visualization/Consume, and Orchestration/Transform. A legend marks which services are serverless.]
Services shown: EMR, EC2, S3, Amazon Redshift/Spectrum, DynamoDB, AWS Lambda, Kinesis Analytics, Amazon Athena, Amazon QuickSight, Aurora, Kinesis Streams, AWS Snowball, ISV Connectors, Kinesis Firehose, S3 Transfer Acceleration, Amazon Elasticsearch, AWS DMS (CDC), AWS Glue, AWS Step Functions
AWS Glue: Fully managed ETL service
• Catalogs data sources
• Identifies formats and data types, manually and automatically (with crawlers)
• Generates ETL code
• Schedules and executes ETL jobs
What are our options for Analytics?
[Diagram: reference architecture across four stages (Ingest, Store, Process & Analyze, Consume), split into a Speed Layer and a Batch Layer.]
Components shown: Kinesis Data Streams, Kinesis Firehose Delivery Streams, DynamoDB, AWS Lambda, Kinesis Analytics, Raw Bucket, Parquet Bucket, Spark/EMR, Glue ETL, Glue Data Catalog, Athena, Redshift Spectrum, QuickSight, Real-time Web UI
Amazon Athena:
A Deeper Look
Amazon Athena
• Interactive query service over S3
• Uses ANSI SQL (everybody knows SQL)
• Serverless - Don’t worry about setting up
infrastructure, just start querying
• No ETL
• Get started instantly:
• Point to data in S3
• Define your schema - Schema on Read
• Start querying
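The "define your schema" step could look roughly like the sketch below. It assumes a Parquet dataset and only builds a Hive-style `CREATE EXTERNAL TABLE` statement as a string; the database, table, column names, and S3 path are hypothetical, and nothing here connects to AWS.

```python
def build_external_table_ddl(database, table, columns, location, partitions=None):
    """Build a Hive-style CREATE EXTERNAL TABLE statement for Parquet data in S3.

    Every name passed in is illustrative; this function only formats text.
    """
    cols = ",\n  ".join(f"`{name}` {dtype}" for name, dtype in columns)
    ddl = f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n  {cols}\n)"
    if partitions:
        parts = ", ".join(f"`{name}` {dtype}" for name, dtype in partitions)
        ddl += f"\nPARTITIONED BY ({parts})"
    ddl += f"\nSTORED AS PARQUET\nLOCATION '{location}'"
    return ddl


# Hypothetical dataset: ad events partitioned by day.
DDL = build_external_table_ddl(
    database="demo_db",
    table="ad_events",
    columns=[("user_id", "string"), ("app_id", "string"), ("bid_price", "double")],
    location="s3://example-bucket/ad_events/",
    partitions=[("dt", "string")],
)
print(DDL)
```

Because the schema is applied on read, the statement describes files that already sit in S3; no data is moved or converted when it runs.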
Amazon Athena: Pay Per Query
• Pay only for the queries you run - $5 Per TB
Scanned
• Query directly from S3, no additional storage
charges
• Works with standard data formats:
• CSV, JSON
• Parquet, ORC, Avro
• Improve performance and reduce cost:
• Compression
• Partitioning
• Columnar formats
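A rough way to see why these three techniques cut cost under per-TB-scanned pricing is sketched below. The $5 figure is the one quoted on the slide; the binary terabyte and all of the reduction ratios are illustrative assumptions, not measurements, and any minimum charge per query is not modeled.

```python
PRICE_PER_TB_USD = 5.0      # price quoted on the slide
BYTES_PER_TB = 1024 ** 4    # assumption: binary terabyte


def estimate_bytes_scanned(total_bytes, compression_ratio=1.0,
                           partition_fraction=1.0, column_fraction=1.0):
    """Estimate bytes read after each optimization shrinks the scan.

    compression_ratio:  compressed size / raw size
    partition_fraction: share of partitions the query's filter touches
    column_fraction:    share of columns read from a columnar format
    """
    return total_bytes * compression_ratio * partition_fraction * column_fraction


def query_cost_usd(scanned_bytes):
    """Cost of one query under pay-per-TB-scanned pricing."""
    return scanned_bytes / BYTES_PER_TB * PRICE_PER_TB_USD


raw = 1 * BYTES_PER_TB
full_scan_cost = query_cost_usd(estimate_bytes_scanned(raw))
optimized_cost = query_cost_usd(
    estimate_bytes_scanned(raw, compression_ratio=0.25,
                           partition_fraction=0.5, column_fraction=0.5)
)
print(full_scan_cost, optimized_cost)
```

The factors multiply, so modest gains from each technique compound into a much smaller scan.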
Amazon Athena
Demo
Amazon Redshift Spectrum
Extend your data reach
The tyranny of “OR”
Amazon EMR
Directly access data in S3
Scale out to thousands of nodes
Open data formats
Popular big data frameworks
Anything you can dream up and code
Amazon Redshift
Super-fast local disk performance
Sophisticated query optimization
Join-optimized data formats
Query using standard SQL
Optimized for data warehousing
We want
Sophisticated query optimization and scale-out processing
Super fast performance and support for open formats
The throughput of local disk and the scale of S3
We want all this
From one data processing engine
With my data accessible from all data processing engines
Now and in the future
Amazon Redshift Spectrum
Amazon Redshift Spectrum enables you to run Amazon Redshift SQL queries against exabytes of data in Amazon S3.
With Redshift Spectrum, you can extend the analytic power of Amazon Redshift beyond data stored on local disks in your data warehouse to query vast amounts of unstructured data in your Amazon S3 “data lake”.
Query
SELECT COUNT(*)
FROM S3.EXT_TABLE
GROUP BY…
Life of a query
[Diagram, repeated for each step: JDBC/ODBC client, Amazon Redshift, nodes 1 2 3 4 ... N, Amazon S3 (exabyte-scale object storage), Data Catalog / Apache Hive Metastore]
• Query is optimized and compiled at the leader node. Determine what gets run locally and what goes to Amazon Redshift Spectrum
• Query plan is sent to all compute nodes
• Compute nodes obtain partition info from Data Catalog; dynamically prune partitions
• Each compute node issues multiple requests to the Amazon Redshift Spectrum layer
• Amazon Redshift Spectrum nodes scan your S3 data
• Amazon Redshift Spectrum projects, filters, and aggregates
• Final aggregations and joins with local Amazon Redshift tables done in-cluster
• Result is sent back to client
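The pruning, scanning, and two-stage aggregation in the query life cycle can be pictured with a toy model. This is only a sketch: the in-memory "catalog", the partition keys, and the rows are hypothetical stand-ins, and real Redshift Spectrum internals are not represented.

```python
from collections import Counter

# Hypothetical catalog: partition key -> rows stored under that S3 prefix.
CATALOG = {
    "dt=2018-03-01": [{"state": "NY"}, {"state": "CA"}],
    "dt=2018-03-02": [{"state": "NY"}, {"state": "NY"}],
    "dt=2018-03-03": [{"state": "CA"}],
}


def prune_partitions(catalog, wanted_dates):
    """Keep only the partitions whose key value satisfies the query predicate."""
    return [key for key in catalog if key.split("=", 1)[1] in wanted_dates]


def scan_and_aggregate(rows):
    """Stand-in for the scan layer: read one partition, return a partial COUNT(*) per group."""
    return Counter(row["state"] for row in rows)


def final_aggregate(partials):
    """Stand-in for the in-cluster step: merge the partial counts into the final result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)


selected = prune_partitions(CATALOG, {"2018-03-01", "2018-03-02"})
partials = [scan_and_aggregate(CATALOG[key]) for key in selected]
result = final_aggregate(partials)
print(selected, result)
```

The third partition is never read, which is the point of pruning: data that cannot match the predicate costs nothing to skip.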
Amazon Redshift Spectrum Demo
• Goal: Determine the correlation
between Tweets’ sentiment and the
weather
• Tweets are stored in our Data Lake - S3
• States and Historical Weather data are
stored in our Data Warehouse –
Redshift
• Method: Join between S3 data and
Redshift data using Redshift Spectrum
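The shape of such a join can be sketched locally. In the snippet below SQLite is only a stand-in so the query can run anywhere: in the demo, the tweets would be an external table over S3 and the weather data a local Redshift table. Table names, column names, and rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Stand-in for the external (S3) table of tweets.
    CREATE TABLE tweets (state TEXT, day TEXT, sentiment REAL);
    -- Stand-in for the local warehouse table of historical weather.
    CREATE TABLE weather (state TEXT, day TEXT, temp_c REAL);
    INSERT INTO tweets VALUES
        ('NY', '2018-03-01', 0.5),
        ('NY', '2018-03-01', -0.5),
        ('CA', '2018-03-01', 1.0);
    INSERT INTO weather VALUES
        ('NY', '2018-03-01', 2.0),
        ('CA', '2018-03-01', 20.0);
""")

# Average tweet sentiment per state, next to that day's temperature.
rows = conn.execute("""
    SELECT w.state, w.temp_c, AVG(t.sentiment) AS avg_sentiment
    FROM tweets AS t
    JOIN weather AS w ON w.state = t.state AND w.day = t.day
    GROUP BY w.state, w.temp_c
    ORDER BY w.state
""").fetchall()
print(rows)
```

The joined rows pair each state's temperature with its average sentiment, which is the input a correlation analysis would need.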
Customer Story: NUVIAD
The local marketing company.
Company Confidential
NUVIAD
NUVIAD is an online marketing service that makes
data-driven, location-aware mobile marketing
accessible, effective and simple to use for agencies,
networks and businesses.
NUVIAD HyperDSP
NUVIAD Local
NUVIAD inView SSP
NUVIAD In Numbers
• Analyzing over 700k ad opportunities every second
• Over 2.5 Billion user profiles
• Over 500k app profiles from App Store and Google Play
• Local businesses mapped via multiple sources
• Thousands of customers. From small businesses to large
networks
• Tens of thousands of campaigns
Data is Everything
Identifying the user’s intent – the holy grail of digital
marketing. Finding the right moment that the user is
most receptive to my marketing message
• Digital advertising was focused on the digital aspects of the user
• HyperLocal adds the physical context
The Challenge
• Streamlining high scale data from hundreds of thousands of
sources. Finding the needles in the huge haystack.
• Providing effective data driven tools to our customers and
partners
• Effectively scaling up the platform
• It is easy to create a platform that delivers data right or now.
We need to provide data right now. Speed, speed and more
speed.
From
Data Warehouse
To
Data Lake
Traditional Methodology
• Data Warehouse - Redshift (over 60 dc1.large servers in two clusters)
• Algorithmic processes – Redshift
• Reporting and Analytics – RDS (MySQL)
• Real-time reporting – memSQL
• Aerospike, MongoDB
• Result
• Fragmented data
• Multiple data sets
• Hard to scale up (storage vs. compute)
• Maintenance nightmare
NUVIAD Data Lake
One pond of data, multiple query engines
• S3 as main storage for all data
• Formatting data in an open and common format
• Creating unified data streams to ingest data
• Separating compute from Storage to allow
manageable and ad-hoc scaling
• Using different query engines on the same data
• Using different data permutations to optimize data
for queries
One Data Set. Many Query engines.
• With data formatted correctly (Parquet) and
stored in S3, we can use different query engines
for different workloads:
• Amazon Athena - for quick and simple queries
• Amazon Redshift Spectrum - for complex algorithmic
queries utilizing PostgreSQL
• EMR Presto - for large reports with a scalable cluster
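The split of workloads across engines can be written down as a small routing rule. This is a sketch of the idea only: the `Workload` fields, the S3 path, and the decision order are assumptions for illustration, not NUVIAD's actual logic.

```python
from dataclasses import dataclass

# Hypothetical shared location; every engine reads the same Parquet files.
DATASET = "s3://example-bucket/parquet/"


@dataclass(frozen=True)
class Workload:
    name: str
    complex_sql: bool = False    # e.g. algorithmic queries
    large_report: bool = False   # e.g. long-running, wide reports


def pick_engine(workload):
    """Map a workload to one of the three engines named on the slide."""
    if workload.large_report:
        return "EMR Presto"
    if workload.complex_sql:
        return "Amazon Redshift Spectrum"
    return "Amazon Athena"


for w in (
    Workload("ad-hoc lookup"),
    Workload("bid optimization", complex_sql=True),
    Workload("monthly report", large_report=True),
):
    print(f"{w.name}: {pick_engine(w)} over {DATASET}")
```

Only the engine changes per workload; the dataset location stays fixed, which is what separating storage from compute allows.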
Faster Results. Fraction of the Cost
The local marketing company.
Thank you.
The Right Tool for The Right Job
Thank You!

Mais conteúdo relacionado

Mais procurados

Build Data Engineering Platforms with Amazon EMR (ANT204) - AWS re:Invent 2018
Build Data Engineering Platforms with Amazon EMR (ANT204) - AWS re:Invent 2018Build Data Engineering Platforms with Amazon EMR (ANT204) - AWS re:Invent 2018
Build Data Engineering Platforms with Amazon EMR (ANT204) - AWS re:Invent 2018Amazon Web Services
 
Build Data Lakes & Analytics on AWS: Patterns & Best Practices
Build Data Lakes & Analytics on AWS: Patterns & Best PracticesBuild Data Lakes & Analytics on AWS: Patterns & Best Practices
Build Data Lakes & Analytics on AWS: Patterns & Best PracticesAmazon Web Services
 
Build Your First Big Data Application on AWS (ANT213-R1) - AWS re:Invent 2018
Build Your First Big Data Application on AWS (ANT213-R1) - AWS re:Invent 2018Build Your First Big Data Application on AWS (ANT213-R1) - AWS re:Invent 2018
Build Your First Big Data Application on AWS (ANT213-R1) - AWS re:Invent 2018Amazon Web Services
 
Migrate Your Hadoop/Spark Workload to Amazon EMR and Architect It for Securit...
Migrate Your Hadoop/Spark Workload to Amazon EMR and Architect It for Securit...Migrate Your Hadoop/Spark Workload to Amazon EMR and Architect It for Securit...
Migrate Your Hadoop/Spark Workload to Amazon EMR and Architect It for Securit...Amazon Web Services
 
Extending Analytics Beyond the Data Warehouse, ft. Warner Bros. Analytics (AN...
Extending Analytics Beyond the Data Warehouse, ft. Warner Bros. Analytics (AN...Extending Analytics Beyond the Data Warehouse, ft. Warner Bros. Analytics (AN...
Extending Analytics Beyond the Data Warehouse, ft. Warner Bros. Analytics (AN...Amazon Web Services
 
Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...
Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...
Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...Amazon Web Services
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSAmazon Web Services
 
One Data Lake, Many Uses: Enabling Multi-Tenant Analytics with Amazon EMR (AN...
One Data Lake, Many Uses: Enabling Multi-Tenant Analytics with Amazon EMR (AN...One Data Lake, Many Uses: Enabling Multi-Tenant Analytics with Amazon EMR (AN...
One Data Lake, Many Uses: Enabling Multi-Tenant Analytics with Amazon EMR (AN...Amazon Web Services
 
Data Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaData Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaAmazon Web Services
 
Building Data Lakes That Cost Less and Deliver Results Faster - AWS Online Te...
Building Data Lakes That Cost Less and Deliver Results Faster - AWS Online Te...Building Data Lakes That Cost Less and Deliver Results Faster - AWS Online Te...
Building Data Lakes That Cost Less and Deliver Results Faster - AWS Online Te...Amazon Web Services
 
Data Lake Implementation: Processing and Querying Data in Place (STG204-R1) -...
Data Lake Implementation: Processing and Querying Data in Place (STG204-R1) -...Data Lake Implementation: Processing and Querying Data in Place (STG204-R1) -...
Data Lake Implementation: Processing and Querying Data in Place (STG204-R1) -...Amazon Web Services
 
BDA306 Building a Modern Data Warehouse: Deep Dive on Amazon Redshift
BDA306 Building a Modern Data Warehouse: Deep Dive on Amazon RedshiftBDA306 Building a Modern Data Warehouse: Deep Dive on Amazon Redshift
BDA306 Building a Modern Data Warehouse: Deep Dive on Amazon RedshiftAmazon Web Services
 
What's New with Amazon Redshift ft. Dow Jones (ANT350-R) - AWS re:Invent 2018
What's New with Amazon Redshift ft. Dow Jones (ANT350-R) - AWS re:Invent 2018What's New with Amazon Redshift ft. Dow Jones (ANT350-R) - AWS re:Invent 2018
What's New with Amazon Redshift ft. Dow Jones (ANT350-R) - AWS re:Invent 2018Amazon Web Services
 
Build a High-Performance, Cloud-Native, Open-Source Platform on AWS & Save Mi...
Build a High-Performance, Cloud-Native, Open-Source Platform on AWS & Save Mi...Build a High-Performance, Cloud-Native, Open-Source Platform on AWS & Save Mi...
Build a High-Performance, Cloud-Native, Open-Source Platform on AWS & Save Mi...Amazon Web Services
 
One Data Lake, Many Uses: Enable Multi-Tenant Analytics with Amazon EMR (ANT3...
One Data Lake, Many Uses: Enable Multi-Tenant Analytics with Amazon EMR (ANT3...One Data Lake, Many Uses: Enable Multi-Tenant Analytics with Amazon EMR (ANT3...
One Data Lake, Many Uses: Enable Multi-Tenant Analytics with Amazon EMR (ANT3...Amazon Web Services
 
What's New with Amazon Redshift ft. McDonald's (ANT350-R1) - AWS re:Invent 2018
What's New with Amazon Redshift ft. McDonald's (ANT350-R1) - AWS re:Invent 2018What's New with Amazon Redshift ft. McDonald's (ANT350-R1) - AWS re:Invent 2018
What's New with Amazon Redshift ft. McDonald's (ANT350-R1) - AWS re:Invent 2018Amazon Web Services
 

Mais procurados (20)

Build Data Engineering Platforms with Amazon EMR (ANT204) - AWS re:Invent 2018
Build Data Engineering Platforms with Amazon EMR (ANT204) - AWS re:Invent 2018Build Data Engineering Platforms with Amazon EMR (ANT204) - AWS re:Invent 2018
Build Data Engineering Platforms with Amazon EMR (ANT204) - AWS re:Invent 2018
 
Build Data Lakes & Analytics on AWS: Patterns & Best Practices
Build Data Lakes & Analytics on AWS: Patterns & Best PracticesBuild Data Lakes & Analytics on AWS: Patterns & Best Practices
Build Data Lakes & Analytics on AWS: Patterns & Best Practices
 
Build Your First Big Data Application on AWS (ANT213-R1) - AWS re:Invent 2018
Build Your First Big Data Application on AWS (ANT213-R1) - AWS re:Invent 2018Build Your First Big Data Application on AWS (ANT213-R1) - AWS re:Invent 2018
Build Your First Big Data Application on AWS (ANT213-R1) - AWS re:Invent 2018
 
Migrate Your Hadoop/Spark Workload to Amazon EMR and Architect It for Securit...
Migrate Your Hadoop/Spark Workload to Amazon EMR and Architect It for Securit...Migrate Your Hadoop/Spark Workload to Amazon EMR and Architect It for Securit...
Migrate Your Hadoop/Spark Workload to Amazon EMR and Architect It for Securit...
 
Extending Analytics Beyond the Data Warehouse, ft. Warner Bros. Analytics (AN...
Extending Analytics Beyond the Data Warehouse, ft. Warner Bros. Analytics (AN...Extending Analytics Beyond the Data Warehouse, ft. Warner Bros. Analytics (AN...
Extending Analytics Beyond the Data Warehouse, ft. Warner Bros. Analytics (AN...
 
Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...
Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...
Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWS
 
One Data Lake, Many Uses: Enabling Multi-Tenant Analytics with Amazon EMR (AN...
One Data Lake, Many Uses: Enabling Multi-Tenant Analytics with Amazon EMR (AN...One Data Lake, Many Uses: Enabling Multi-Tenant Analytics with Amazon EMR (AN...
One Data Lake, Many Uses: Enabling Multi-Tenant Analytics with Amazon EMR (AN...
 
Log Analytics with AWS
Log Analytics with AWSLog Analytics with AWS
Log Analytics with AWS
 
Customer Uses of Data Lakes
Customer Uses of Data LakesCustomer Uses of Data Lakes
Customer Uses of Data Lakes
 
Data Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaData Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & Athena
 
Building Data Lakes That Cost Less and Deliver Results Faster - AWS Online Te...
Building Data Lakes That Cost Less and Deliver Results Faster - AWS Online Te...Building Data Lakes That Cost Less and Deliver Results Faster - AWS Online Te...
Building Data Lakes That Cost Less and Deliver Results Faster - AWS Online Te...
 
Data Lake Implementation: Processing and Querying Data in Place (STG204-R1) -...
Data Lake Implementation: Processing and Querying Data in Place (STG204-R1) -...Data Lake Implementation: Processing and Querying Data in Place (STG204-R1) -...
Data Lake Implementation: Processing and Querying Data in Place (STG204-R1) -...
 
BDA306 Building a Modern Data Warehouse: Deep Dive on Amazon Redshift
BDA306 Building a Modern Data Warehouse: Deep Dive on Amazon RedshiftBDA306 Building a Modern Data Warehouse: Deep Dive on Amazon Redshift
BDA306 Building a Modern Data Warehouse: Deep Dive on Amazon Redshift
 
Using Data Lakes
Using Data LakesUsing Data Lakes
Using Data Lakes
 
Building-a-Data-Lake-on-AWS
Building-a-Data-Lake-on-AWSBuilding-a-Data-Lake-on-AWS
Building-a-Data-Lake-on-AWS
 
What's New with Amazon Redshift ft. Dow Jones (ANT350-R) - AWS re:Invent 2018
What's New with Amazon Redshift ft. Dow Jones (ANT350-R) - AWS re:Invent 2018What's New with Amazon Redshift ft. Dow Jones (ANT350-R) - AWS re:Invent 2018
What's New with Amazon Redshift ft. Dow Jones (ANT350-R) - AWS re:Invent 2018
 
Build a High-Performance, Cloud-Native, Open-Source Platform on AWS & Save Mi...
Build a High-Performance, Cloud-Native, Open-Source Platform on AWS & Save Mi...Build a High-Performance, Cloud-Native, Open-Source Platform on AWS & Save Mi...
Build a High-Performance, Cloud-Native, Open-Source Platform on AWS & Save Mi...
 
One Data Lake, Many Uses: Enable Multi-Tenant Analytics with Amazon EMR (ANT3...
One Data Lake, Many Uses: Enable Multi-Tenant Analytics with Amazon EMR (ANT3...One Data Lake, Many Uses: Enable Multi-Tenant Analytics with Amazon EMR (ANT3...
One Data Lake, Many Uses: Enable Multi-Tenant Analytics with Amazon EMR (ANT3...
 
What's New with Amazon Redshift ft. McDonald's (ANT350-R1) - AWS re:Invent 2018
What's New with Amazon Redshift ft. McDonald's (ANT350-R1) - AWS re:Invent 2018What's New with Amazon Redshift ft. McDonald's (ANT350-R1) - AWS re:Invent 2018
What's New with Amazon Redshift ft. McDonald's (ANT350-R1) - AWS re:Invent 2018
 

Semelhante a Success has Many Query Engines- Tel Aviv Summit 2018

Modernise your Data Warehouse - AWS Summit Sydney 2018
Modernise your Data Warehouse - AWS Summit Sydney 2018Modernise your Data Warehouse - AWS Summit Sydney 2018
Modernise your Data Warehouse - AWS Summit Sydney 2018Amazon Web Services
 
Modernise your Data Warehouse with Amazon Redshift and Amazon Redshift Spectrum
Modernise your Data Warehouse with Amazon Redshift and Amazon Redshift SpectrumModernise your Data Warehouse with Amazon Redshift and Amazon Redshift Spectrum
Modernise your Data Warehouse with Amazon Redshift and Amazon Redshift SpectrumAmazon Web Services
 
Owning Your Own (Data) Lake House
Owning Your Own (Data) Lake HouseOwning Your Own (Data) Lake House
Owning Your Own (Data) Lake HouseData Con LA
 
Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018
Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018
Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018Amazon Web Services
 
Analyze your Data Lake, Fast @ Any Scale - AWS Online Tech Talks
Analyze your Data Lake, Fast @ Any Scale - AWS Online Tech TalksAnalyze your Data Lake, Fast @ Any Scale - AWS Online Tech Talks
Analyze your Data Lake, Fast @ Any Scale - AWS Online Tech TalksAmazon Web Services
 
AWS Data Lake: data analysis @ scale
AWS Data Lake: data analysis @ scaleAWS Data Lake: data analysis @ scale
AWS Data Lake: data analysis @ scaleAmazon Web Services
 
Implementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdfImplementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdfAmazon Web Services
 
Building a Modern Data Warehouse - Deep Dive on Amazon Redshift
Building a Modern Data Warehouse - Deep Dive on Amazon RedshiftBuilding a Modern Data Warehouse - Deep Dive on Amazon Redshift
Building a Modern Data Warehouse - Deep Dive on Amazon RedshiftAmazon Web Services
 
Fast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWSFast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWSAmazon Web Services
 
Building a Modern Data Warehouse: Deep Dive on Amazon Redshift - SRV337 - Chi...
Building a Modern Data Warehouse: Deep Dive on Amazon Redshift - SRV337 - Chi...Building a Modern Data Warehouse: Deep Dive on Amazon Redshift - SRV337 - Chi...
Building a Modern Data Warehouse: Deep Dive on Amazon Redshift - SRV337 - Chi...Amazon Web Services
 
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon RedshiftData warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon RedshiftAmazon Web Services
 
Building+your+Data+Project+on+AWS+-+Luke+Anderson.pdf
Building+your+Data+Project+on+AWS+-+Luke+Anderson.pdfBuilding+your+Data+Project+on+AWS+-+Luke+Anderson.pdf
Building+your+Data+Project+on+AWS+-+Luke+Anderson.pdfSasikumarPalanivel3
 
Building+your+Data+Project+on+AWS+-+Luke+Anderson.pdf
Building+your+Data+Project+on+AWS+-+Luke+Anderson.pdfBuilding+your+Data+Project+on+AWS+-+Luke+Anderson.pdf
Building+your+Data+Project+on+AWS+-+Luke+Anderson.pdfsaidbilgen
 
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2Amazon Web Services
 
BDA305 Building Data Lakes and Analytics on AWS
BDA305 Building Data Lakes and Analytics on AWSBDA305 Building Data Lakes and Analytics on AWS
BDA305 Building Data Lakes and Analytics on AWSAmazon Web Services
 
BDA308 Serverless Analytics with Amazon Athena and Amazon QuickSight, featuri...
BDA308 Serverless Analytics with Amazon Athena and Amazon QuickSight, featuri...BDA308 Serverless Analytics with Amazon Athena and Amazon QuickSight, featuri...
BDA308 Serverless Analytics with Amazon Athena and Amazon QuickSight, featuri...Amazon Web Services
 

Semelhante a Success has Many Query Engines- Tel Aviv Summit 2018 (20)

Modernise your Data Warehouse - AWS Summit Sydney 2018
Modernise your Data Warehouse - AWS Summit Sydney 2018Modernise your Data Warehouse - AWS Summit Sydney 2018
Modernise your Data Warehouse - AWS Summit Sydney 2018
 
Modernise your Data Warehouse with Amazon Redshift and Amazon Redshift Spectrum
Modernise your Data Warehouse with Amazon Redshift and Amazon Redshift SpectrumModernise your Data Warehouse with Amazon Redshift and Amazon Redshift Spectrum
Modernise your Data Warehouse with Amazon Redshift and Amazon Redshift Spectrum
 
Owning Your Own (Data) Lake House
Owning Your Own (Data) Lake HouseOwning Your Own (Data) Lake House
Owning Your Own (Data) Lake House
 
Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018
Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018
Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018
 
Analyze your Data Lake, Fast @ Any Scale - AWS Online Tech Talks
Analyze your Data Lake, Fast @ Any Scale - AWS Online Tech TalksAnalyze your Data Lake, Fast @ Any Scale - AWS Online Tech Talks
Analyze your Data Lake, Fast @ Any Scale - AWS Online Tech Talks
 
Using Data Lakes
Using Data LakesUsing Data Lakes
Using Data Lakes
 
Implementing a Data Lake
Implementing a Data LakeImplementing a Data Lake
Implementing a Data Lake
 
AWS Data Lake: data analysis @ scale
AWS Data Lake: data analysis @ scaleAWS Data Lake: data analysis @ scale
AWS Data Lake: data analysis @ scale
 
Implementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdfImplementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdf
 
Building a Modern Data Warehouse - Deep Dive on Amazon Redshift
Building a Modern Data Warehouse - Deep Dive on Amazon RedshiftBuilding a Modern Data Warehouse - Deep Dive on Amazon Redshift
Building a Modern Data Warehouse - Deep Dive on Amazon Redshift
 
Fast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWSFast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWS
 
Building a Modern Data Warehouse: Deep Dive on Amazon Redshift - SRV337 - Chi...
Building a Modern Data Warehouse: Deep Dive on Amazon Redshift - SRV337 - Chi...Building a Modern Data Warehouse: Deep Dive on Amazon Redshift - SRV337 - Chi...
Building a Modern Data Warehouse: Deep Dive on Amazon Redshift - SRV337 - Chi...
 
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon RedshiftData warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
 
Big Data@Scale
 Big Data@Scale Big Data@Scale
Big Data@Scale
 
Building+your+Data+Project+on+AWS+-+Luke+Anderson.pdf
Building+your+Data+Project+on+AWS+-+Luke+Anderson.pdfBuilding+your+Data+Project+on+AWS+-+Luke+Anderson.pdf
Building+your+Data+Project+on+AWS+-+Luke+Anderson.pdf
 
Building+your+Data+Project+on+AWS+-+Luke+Anderson.pdf
Building+your+Data+Project+on+AWS+-+Luke+Anderson.pdfBuilding+your+Data+Project+on+AWS+-+Luke+Anderson.pdf
Building+your+Data+Project+on+AWS+-+Luke+Anderson.pdf
 
Aws meetup 20190427
Aws meetup 20190427Aws meetup 20190427
Aws meetup 20190427
 
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
 
BDA305 Building Data Lakes and Analytics on AWS
BDA305 Building Data Lakes and Analytics on AWSBDA305 Building Data Lakes and Analytics on AWS
BDA305 Building Data Lakes and Analytics on AWS
 
BDA308 Serverless Analytics with Amazon Athena and Amazon QuickSight, featuri...
BDA308 Serverless Analytics with Amazon Athena and Amazon QuickSight, featuri...BDA308 Serverless Analytics with Amazon Athena and Amazon QuickSight, featuri...
BDA308 Serverless Analytics with Amazon Athena and Amazon QuickSight, featuri...
 

Mais de Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Success Has Many Query Engines - Tel Aviv Summit 2018

  • 1. Success Has Many Query Engines. Eden Perry, Partner Solutions Architect, AWS; Rafi Ton, CEO, Nuviad. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
  • 2. Agenda
      • AWS Big Data Platform Overview
      • AWS Glue Data Catalog
      • Amazon Athena
      • Amazon Redshift Spectrum
      • Customer Story: NUVIAD
      • The Right Tool for The Right Job
  • 3. AWS Data Services Overview
  • 4. AWS Big Data Services [diagram]. Stages: Ingest/Collect, Store, Analyze/Process, Visualization/Consume, plus an Orchestration/Transform row (AWS DMS (CDC), AWS Glue, AWS Step Functions). Services shown: Kinesis Streams, Kinesis Firehose, AWS Snowball, S3 Transfer Acceleration, ISV Connectors, S3, DynamoDB, Aurora, Amazon Elasticsearch, EMR, EC2, AWS Lambda, Kinesis Analytics, Amazon Athena, Amazon Redshift/Spectrum, Amazon QuickSight. A legend marks the serverless services.
  • 5. AWS Big Data Services [diagram, repeated from slide 4].
  • 6. AWS Glue: Fully managed ETL service
      • Catalog data sources: formats and data types, manually and automatically (with crawlers)
      • Generate ETL code
      • Schedule and execute ETL jobs
  • 7. What are our options for Analytics?
  • 8. Architecture [diagram]. Stages: Ingest, Store, Process & Analyze, Consume, split into a speed layer and a batch layer. Components shown: Kinesis Data Streams, Kinesis Firehose Delivery Streams, DynamoDB, AWS Lambda, Kinesis Analytics, Raw Bucket, Parquet Bucket, Glue Data Catalog, Glue ETL, Spark/EMR, Athena, Redshift Spectrum, QuickSight, Real time Web UI.
  • 9. Amazon Athena: A Deeper Look
  • 10. Amazon Athena
      • Interactive query service over S3
      • Uses ANSI SQL (everybody knows SQL)
      • Serverless: don’t worry about setting up infrastructure, just start querying
      • No ETL
      • Get started instantly:
          • Point to data in S3
          • Define your schema (schema on read)
          • Start querying
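The "define your schema" step on slide 10 usually means registering an external table over files that already sit in S3. A minimal sketch of what such a statement might look like, built as a string in Python; the database, table, column names, and bucket path are hypothetical and not taken from the talk:

```python
def build_external_table_ddl(database, table, columns, location, partitions=None):
    """Return a CREATE EXTERNAL TABLE statement for Parquet files in S3.

    All names passed in are illustrative; nothing is sent to Athena here.
    """
    cols = ",\n  ".join(f"`{name}` {dtype}" for name, dtype in columns)
    ddl = f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n  {cols}\n)\n"
    if partitions:
        parts = ", ".join(f"`{name}` {dtype}" for name, dtype in partitions)
        ddl += f"PARTITIONED BY ({parts})\n"
    # Schema on read: the files stay where they are, only metadata is created.
    ddl += f"STORED AS PARQUET\nLOCATION '{location}'"
    return ddl


# Hypothetical example values.
ddl = build_external_table_ddl(
    database="demo_db",
    table="tweets",
    columns=[("tweet_id", "bigint"), ("text", "string"), ("sentiment", "double")],
    location="s3://example-bucket/tweets/",
    partitions=[("dt", "string")],
)
print(ddl)
```

The resulting statement could be pasted into the Athena console or submitted through a client library; the sketch only covers building the text.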
  • 11. Amazon Athena: Pay Per Query
      • Pay only for the queries you run: $5 per TB scanned
      • Query directly from S3, no additional storage charges
      • Works with standard data formats:
          • CSV, JSON
          • Parquet, ORC, Avro
      • Improve performance and reduce cost with:
          • Compression
          • Partitioning
          • Columnar formats
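To make the pricing on slide 11 concrete, here is a rough back-of-the-envelope sketch. The $5 per TB figure comes from the slide; the compression ratio, the share of columns read, and the choice of 1 TB = 1024^4 bytes are assumptions for illustration, and any rounding or per-query minimums are ignored:

```python
PRICE_PER_TB_USD = 5.0      # from the slide
BYTES_PER_TB = 1024 ** 4    # assumption: binary terabyte


def estimate_query_cost(bytes_scanned, price_per_tb=PRICE_PER_TB_USD):
    """Estimate the cost of one query from the bytes it scans."""
    return bytes_scanned / BYTES_PER_TB * price_per_tb


# Hypothetical data set: 1 TB of uncompressed CSV.
raw_csv_bytes = 1 * BYTES_PER_TB

# Assumption: compression shrinks the data 4:1.
compressed_bytes = raw_csv_bytes / 4

# Assumption: a columnar format lets the query read one tenth of the columns.
columnar_bytes = compressed_bytes / 10

for label, scanned in [
    ("raw CSV", raw_csv_bytes),
    ("compressed", compressed_bytes),
    ("compressed + columnar", columnar_bytes),
]:
    print(f"{label}: ${estimate_query_cost(scanned):.3f}")
```

Partitioning works the same way: every partition the query can skip is data that is never scanned and never billed.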
  • 12. Amazon Athena Demo
  • 13. Architecture [diagram, repeated from slide 8].
  • 14. Amazon Redshift Spectrum: Extend your data reach
  • 15. The tyranny of “OR”
      • Amazon EMR
          • Directly access data in S3
          • Scale out to thousands of nodes
          • Open data formats
          • Popular big data frameworks
          • Anything you can dream up and code
      • Amazon Redshift
          • Super-fast local disk performance
          • Sophisticated query optimization
          • Join-optimized data formats
          • Query using standard SQL
          • Optimized for data warehousing
  • 16. We want
      • Sophisticated query optimization and scale-out processing
      • Super-fast performance and support for open formats
      • The throughput of local disk and the scale of S3
  • 17. We want all this
      • From one data processing engine
      • With my data accessible from all data processing engines
      • Now and in the future
  • 18. Amazon Redshift Spectrum. Amazon Redshift Spectrum enables you to run Amazon Redshift SQL queries against exabytes of data in Amazon S3. With Redshift Spectrum, you can extend the analytic power of Amazon Redshift beyond data stored on local disks in your data warehouse to query vast amounts of unstructured data in your Amazon S3 “data lake”.
  • 19. Life of a query, step 1: Query: SELECT COUNT(*) FROM S3.EXT_TABLE GROUP BY… [Diagram on slides 19-27, labels: JDBC/ODBC, Amazon Redshift, nodes 1 2 3 4 ... N, Amazon S3 (exabyte-scale object storage), Data Catalog, Apache Hive Metastore.]
  • 20. Life of a query, step 2: Query is optimized and compiled at the leader node, which determines what gets run locally and what goes to Amazon Redshift Spectrum.
  • 21. Life of a query, step 3: Query plan is sent to all compute nodes.
  • 22. Life of a query, step 4: Compute nodes obtain partition info from the Data Catalog and dynamically prune partitions.
  • 23. Life of a query, step 5: Each compute node issues multiple requests to the Amazon Redshift Spectrum layer.
  • 24. Life of a query, step 6: Amazon Redshift Spectrum nodes scan your S3 data.
  • 25. Life of a query, step 7: Amazon Redshift Spectrum projects, filters, and aggregates.
  • 26. Life of a query, step 8: Final aggregations and joins with local Amazon Redshift tables are done in-cluster.
  • 27. Life of a query, step 9: Result is sent back to the client.
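The partition pruning mentioned on slide 22 can be illustrated with a small sketch. This is not how Redshift Spectrum is implemented; it only shows the idea of using catalog metadata to decide which S3 prefixes need to be scanned at all. The partition keys, values, and bucket paths are hypothetical:

```python
def prune_partitions(partitions, filters):
    """Keep only partitions whose key values satisfy every filter.

    partitions: list of dicts with "values" (partition key -> value)
                and "location" (S3 prefix).
    filters:    dict mapping a partition key to the set of accepted values.
    """
    return [
        p for p in partitions
        if all(p["values"].get(key) in accepted for key, accepted in filters.items())
    ]


# Hypothetical catalog entries for a table partitioned by date and region.
catalog = [
    {"values": {"dt": "2018-03-01", "region": "eu"},
     "location": "s3://example-bucket/events/dt=2018-03-01/region=eu/"},
    {"values": {"dt": "2018-03-02", "region": "eu"},
     "location": "s3://example-bucket/events/dt=2018-03-02/region=eu/"},
    {"values": {"dt": "2018-03-02", "region": "us"},
     "location": "s3://example-bucket/events/dt=2018-03-02/region=us/"},
]

# A predicate such as WHERE dt = '2018-03-02' only needs two of the three prefixes.
to_scan = prune_partitions(catalog, {"dt": {"2018-03-02"}})
for partition in to_scan:
    print(partition["location"])
```

Only the surviving prefixes are read, which is why partitioning cuts both query time and the amount of data scanned.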
  • 28. Amazon Redshift Spectrum Demo
      • Goal: determine the correlation between tweets’ sentiment and the weather
      • Tweets are stored in our data lake (S3)
      • States and historical weather data are stored in our data warehouse (Redshift)
      • Method: join S3 data with Redshift data using Redshift Spectrum
  • 29. Customer Story: NUVIAD
  • 31. NUVIAD (Company Confidential). NUVIAD is an online marketing service that makes data-driven, location-aware mobile marketing accessible, effective and simple to use for agencies, networks and businesses. NUVIAD HyperDSP, NUVIAD Local, NUVIAD inView SSP.
  • 32. NUVIAD In Numbers
      • Analyzing over 700k ad opportunities every second
      • Over 2.5 billion user profiles
      • Over 500k app profiles from App Store and Google Play
      • Local businesses mapped via multiple sources
      • Thousands of customers, from small businesses to large networks
      • Tens of thousands of campaigns
  • 33. Data is Everything. Identifying the user’s intent is the holy grail of digital marketing: finding the right moment when the user is most receptive to my marketing message.
      • Digital advertising was focused on the digital aspects of the user
      • HyperLocal adds the physical context
  • 34. The Challenge
      • Streamlining high-scale data from hundreds of thousands of sources. Finding the needles in the huge haystack.
      • Providing effective data-driven tools to our customers and partners
      • Effectively scaling up the platform
      • It is easy to create a platform that delivers data right or now. We need to provide data right now. Speed, speed and more speed.
  • 36. Traditional Methodology
      • Data warehouse: Redshift (over 60 dc1.large servers in two clusters)
      • Algorithmic processes: Redshift
      • Reporting and analytics: RDS (MySQL)
      • Real-time reporting: MemSQL
      • Aerospike, MongoDB
      • Result:
          • Fragmented data
          • Multiple data sets
          • Hard to scale up (storage vs. compute)
          • Maintenance nightmare
  • 37. NUVIAD Data Lake: one pond of data, multiple query engines
      • S3 as main storage for all data
      • Formatting data in an open and common format
      • Creating unified data streams to ingest data
      • Separating compute from storage to allow manageable and ad-hoc scaling
      • Using different query engines on the same data
      • Using different data permutations to optimize data for queries
  • 38. One Data Set. Many Query Engines.
      • With data formatted correctly (Parquet) and stored in S3, we can use different query engines for different workloads:
          • Amazon Athena: for quick and simple queries
          • Amazon Redshift Spectrum: for complex algorithmic queries utilizing PostgreSQL
          • EMR Presto: for large reports with a scalable cluster
  • 39. Faster Results. Fraction of the Cost
  • 40. The local marketing company. Thank you.
  • 41. The Right Tool for The Right Job
  • 42. Thank You!