SlideShare uma empresa Scribd logo
1 de 32
Baixar para ler offline
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Adir Sharabi
Solutions Architect, Amazon Web Services
Yair Weinberger
CTO and Co-Founder, Alooma
Big Data on AWS: To infinity and
beyond!
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Big Data Track Agenda
# Time Title Speaker
1 13:15 – 14:00 Big Data on AWS: To infinity and beyond! Adir Sharabi – Solutions Architect
2 14:10 – 14:55 Amazon Kinesis – Building Serverless real-
time solution
Roy Ben Alta – Business
Development Manager
3 15:05 – 15:50 Data preparation and transformation: Spin
your straw into gold
Daniel Haviv – Specialist Solutions
Architect, Analytics
4 16:00 – 16:45 Success has many query engines Eden Perry – Partner Solutions
Architect
5 16:50 – 17:30 Connecting the dots: How Amazon Neptune
and Graph Databases can transform your
business
• Andi Gutmans - GM, Neptune
& Elasticsearch
• Brad Bebee - Principal Prod
Mgmt - Tech, AWS Neptune
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Documents and files Streams
Your Data Sources
Multiple sources and formats… and growing everyday
Records
Amazon
RDS
Amazon
DynamoDB
AWS IoT
On Premises
databases
Spreadsheets Infrastructure
logs
Clickstream data Mobile app data
Social media data Amazon
Redshift
Device data
Sensor data
ERP WEB
Clickstream
Mobile Apps
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Data Challenges
Data Visibility Multiple consumers
and requirements
Multiple Access
Mechanisms
1990 2000 2010 2020
Analysts Applications
Data Scientists
Business Users API Access BI Tools
Notebooks
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Traditionally, Analytics Used to Look Like This
OLTP ERP CRM LOB
Data Warehouse
Business Intelligence
Relational data
Schema defined prior to data load
TBs-PBs Scale
Operational reporting and ad hoc
Large initial capex + $10K–$50K/TB/Year
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Data Lakes Extend the Traditional Approach
Relational and non-relational data
Schema defined during analysis
Scale storage and compute independently
Diverse analytical engines to gain insights
Designed for low-cost storage and analytics
OLTP ERP CRM LOB
Data Warehouse
Business
Intelligence
Data Lake
100110000100101011100
101010111001010100001
011111011010
0011110010110010110
0100011000010
Devices Web Sensors Social
Catalog
Machine
Learning
DW
Queries
Big data
processing
Interactive Real-time
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Snowball
Snowmobile Kinesis
Data Firehose
Kinesis
Data Streams
Amazon S3
Many ways to bring all kinds of data
Unmatched durability and availability at EB scale
Best security, compliance, and audit capabilities
Integration with Big Data Tools
Run any analytics on the same data without movement
Cost effective - Store data at $0.023 / GB / Month
Redshift
EMR
Athena Kinesis
Elasticsearch Service
Amazon S3 as Data Lakes Storage Layer
Kinesis
Video Streams
AI Services
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Store
Simplified Big Data Pipeline
Amazon S3
Ingest
Process &
Analyze Consume
Data sources
Transactions
Web logs /
cookies
ERP
Connected
devices
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Lots of ingestion tools
IngestData sources
Transactions
Web logs /
cookies
ERP
Connected
devices
Process &
Analyze Consume
Store
Amazon S3
Amazon S3
API
Amazon Kinesis
Firehose
Direct Connect
Snowball
Database
Migration Service
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Variety of data processing tools
IngestData sources
Transactions
Web logs /
cookies
ERP
Connected
devices
Consume
Amazon Athena
Amazon EMR
Amazon Redshift
& Spectrum
Amazon
Elasticsearch
Amazon AI/ML/DL
Services
Store
Amazon S3
Process & Analyze
Amazon S3
API
Amazon Kinesis
Firehose
Direct Connect
Snowball
Database
Migration Service
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Multiple ways to consume the data
IngestData sources
Transactions
Web logs /
cookies
ERP
Connected
devices
Consume
Amazon Athena
Amazon EMR
Amazon Redshift
& Spectrum
Amazon
Elasticsearch
Amazon AI/ML/DL
Services
Store
Amazon S3
Process & Analyze
Amazon S3
API
Amazon Kinesis
Firehose
Direct Connect
Snowball
Database
Migration Service
Amazon
QuickSight
Jupyter, Zeppelin,
HUE
Amazon API
Gateway
BI Tools
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Because data is NEVER perfect
Amazon EMR
Spark and Hive running on EMR
• Clean
• Transform
• Concatenate
• Convert to better formats
• Schedule transformations
• Event-driven transformations
• Transformations expressed as code
AWS Glue
Event based Server-less ETL engine
AWS Lambda
Trigger-based Code Execution
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
ETL when you need it
IngestData sources
Transactions
Web logs /
cookies
ERP
Connected
devices
Consume
Amazon Athena
Amazon EMR
Amazon Redshift
& Spectrum
Amazon
Elasticsearch
Amazon AI/ML/DL
Services
Store
Amazon S3
Process & Analyze
Amazon S3
API
Amazon Kinesis
Firehose
Direct Connect
Snowball
Database
Migration Service
Amazon
QuickSight
Jupyter, Zeppelin,
HUE
Amazon API
Gateway
BI Tools
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Realtime - in-stream processing
IngestData sources
Transactions
Web logs /
cookies
ERP
Connected
devices
Consume
Amazon Athena
Amazon EMR
Amazon Redshift
& Spectrum
Amazon
Elasticsearch
Amazon AI/ML/DL
Services
Store
Amazon S3
Process & Analyze
Spark
Streaming
& Flink
Amazon
Kinesis
Analytics
In stream process
Amazon S3
API
Amazon Kinesis
Firehose
Direct Connect
Snowball
Database
Migration Service
Amazon
QuickSight
Jupyter, Zeppelin,
HUE
Amazon API
Gateway
BI Tools
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
AWS Glue Data Catalog
One per account
Allows you to share metadata between Amazon Athena, Amazon Redshift
Spectrum, EMR & JDBC sources
Serverless
We added a few extensions:
§ Search over metadata for data discovery
§ Manage Connections – JDBC URLs, credentials
§ Classification for identifying and parsing files
§ Versioning of table metadata as schemas evolve and other
metadata are updated
Central Metadata
Catalog for the
data lake
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
AWS Glue Data Catalog Crawlers
Crawlers automatically build your Data Catalog and keep it in sync
Automatically discover new data, extracts schema definitions
• Detect schema changes and version tables
• Detect Hive style partitions on Amazon S3
Built-in classifiers for popular types; custom classifiers using Grok
expression
Run ad hoc or on a schedule; serverless – only pay when crawler runs
Catalogs Your
Data
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Write once, catalog once, read multiple, ETL Anywhere
IngestData sources
Transactions
Web logs /
cookies
ERP
Connected
devices
Consume
Amazon Athena
Amazon EMR
Amazon Redshift
& Spectrum
Amazon
Elasticsearch
Amazon AI/ML/DL
Services
Data Catalog
Store
Amazon S3
Process & Analyze
Amazon
QuickSight
Jupyter, Zeppelin,
HUE
Amazon API
Gateway
Amazon S3
API
Amazon Kinesis
Firehose
Direct Connect
Snowball
Database
Migration Service
Spark
Streaming
& Flink
Amazon
Kinesis
Analytics
In stream process
BI Tools
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Yair Weinberger
CTO and Co-Founder, Alooma
Data Pipeline as a Service
yair@alooma.com
CONFIDENTIAL
Architecture
100s of Data
Sources
Alooma’s Data Pipeline Data Destinations
Incoming
Queue
MAPPER
CODE
ENGINE
Restream Queue
Data Lake - The Goal
Emilykil (Own work) [CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0)], via Wikimedia Commons
Data Lake - What Sometimes Happens
NatalieMaynor from Jackson, Mississippi, USA - Winter Ugliness, CC BY 2.0, https://commons.wikimedia.org/w/index.php?curid=5503067
Data Lake VS DataMart
Emilykil (Own work) [CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0)], via Wikimedia Commons, Abras2010 - WalmartUploaded by
SchuminWeb, CC BY 2.0, https://commons.wikimedia.org/w/index.php?curid=14571617
Enjoy both worlds
EMR
Athena
Redshift
Spectrum
Redshift
S3
Alooma Usage of S3 as a Data Lake
● Separate between data of different tenants
○ IAM Role based access ensures data isolation
● Allow Alooma tenants to replay their data from any data
source or time
● Staging area before loading into Data Warehouse
● Storage for things that need infinite retention (e.g. audit logs)
Replay
Redsh
ift
S3
Files Structure in S3:
s3://bucket/[shard]/input_name/date/random_
me
Sharding is a tradeoff between throughput and
convenience
S3 as Data Lake - Tips and Tricks
● Use Server-Side encryption to provide automatic encryption at
rest - but it does impact performance
● Loading data in high volume
○ Keys in S3 are partitioned by prefix
○ Use Randomly prefixed or at least sharded filenames
● Use Object Expiration to avoid storing unnecessary data
Important resource:
https://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-
perf-considerations.html
Questions / Feedback
yair@alooma.com
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Let’s take an Example
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
AWS Tweets Example
Record-level data
• What’s the overall sentiment today?
• What’s the sentiment trend now?
• What’s the most popular Language?
• What’s the Temp. affect on the tweet sentiment?
• Scale
• Resilience
• Minimal Operational overhead
• Agile
• Cost Effective
Business Questions
Technical Requirements
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
ConsumeStore Process & AnalyzeIngest
Kinesis Data Streams
Kinesis Firehose
Delivery Streams
DynamoDB
AWS Lambda
Kinesis
Analytics
Raw Bucket
Parquet Bucket
Athena Redshift
Spectrum
QuickSight
SpeedLayerBatchLayer
Glue Data
Catalog
Spark/EMR Glue ETL
Real time
Web UI
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Thank You!

Mais conteúdo relacionado

Mais procurados

Data Analytics를 통한 비지니스 혁신::Craig Stries::AWS Summit Seoul 2018
Data Analytics를 통한 비지니스 혁신::Craig Stries::AWS Summit Seoul 2018Data Analytics를 통한 비지니스 혁신::Craig Stries::AWS Summit Seoul 2018
Data Analytics를 통한 비지니스 혁신::Craig Stries::AWS Summit Seoul 2018
Amazon Web Services Korea
 
Run Your Workloads in New EC2 and Storage Options
Run Your Workloads in New EC2 and Storage OptionsRun Your Workloads in New EC2 and Storage Options
Run Your Workloads in New EC2 and Storage Options
Amazon Web Services
 
What’s New with Device Qualification Program and IoT Services
What’s New with Device Qualification Program and IoT ServicesWhat’s New with Device Qualification Program and IoT Services
What’s New with Device Qualification Program and IoT Services
Amazon Web Services
 

Mais procurados (20)

Migrate & Optimize Microsoft Applications on AWS
Migrate & Optimize Microsoft Applications on AWSMigrate & Optimize Microsoft Applications on AWS
Migrate & Optimize Microsoft Applications on AWS
 
AWS 資料湖服務
AWS 資料湖服務AWS 資料湖服務
AWS 資料湖服務
 
Building Data Lakes for Analytics on AWS
Building Data Lakes for Analytics on AWSBuilding Data Lakes for Analytics on AWS
Building Data Lakes for Analytics on AWS
 
BI & Analytics - A Datalake on AWS
BI & Analytics - A Datalake on AWSBI & Analytics - A Datalake on AWS
BI & Analytics - A Datalake on AWS
 
The Future of Enterprise IT
The Future of Enterprise IT The Future of Enterprise IT
The Future of Enterprise IT
 
Data Analytics를 통한 비지니스 혁신::Craig Stries::AWS Summit Seoul 2018
Data Analytics를 통한 비지니스 혁신::Craig Stries::AWS Summit Seoul 2018Data Analytics를 통한 비지니스 혁신::Craig Stries::AWS Summit Seoul 2018
Data Analytics를 통한 비지니스 혁신::Craig Stries::AWS Summit Seoul 2018
 
BI & Analytics
BI & AnalyticsBI & Analytics
BI & Analytics
 
Leveraging Data Analytics in the Cloud to Support Data-Driven Decisions
Leveraging Data Analytics in the Cloud to Support Data-Driven DecisionsLeveraging Data Analytics in the Cloud to Support Data-Driven Decisions
Leveraging Data Analytics in the Cloud to Support Data-Driven Decisions
 
Data freedom: come migrare i carichi di lavoro Big Data su AWS
Data freedom: come migrare i carichi di lavoro Big Data su AWSData freedom: come migrare i carichi di lavoro Big Data su AWS
Data freedom: come migrare i carichi di lavoro Big Data su AWS
 
Power up Your AWS Data Lake and Warehouse with Trusted Data (Sponsored by Tal...
Power up Your AWS Data Lake and Warehouse with Trusted Data (Sponsored by Tal...Power up Your AWS Data Lake and Warehouse with Trusted Data (Sponsored by Tal...
Power up Your AWS Data Lake and Warehouse with Trusted Data (Sponsored by Tal...
 
Run Your Workloads in New EC2 and Storage Options
Run Your Workloads in New EC2 and Storage OptionsRun Your Workloads in New EC2 and Storage Options
Run Your Workloads in New EC2 and Storage Options
 
Enabling New Retail Customer Experiences with Big Data - AWS Online Tech Talks
Enabling New Retail Customer Experiences with Big Data - AWS Online Tech TalksEnabling New Retail Customer Experiences with Big Data - AWS Online Tech Talks
Enabling New Retail Customer Experiences with Big Data - AWS Online Tech Talks
 
Machine Learning for innovation and transformation
Machine Learning for innovation and transformationMachine Learning for innovation and transformation
Machine Learning for innovation and transformation
 
AWS Technical Day Riyadh Nov 2019 [Migration]
AWS Technical Day Riyadh Nov 2019 [Migration]AWS Technical Day Riyadh Nov 2019 [Migration]
AWS Technical Day Riyadh Nov 2019 [Migration]
 
MassMutual Goes Cloud First with Hybrid Cloud on AWS (ENT210) - AWS re:Invent...
MassMutual Goes Cloud First with Hybrid Cloud on AWS (ENT210) - AWS re:Invent...MassMutual Goes Cloud First with Hybrid Cloud on AWS (ENT210) - AWS re:Invent...
MassMutual Goes Cloud First with Hybrid Cloud on AWS (ENT210) - AWS re:Invent...
 
What’s New with Device Qualification Program and IoT Services
What’s New with Device Qualification Program and IoT ServicesWhat’s New with Device Qualification Program and IoT Services
What’s New with Device Qualification Program and IoT Services
 
Dissecting Media Asset Management Architecture and Media Archive TCO (MAE301)...
Dissecting Media Asset Management Architecture and Media Archive TCO (MAE301)...Dissecting Media Asset Management Architecture and Media Archive TCO (MAE301)...
Dissecting Media Asset Management Architecture and Media Archive TCO (MAE301)...
 
What's New with Amazon Redshift ft. McDonald's (ANT350-R1) - AWS re:Invent 2018
What's New with Amazon Redshift ft. McDonald's (ANT350-R1) - AWS re:Invent 2018What's New with Amazon Redshift ft. McDonald's (ANT350-R1) - AWS re:Invent 2018
What's New with Amazon Redshift ft. McDonald's (ANT350-R1) - AWS re:Invent 2018
 
Databases-on-AWS-Purpose-built-databases,-the-right-tool-for-the-right-job
Databases-on-AWS-Purpose-built-databases,-the-right-tool-for-the-right-jobDatabases-on-AWS-Purpose-built-databases,-the-right-tool-for-the-right-job
Databases-on-AWS-Purpose-built-databases,-the-right-tool-for-the-right-job
 
Accelerate and Modernise Microsoft Workload Migrations on AWS
Accelerate and Modernise Microsoft Workload Migrations on AWSAccelerate and Modernise Microsoft Workload Migrations on AWS
Accelerate and Modernise Microsoft Workload Migrations on AWS
 

Semelhante a Big Data on AWS - To infinity and beyond! - Tel Aviv Summit 2018

Implementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdfImplementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdf
Amazon Web Services
 

Semelhante a Big Data on AWS - To infinity and beyond! - Tel Aviv Summit 2018 (20)

Building Data Lake on AWS | AWS Floor28
Building Data Lake on AWS | AWS Floor28Building Data Lake on AWS | AWS Floor28
Building Data Lake on AWS | AWS Floor28
 
AWS Floor 28 - Building Data lake on AWS
AWS Floor 28 - Building Data lake on AWSAWS Floor 28 - Building Data lake on AWS
AWS Floor 28 - Building Data lake on AWS
 
AWS Data Lake: data analysis @ scale
AWS Data Lake: data analysis @ scaleAWS Data Lake: data analysis @ scale
AWS Data Lake: data analysis @ scale
 
Building a Data Lake in Amazon S3 & Amazon Glacier (STG401-R1) - AWS re:Inven...
Building a Data Lake in Amazon S3 & Amazon Glacier (STG401-R1) - AWS re:Inven...Building a Data Lake in Amazon S3 & Amazon Glacier (STG401-R1) - AWS re:Inven...
Building a Data Lake in Amazon S3 & Amazon Glacier (STG401-R1) - AWS re:Inven...
 
Implementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdfImplementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdf
 
Value of Data Beyond Analytics by Darin Briskman
 Value of Data Beyond Analytics by Darin Briskman Value of Data Beyond Analytics by Darin Briskman
Value of Data Beyond Analytics by Darin Briskman
 
Implementing a Data Lake
Implementing a Data LakeImplementing a Data Lake
Implementing a Data Lake
 
Data_Analytics_and_AI_ML
Data_Analytics_and_AI_MLData_Analytics_and_AI_ML
Data_Analytics_and_AI_ML
 
Big Data Meets AI - Driving Insights and Adding Intelligence to Your Solutions
 Big Data Meets AI - Driving Insights and Adding Intelligence to Your Solutions Big Data Meets AI - Driving Insights and Adding Intelligence to Your Solutions
Big Data Meets AI - Driving Insights and Adding Intelligence to Your Solutions
 
Preparing Data for the Lake
Preparing Data for the LakePreparing Data for the Lake
Preparing Data for the Lake
 
Preparing Data for the Lake
Preparing Data for the LakePreparing Data for the Lake
Preparing Data for the Lake
 
Preparing Data for the Lake
Preparing Data for the LakePreparing Data for the Lake
Preparing Data for the Lake
 
Big Data@Scale_AWSPSSummit_Singapore
Big Data@Scale_AWSPSSummit_SingaporeBig Data@Scale_AWSPSSummit_Singapore
Big Data@Scale_AWSPSSummit_Singapore
 
Preparing Data for the Lake
Preparing Data for the LakePreparing Data for the Lake
Preparing Data for the Lake
 
Preparing Data for the Lake: Data Analytics Week SF
Preparing Data for the Lake: Data Analytics Week SFPreparing Data for the Lake: Data Analytics Week SF
Preparing Data for the Lake: Data Analytics Week SF
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWS
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWS
 
Analyze your Data Lake, Fast @ Any Scale - AWS Online Tech Talks
Analyze your Data Lake, Fast @ Any Scale - AWS Online Tech TalksAnalyze your Data Lake, Fast @ Any Scale - AWS Online Tech Talks
Analyze your Data Lake, Fast @ Any Scale - AWS Online Tech Talks
 
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
 
ABD318_Architecting a data lake with Amazon S3, Amazon Kinesis, AWS Glue and ...
ABD318_Architecting a data lake with Amazon S3, Amazon Kinesis, AWS Glue and ...ABD318_Architecting a data lake with Amazon S3, Amazon Kinesis, AWS Glue and ...
ABD318_Architecting a data lake with Amazon S3, Amazon Kinesis, AWS Glue and ...
 

Mais de Amazon Web Services

Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
Amazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
Amazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
Amazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
Amazon Web Services
 

Mais de Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Big Data on AWS - To infinity and beyond! - Tel Aviv Summit 2018

  • 1. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Adir Sharabi Solutions Architect, Amazon Web Services Yair Weinberger CTO and Co-Founder, Alooma Big Data on AWS: To infinity and beyond!
  • 2. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Big Data Track Agenda # Time Title Speaker 1 13:15 – 14:00 Big Data on AWS: To infinity and beyond! Adir Sharabi – Solutions Architect 2 14:10 – 14:55 Amazon Kinesis – Building Serverless real- time solution Roy Ben Alta – Business Development Manager 3 15:05 – 15:50 Data preparation and transformation: Spin your straw into gold Daniel Haviv – Specialist Solutions Architect, Analytics 4 16:00 – 16:45 Success has many query engines Eden Perry – Partner Solutions Architect 5 16:50 – 17:30 Connecting the dots: How Amazon Neptune and Graph Databases can transform your business • Andi Gutmans - GM, Neptune & Elasticsearch • Brad Bebee - Principal Prod Mgmt - Tech, AWS Neptune
  • 3. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Documents and files Streams Your Data Sources Multiple sources and formats… and growing everyday Records Amazon RDS Amazon DynamoDB AWS IoT On Premises databases Spreadsheets Infrastructure logs Clickstream data Mobile app data Social media data Amazon Redshift Device data Sensor data ERP WEB Clickstream Mobile Apps
  • 4. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Data Challenges Data Visibility Multiple consumers and requirements Multiple Access Mechanisms 1990 2000 2010 2020 Analysts Applications Data Scientists Business Users API Access BI Tools Notebooks
  • 5. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Traditionally, Analytics Used to Look Like This OLTP ERP CRM LOB Data Warehouse Business Intelligence Relational data Schema defined prior to data load TBs-PBs Scale Operational reporting and ad hoc Large initial capex + $10K–$50K/TB/Year
  • 6. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Data Lakes Extend the Traditional Approach Relational and non-relational data Schema defined during analysis Scale storage and compute independently Diverse analytical engines to gain insights Designed for low-cost storage and analytics OLTP ERP CRM LOB Data Warehouse Business Intelligence Data Lake 100110000100101011100 101010111001010100001 011111011010 0011110010110010110 0100011000010 Devices Web Sensors Social Catalog Machine Learning DW Queries Big data processing Interactive Real-time
  • 7. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Snowball Snowmobile Kinesis Data Firehose Kinesis Data Streams Amazon S3 Many ways to bring all kinds of data Unmatched durability and availability at EB scale Best security, compliance, and audit capabilities Integration with Big Data Tools Run any analytics on the same data without movement Cost effective - Store data at $0.023 / GB / Month Redshift EMR Athena Kinesis Elasticsearch Service Amazon S3 as Data Lakes Storage Layer Kinesis Video Streams AI Services
  • 8. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Store Simplified Big Data Pipeline Amazon S3 Ingest Process & Analyze Consume Data sources Transactions Web logs / cookies ERP Connected devices
  • 9. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Lots of ingestion tools IngestData sources Transactions Web logs / cookies ERP Connected devices Process & Analyze Consume Store Amazon S3 Amazon S3 API Amazon Kinesis Firehose Direct Connect Snowball Database Migration Service
  • 10. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Variety of data processing tools IngestData sources Transactions Web logs / cookies ERP Connected devices Consume Amazon Athena Amazon EMR Amazon Redshift & Spectrum Amazon Elasticsearch Amazon AI/ML/DL Services Store Amazon S3 Process & Analyze Amazon S3 API Amazon Kinesis Firehose Direct Connect Snowball Database Migration Service
  • 11. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Multiple ways to consume the data IngestData sources Transactions Web logs / cookies ERP Connected devices Consume Amazon Athena Amazon EMR Amazon Redshift & Spectrum Amazon Elasticsearch Amazon AI/ML/DL Services Store Amazon S3 Process & Analyze Amazon S3 API Amazon Kinesis Firehose Direct Connect Snowball Database Migration Service Amazon QuickSight Jupyter, Zeppelin, HUE Amazon API Gateway BI Tools
  • 12. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Because data is NEVER perfect Amazon EMR Spark and Hive running on EMR • Clean • Transform • Concatenate • Convert to better formats • Schedule transformations • Event-driven transformations • Transformations expressed as code AWS Glue Event based Server-less ETL engine AWS Lambda Trigger-based Code Execution
  • 13. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. ETL when you need it IngestData sources Transactions Web logs / cookies ERP Connected devices Consume Amazon Athena Amazon EMR Amazon Redshift & Spectrum Amazon Elasticsearch Amazon AI/ML/DL Services Store Amazon S3 Process & Analyze Amazon S3 API Amazon Kinesis Firehose Direct Connect Snowball Database Migration Service Amazon QuickSight Jupyter, Zeppelin, HUE Amazon API Gateway BI Tools
  • 14. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Realtime - in-stream processing IngestData sources Transactions Web logs / cookies ERP Connected devices Consume Amazon Athena Amazon EMR Amazon Redshift & Spectrum Amazon Elasticsearch Amazon AI/ML/DL Services Store Amazon S3 Process & Analyze Spark Streaming & Flink Amazon Kinesis Analytics In stream process Amazon S3 API Amazon Kinesis Firehose Direct Connect Snowball Database Migration Service Amazon QuickSight Jupyter, Zeppelin, HUE Amazon API Gateway BI Tools
  • 15. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. AWS Glue Data Catalog One per account Allows you to share metadata between Amazon Athena, Amazon Redshift Spectrum, EMR & JDBC sources Serverless We added a few extensions: § Search over metadata for data discovery § Manage Connections – JDBC URLs, credentials § Classification for identifying and parsing files § Versioning of table metadata as schemas evolve and other metadata are updated Central Metadata Catalog for the data lake
  • 16. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. AWS Glue Data Catalog Crawlers Crawlers automatically build your Data Catalog and keep it in sync Automatically discover new data, extracts schema definitions • Detect schema changes and version tables • Detect Hive style partitions on Amazon S3 Built-in classifiers for popular types; custom classifiers using Grok expression Run ad hoc or on a schedule; serverless – only pay when crawler runs Catalogs Your Data
  • 17. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Write once, catalog once, read multiple, ETL Anywhere IngestData sources Transactions Web logs / cookies ERP Connected devices Consume Amazon Athena Amazon EMR Amazon Redshift & Spectrum Amazon Elasticsearch Amazon AI/ML/DL Services Data Catalog Store Amazon S3 Process & Analyze Amazon QuickSight Jupyter, Zeppelin, HUE Amazon API Gateway Amazon S3 API Amazon Kinesis Firehose Direct Connect Snowball Database Migration Service Spark Streaming & Flink Amazon Kinesis Analytics In stream process BI Tools
  • 18. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Yair Weinberger CTO and Co-Founder, Alooma
  • 19. Data Pipeline as a Service yair@alooma.com
  • 20. CONFIDENTIAL Architecture 100s of Data Sources Alooma’s Data Pipeline Data Destinations Incoming Queue MAPPER CODE ENGINE Restream Queue
  • 21. Data Lake - The Goal Emilykil (Own work) [CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0)], via Wikimedia Commons
  • 22. Data Lake - What Sometimes Happens NatalieMaynor from Jackson, Mississippi, USA - Winter Ugliness, CC BY 2.0, https://commons.wikimedia.org/w/index.php?curid=5503067
  • 23. Data Lake VS DataMart Emilykil (Own work) [CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0)], via Wikimedia Commons, Abras2010 - WalmartUploaded by SchuminWeb, CC BY 2.0, https://commons.wikimedia.org/w/index.php?curid=14571617
  • 25. Alooma Usage of S3 as a Data Lake ● Separate between data of different tenants ○ IAM Role based access ensures data isolation ● Allow Alooma tenants to replay their data from any data source or time ● Staging area before loading into Data Warehouse ● Storage for things that need infinite retention (e.g. audit logs)
  • 26. Replay Redsh ift S3 Files Structure in S3: s3://bucket/[shard]/input_name/date/random_ me Sharding is a tradeoff between throughput and convenience
  • 27. S3 as Data Lake - Tips and Tricks ● Use Server-Side encryption to provide automatic encryption at rest - but it does impact performance ● Loading data in high volume ○ Keys in S3 are partitioned by prefix ○ Use Randomly prefixed or at least sharded filenames ● Use Object Expiration to avoid storing unnecessary data Important resource: https://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate- perf-considerations.html
  • 29. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Let’s take an Example
  • 30. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. AWS Tweets Example Record-level data • What’s the overall sentiment today? • What’s the sentiment trend now? • What’s the most popular Language? • What’s the Temp. affect on the tweet sentiment? • Scale • Resilience • Minimal Operational overhead • Agile • Cost Effective Business Questions Technical Requirements
  • 31. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. ConsumeStore Process & AnalyzeIngest Kinesis Data Streams Kinesis Firehose Delivery Streams DynamoDB AWS Lambda Kinesis Analytics Raw Bucket Parquet Bucket Athena Redshift Spectrum QuickSight SpeedLayerBatchLayer Glue Data Catalog Spark/EMR Glue ETL Real time Web UI
  • 32. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Thank You!