Building A Modern Data Analytics Architecture on AWS

Amazon Web Services gives you fast access to flexible, low-cost IT resources, so you can rapidly scale and build virtually any big data and analytics application, including data warehousing, clickstream analytics, fraud detection, recommendation engines, event-driven ETL, serverless computing, and internet-of-things processing, regardless of the volume, velocity, and variety of data.
In this one-hour webinar, we will look at the portfolio of AWS Big Data services and how they can be used to build a modern data architecture.

We will cover:

Using different SQL engines to analyze large amounts of structured data
Analyzing streaming data in near real time
Architectures for batch processing
Best practices for Data Lake architectures

This session is suited for:

Solution and enterprise architects
Data architects/ Data warehouse owners
IT & Innovation team members

Building A Modern Data Analytics Architecture on AWS

  1. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Russell Nash – AWS Solutions Architect, AWS Building A Modern Data Analytics Architecture on AWS In partnership with:
  2. SCALABLE FLEXIBLE MANAGEABLE COST EFFECTIVE
  3. Ingest Serving Speed (Real-time) Scale (Batch) Data analysts Data scientists Business users Engagement platforms Automation / events Sources Modern Data Architecture AWS CloudTrail AWS IAM Amazon CloudWatch AWS KMS
  4. Database Analytics Flat File Processing Real-time Pipeline Data Lake
  5. Ingest Serving Speed (Real-time) Scale (Batch) Data analysts Data scientists Business users Engagement platforms Automation / events Sources Database Analytics Amazon Redshift Source Database
  6. MPP SQL Database Optimised for Analytics Gigabytes to Petabytes Fully relational Amazon Redshift
  7. ID Name 1 John Smith 2 Jane Jones 3 Peter Black 4 Pat Partridge 5 Sarah Cyan 6 Brian Snail 1 John Smith 4 Pat Partridge 2 Jane Jones 5 Sarah Cyan 3 Peter Black 6 Brian Snail
  8. 1 John Smith 4 Pat Partridge 2 Jane Jones 5 Sarah Cyan 3 Peter Black 6 Brian Snail SQL SQL SQL SQL Results Results Results Results
  9. 160 GB 2 PB
  10. Ingest Serving Speed (Real-time) Scale (Batch) Data analysts Data scientists Business users Engagement platforms Automation / events Sources ETL Amazon Redshift Source Database Database Analytics
  11. AWS Database Migration Service Amazon Redshift Source Database ETL Data Integration Partners
  12. Ingest Serving Speed (Real-time) Scale (Batch) Data analysts Data scientists Business users Engagement platforms Automation / events Sources ELT Amazon Redshift Amazon Redshift Source Database Database Analytics
  13. https://aws.amazon.com/solutions/case-studies/boingo-wireless/
  14. Database Analytics Flat File Processing Real-time Pipeline Data Lake
  15. Ingest Serving Speed (Real-time) Scale (Batch) Data analysts Data scientists Business users Engagement platforms Automation / events Sources Batch Processing Flat Files Amazon S3
  16. Amazon S3 Object Storage Low Cost Highly Scalable 11 9’s of durability
  17. Ingest Serving Speed (Real-time) Scale (Batch) Data analysts Data scientists Business users Engagement platforms Automation / events Sources Flat Files Amazon S3 Batch Processing AWS Snowball AWS CLI & SDK
  18. In pioneer days, they used oxen for heavy pulling, and when one ox couldn’t budge a log, they didn’t try to grow a bigger ox. Grace Hopper
  19. PIG Infrastructure Data Layer Process Layer Framework Applications
  20. PIG SQL Infrastructure Data Layer Process Layer Framework Applications
  21. PIG SQL Amazon EMR
  22. PIG SQL Amazon EMR Amazon S3 EMRFS
  23. Amazon EMR • Managed Hadoop • Optimized with S3 • Open Source Support
  24. Compute Flexibility Compute Memory Storage Machine Learning C4 Family C3 Family X1 Family R3 Family Interactive Analysis D2 Family I2 Family Large HDFS General Batch Process M4 Family M3 Family
  25. Cost & Time # CPUs Time # CPUs Time Wall clock time: 1 hour Wall clock time: 10 hours
  26. Spot Price – M3.2XL On-Demand Spot-Price $0.08 $0.75
  27. Ingest Serving Speed (Real-time) Scale (Batch) Data analysts Data scientists Business users Engagement platforms Automation / events Sources Flat Files Amazon S3 Batch Processing Amazon EMR Amazon S3 AWS Glue AWS Snowball AWS CLI & SDK
  28. AWS Glue • Managed Transform Engine • Job Scheduler • Data Catalog • Built on Apache Spark
  29. Ingest Serving Speed (Real-time) Scale (Batch) Data analysts Data scientists Business users Engagement platforms Automation / events Sources Flat Files Amazon S3 Batch Processing Amazon EMR Amazon S3 AWS Glue Amazon Redshift Amazon EMR AWS Snowball AWS CLI & SDK
  30. PIG SQL Amazon EMR Amazon S3 EMRFS R
  31. Ingest Serving Speed (Real-time) Scale (Batch) Data analysts Data scientists Business users Engagement platforms Automation / events Sources Flat Files Amazon S3 Batch Processing Amazon EMR Amazon S3 AWS Glue Amazon Redshift Amazon EMR Amazon Athena AWS Snowball AWS CLI & SDK
  32. Amazon Athena Query S3 data with SQL Serverless Instant Spin-Up Pay per Query
  33. Athena S3
  34. Comparison of SQL Processing engines Amazon Redshift Amazon Athena Data Structure Languages Semi Semi SQL, HiveQL SQL Full SQL Data Store S3/HDFS S3 Local SQL Semi SQL S3/HDFS Performance
  35. Comparison of SQL Processing engines Transformation SQL Queries For S3/HDFS Fully Featured SQL Database Use Case Amazon Redshift Amazon Athena SQL Serverless SQL Queries for S3
  36. https://aws.amazon.com/solutions/case-studies/finra/
  37. Database Analytics Flat File Processing Real-time Pipeline Data Lake
  38. Ingest Serving Speed (Real-time) Scale (Batch) Data analysts Data scientists Business users Engagement platforms Automation / events Sources Real-time Pipeline Amazon Kinesis Machines Devices Mobile Clickstream
  39. Availability Zone Availability Zone Availability Zone Amazon Kinesis Stream AWS Lambda KCL App Amazon EMR Streaming
  40. Ingest Serving Speed (Real-time) Scale (Batch) Data analysts Data scientists Business users Engagement platforms Automation / events Sources Amazon Kinesis AWS Lambda Application Amazon EMR Streaming S3 (Log) Amazon Elasticsearch (Dashboard) Real-time Pipeline
  41. Amazon Elasticsearch • Search and Analytics • Scalable • Fully Managed • Integrated – Logstash, Kibana
  42. Ingest Serving Speed (Real-time) Scale (Batch) Data analysts Data scientists Business users Engagement platforms Automation / events Sources Amazon Kinesis AWS Lambda Application Amazon EMR Streaming S3 (Logs) Amazon Elasticsearch (Dashboards) Amazon EMR (Predictions) ML Amazon SNS (Alerts) Real-time Pipeline Amazon Redshift (Analytics)
  43. StreamAlert https://medium.com/airbnb-engineering/streamalert-real-time-data-analysis-and-alerting-e8619e3e5043
  44. Database Analytics Flat File Processing Real-time Pipeline Data Lake
  45. Any data Any analysis Data Lake
  46. Ingest Serving Speed (Real-time) Scale (Batch) Data analysts Data scientists Business users Engagement platforms Automation / events Sources Amazon Kinesis AWS Lambda Application Amazon EMR Streaming Amazon EMR Data Lake Amazon Redshift ETL Amazon Athena EC2 AWS CLI & SDK Amazon S3 Amazon EMR Amazon S3 AWS CloudTrail AWS IAM Amazon CloudWatch AWS KMS
  47. Now available in the Mumbai region! Amazon Redshift Amazon EMR Amazon Kinesis Amazon Elasticsearch
  48. Select Customers The vast majority of Big Data use cases deployed in the cloud today run on AWS
  49. New X1 Instance - Tons of Memory • Large-scale, in-memory applications • Intel® Xeon® E7 8880 v3 Haswell processors • Up to 2TB of memory • Up to 128 vCPUs per instance
  50. Intel® Processor Technologies Intel® AVX – Dramatically increases performance for highly parallel HPC workloads such as life science engineering, data mining, financial analysis, media processing Intel® AES-NI – Enhances security with new encryption instructions that reduce the performance penalty associated with encrypting/decrypting data Intel® Turbo Boost Technology – Increases computing power with performance that adapts to spikes in workloads Intel Transactional Synchronization (TSX) Extensions – Enables execution of transactions that are independent to accelerate throughput P state & C state control – provides granular performance tuning for cores and sleep states to improve overall application performance
  51. REGISTER NOW http://amzn.to/2jFt11N Complimentary labs are available only till 31 March 2017 Get hands-on experience working with the AWS Technology. Access the complimentary Big Data on AWS self-paced labs
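
Slides 25 and 26 ("Cost & Time" and "Spot Price – M3.2XL") appear to make an arithmetic point: a batch job that parallelises well uses the same number of instance-hours whether it runs on a small cluster for 10 hours or a larger one for 1 hour, and Spot pricing lowers the hourly rate further. The sketch below works through that arithmetic. It assumes that $0.75 is the On-Demand price and $0.08 the Spot price (the transcript runs the two figures together), that the job scales linearly, and that billing is per whole instance-hour. The cluster sizes of 10 and 100 nodes are hypothetical, and any additional service charges are ignored.

```python
"""Back-of-the-envelope cluster cost comparison (illustrative figures only)."""

# Hourly prices as they appear on slide 26 for an M3.2XL instance.
# Assumption: $0.75 is the On-Demand price and $0.08 the Spot price.
ON_DEMAND_PRICE = 0.75
SPOT_PRICE = 0.08


def cluster_cost(nodes: int, hours: int, price_per_node_hour: float) -> float:
    """Return the cost of running `nodes` instances for `hours` whole hours."""
    return nodes * hours * price_per_node_hour


def spot_saving(on_demand: float, spot: float) -> float:
    """Return the fraction of the On-Demand price saved by paying the Spot price."""
    return 1 - spot / on_demand


if __name__ == "__main__":
    # Hypothetical cluster sizes: both runs consume 100 instance-hours.
    slow = cluster_cost(nodes=10, hours=10, price_per_node_hour=ON_DEMAND_PRICE)
    fast = cluster_cost(nodes=100, hours=1, price_per_node_hour=ON_DEMAND_PRICE)
    fast_spot = cluster_cost(nodes=100, hours=1, price_per_node_hour=SPOT_PRICE)

    print(f"10 nodes for 10 hours, On-Demand: ${slow:.2f}")
    print(f"100 nodes for 1 hour, On-Demand:  ${fast:.2f}")
    print(f"100 nodes for 1 hour, Spot:       ${fast_spot:.2f}")
    print(f"Spot saving per instance-hour:    {spot_saving(ON_DEMAND_PRICE, SPOT_PRICE):.0%}")
```

Under these assumptions the two On-Demand runs cost the same, so the larger cluster buys a tenfold shorter wall clock time at no extra cost. Spot prices fluctuate, so the figures above are only a snapshot taken from the slide.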

