Lambda architecture for large data volumes.

Real-time and batch analysis over the same data with the Lambda Architecture.

Data & Analytics

© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Hugo Rozestraten
Specialist Solutions Architect
26 Sep 2017
Lambda Architecture for high data volumes and fast response to the business.

Use Cases
• Customer segmentation
• Marketing optimization
• Financial modeling and forecasting
• Ad targeting & real-time bidding
• Clickstream analysis
• Fraud detection
• Operational cost reduction

Metrics
• Visits, views, clicks, purchases
• Source, device, location, time
• Latency, throughput, uptime
• Likes, shares, friends, follows
• Price, frequency

It all starts with data…
Challenge: VOLUME, VELOCITY, VARIETY
Needs vs. possibilities
BATCH: Reports
REAL-TIME: Alerts
Prediction
Forecasting

Trend: intelligent applications
• Based on what you know about your user: will they use your product?
• Based on what you know about orders: is this order fraudulent?
• Based on what you know about news: which other articles would be interesting?

Fraud Detection
FINRA uses Amazon EMR and Amazon S3 to process up to 75 billion
trading events per day and securely store over 5 petabytes of data,
attaining savings of $10-20 million per year.

Architecture Principles – Big Data
Decoupled data ("data bus")
• Data → Store → Process → Store → Analyze → Answers
Right tool for each need
• Structure, latency, capacity, access patterns
Use Lambda architectures
• Immutable (append-only) log; batch, speed, and serving layers
Use managed services
• Scalability/elasticity, availability, reliability, security, no/low admin
Big data ≠ big cost
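As a minimal sketch of the Lambda architecture principle named on this slide (an immutable, append-only log feeding batch, speed, and serving layers), the following Python toy keeps everything in memory. The class name `LambdaSketch`, its methods, and the sample event keys are hypothetical illustrations, not part of the presentation or of any AWS API. The deck's diagrams place managed services such as Amazon S3, Amazon EMR, and Amazon Kinesis in these roles.

```python
class LambdaSketch:
    """Toy in-memory Lambda architecture (illustrative assumption, not an AWS API)."""

    def __init__(self):
        self.master_log = []   # immutable log: events are appended, never updated
        self.batch_view = {}   # batch layer output, recomputed from the whole log
        self.speed_view = {}   # speed layer: increments since the last batch run

    def ingest(self, key, value):
        """Append the event to the log and update the real-time view."""
        self.master_log.append((key, value))
        self.speed_view[key] = self.speed_view.get(key, 0) + value

    def run_batch(self):
        """Recompute the batch view from the full log, then reset the speed view."""
        view = {}
        for key, value in self.master_log:
            view[key] = view.get(key, 0) + value
        self.batch_view = view
        self.speed_view = {}   # everything logged so far is now covered by the batch view

    def query(self, key):
        """Serving layer: merge the batch view with the real-time increments."""
        return self.batch_view.get(key, 0) + self.speed_view.get(key, 0)


arch = LambdaSketch()
arch.ingest("press", 3)
arch.ingest("furnace", 1)
arch.run_batch()               # batch view now covers the first two events
arch.ingest("press", 2)        # arrives after the batch run, held by the speed layer
print(arch.query("press"))     # 5 = 3 (batch) + 2 (speed)
```

The design point is that the batch view can always be rebuilt from the log, so an error in the speed layer is corrected by the next batch run.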

Simplified Processing Model
Collect → Store → Process/Analyze → Consume
Response time (latency)?
Capacity?
Cost?

[Diagram: Lambda Architecture on AWS]
Labels: Data Sources; Amazon S3, Data Lake; Amazon Kinesis Streams & Firehose; AWS Lambda; Apache Storm on EMR; Apache Flink on EMR; Spark Streaming on EMR; Hadoop / Spark; Streaming Analytics Tools; Amazon Redshift, Data Warehouse; Amazon DynamoDB, NoSQL DB & Graph DB; Amazon Elasticsearch Service; Relational Database; Amazon EMR; Amazon Aurora; Amazon Machine Learning; Open Source Tool of Choice on EC2; Data Science Sandbox; Visualization / Reporting; Amazon Kinesis Analytics; Athena

[Diagram: Big Data Architecture. Insights to enhance business applications, new digital services]
Stages: Data sources; Ingest; Speed (Real-time); Scale (Batch); Serving
Data sources: Transactions; Web logs / cookies; ERP; Connected devices; Social media
Ingest: AWS Database Migration; AWS Direct Connect; Internet Interfaces; Amazon Kinesis
Processing and storage: Raw Data: Amazon S3; ETL: Amazon EMR; Staged Data (Data Lake): Amazon S3; Advanced Analytics: MLlib; Event Capture: Amazon Kinesis; Stream Analysis: Amazon EMR
Serving: Data Warehouse: Amazon Redshift; Legacy Apps: Amazon RDS; Schemaless: Amazon ElasticSearch; Direct Query: Amazon Athena; Near-Zero Latency: Amazon DynamoDB; Semi/Unstructured: Amazon EMR
Consumers: Data analysts; Data scientists; Business users; Engagement platforms; Automation / events
Management and security: AWS CloudTrail; AWS IAM; Amazon CloudWatch; AWS KMS

[Diagram: AWS IoT Platform]
Labels: AWS IoT; Rules Engine; Storage; Redshift; S3; Machine Learning; EMR; Amazon Athena; ElasticSearch; Kinesis Firehose; Data Analysis; Amazon QuickSight; Kibana; Streaming; CloudWatch; Visualization

Demo – Metal Parts Factory
[Diagram labels: Raw Material; Furnace; Production; Scrap; Press; Machining; Quality; Temp 1; Temp 2; Temp 3; Temp 4; Vibr. 1; Vibr. 2; Weight]

Proposed Architecture
[Diagram labels: AWS; Ingestion (three inputs); Batch: 5 to 15 minutes; Real-time; Data Catalog; Analytics]

15. Demo

16. Thank you!

Editor's Notes

Here are a few big data use cases
Which require a lot of metrics such as…
Three main characteristics of data:
• Velocity: moves at very high rates; valuable in its temporal, high-velocity state
• Volume: fast-moving data creates massive historical archives; valuable for mining patterns, trends, and relationships
• Variety: structured (logs, business transactions), semi-structured, and unstructured
Condense these slides into 3-box
Go faster – only key points.
Before we go into solving the big data architecture, I want to introduce some "tried and tested" architecture principles.
Right tool for the job: here at AWS we believe you should use the right tool for the job. Instead of using a big Swiss Army knife to drive a screw, it is best to use a screwdriver. This is especially important for big data architectures. We'll talk about this more.
Decoupled architecture (http://whatis.techtarget.com/definition/decoupled-architecture): in general, a decoupled architecture is a framework for complex work that allows components to remain completely autonomous and unaware of each other. This has been tried and battle-tested.
Managed services: this is relatively new. Should I install Cassandra or MongoDB or CouchDB on AWS? You obviously can. Sometimes there are good reasons for doing this, and many customers still do. Netflix is a great example: they run multi-region Cassandra and are a poster child for how to do this. But for most customers, delegating this task to AWS makes more sense. You are better off spending your time building features for your customers than building highly scalable distributed systems.
Lambda Architecture -
Throughput = f(volume, request rate); latency; cost. Event-to-action/answer latency?
http://calculator.s3.amazonaws.com/index.html#r=IAD&key=calc-BE3BA3E4-1AC5-4E7A-B542-015056D8EDAF
• Kinesis -> $52.14 per month
• SQS -> $133.42 per month for puts, or $400/month (put, get, delete)
• DynamoDB -> $3809.88 per month (10 TB of storage alone costs $2500/month)
Cost (100 rps x 35 KB): $52/month; $133/month * 2 = $266/month?
Amazon DynamoDB Service (US-East) $
• Provisioned Throughput Capacity: $120
• Indexed Data Storage: $2560.90
• DynamoDB Streams: $1.3
Amazon SQS Service (US-East)
Pricing Example
Let's assume that our data producers put 100 records per second in aggregate, and each record is 35 KB. In this case, the total data input rate is 3.4 MB/sec (100 records/sec * 35 KB/record). For simplicity, we assume that the throughput and data size of each record are stable and constant throughout the day. Please note that we can dynamically adjust the throughput of our Amazon Kinesis stream at any time.
We first calculate the number of shards needed for our stream to achieve the required throughput. As one shard provides a capacity of 1 MB/sec data input and supports 1000 records/sec, four shards provide a capacity of 4 MB/sec data input and support 4000 records/sec. So a stream with four shards satisfies our required throughput of 3.4 MB/sec at 100 records/sec.
We then calculate our monthly Amazon Kinesis costs using Amazon Kinesis pricing in the US-East Region:
• Shard Hour: one shard costs $0.015 per hour, or $0.36 per day ($0.015 * 24). Our stream has four shards, so it costs $1.44 per day ($0.36 * 4). For a month with 31 days, our monthly Shard Hour cost is $44.64 ($1.44 * 31).
• PUT Payload Unit (25 KB): as our record is 35 KB, each record contains two PUT Payload Units. Our data producers put 100 records or 200 PUT Payload Units per second in aggregate. That is 267,840,000 records or 535,680,000 PUT Payload Units per month. As one million PUT Payload Units cost $0.014, our monthly PUT Payload Units cost is $7.499 ($0.014 * 535.68).
Adding the Shard Hour and PUT Payload Unit costs together, our total Amazon Kinesis costs are $1.68 per day, or $52.14 per month. For $1.68 per day, we have a fully managed streaming data infrastructure that enables us to continuously ingest 4 MB of data per second, or 337 GB of data per day, in a reliable and elastic manner.
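The Kinesis pricing arithmetic in this note can be reproduced with a short Python sketch. The function name `kinesis_monthly_cost` and its parameters are hypothetical. The prices ($0.015 per shard hour, $0.014 per million PUT Payload Units) and the limits (1 MB/sec and 1000 records/sec per shard, 25 KB per PUT Payload Unit) are the figures quoted in the note and should be treated as assumptions, not current pricing.

```python
import math

def kinesis_monthly_cost(records_per_sec, record_kb, days=31,
                         shard_hour_usd=0.015, put_usd_per_million=0.014):
    """Return (shards, shard_cost, put_cost, total) using the note's figures."""
    # Shards needed: each shard takes 1 MB/sec and 1000 records/sec (per the note).
    mb_per_sec = records_per_sec * record_kb / 1024
    shards = max(math.ceil(mb_per_sec), math.ceil(records_per_sec / 1000))

    # Shard Hour cost: shards x hourly price x 24 hours x days in the month.
    shard_cost = shards * shard_hour_usd * 24 * days

    # PUT Payload Units: each record is billed in 25 KB chunks, rounded up.
    units_per_record = math.ceil(record_kb / 25)
    put_units = records_per_sec * units_per_record * 86400 * days
    put_cost = put_units / 1_000_000 * put_usd_per_million

    return shards, shard_cost, put_cost, shard_cost + put_cost


shards, shard_cost, put_cost, total = kinesis_monthly_cost(100, 35)
print(shards)                # 4
print(round(shard_cost, 2))  # 44.64
print(round(total, 2))       # 52.14
```

With 100 records/sec of 35 KB each, the sketch arrives at the same four shards and the same $52.14 monthly total as the worked example.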
Amazon Elasticsearch Service allows you to easily and securely deploy and scale an ELK stack in minutes. Integration with Logstash is tightly coupled and a Kibana instance is automatically configured for you. The service automatically detects and replaces failed Elasticsearch nodes, reducing the overhead associated with self-managed infrastructure and Elasticsearch software.
Ideal usage patterns
• Log analysis
• Analysis of data stream updates from other AWS services
• Providing customers a rich search and navigation experience
• Usage monitoring for mobile applications
Performance
• Depends on multiple factors, including instance type, workload, index, number of shards used, and read replicas
• Storage configuration: instance storage or EBS storage
Cost model
• Pay as you go
• Only pay for compute and storage
