Distributed Data Systems
How Do They Even?
About Me - Jared L Kerim
- Software Developer (Python)
- Mozilla Geolocation Cloud Services Team
- CTO at PressureNET
PressureNET (Shameless Plug)
- Gathers sensor data from smartphones
- Constant stream of data to servers
- API to retrieve data
- Visualization
- Analysis
The First Architecture
Sensors -> Web Servers -> MySQL -> API
The Problem: MySQL
- Slow lookups
- Takes a lot of disk space
- Cost (Large Relational DBs are expensive)
- Schema changes (become slow or impossible)
How Big is “Big”?
- PressureNET: 100 req/s, 1.5 billion records
- Analytics systems: 5000 req/s, 100s of billions of records
- Ad buying service: 500k req/s, trillions of records
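To put the request rates above in daily terms, here is a rough back-of-the-envelope sketch. It assumes each request stores exactly one record and that the rate is constant, neither of which the slides state; the function names are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def records_per_day(requests_per_second: int) -> int:
    """Records stored per day, ASSUMING one record per request."""
    return requests_per_second * SECONDS_PER_DAY


def days_to_accumulate(total_records: int, requests_per_second: int) -> float:
    """Days needed to reach total_records at a constant request rate."""
    return total_records / records_per_day(requests_per_second)


if __name__ == "__main__":
    rates = [
        ("PressureNET", 100),
        ("Analytics system", 5_000),
        ("Ad buying service", 500_000),
    ]
    for name, rate in rates:
        print(f"{name}: {records_per_day(rate):,} records/day")
```

Under those assumptions, 100 req/s is about 8.6 million records per day, so 1.5 billion records would take roughly 174 days of sustained traffic.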
The Question
What is ????
Sensors -> Web Servers -> ???? -> API
What do we want to accomplish?
- Receive and store large amounts of data
- Access it quickly
- Small fast lookups (visualization)
- Large batch computations (MapReduce)
Considerations
- Durability (we don’t want to lose data)
- Redundancy (expect failures!)
- Scalability (simple growth, no upper limit)
Durability
- Data in a durable store should be ‘safe’
- Don’t remove data from one durable data store until it is confirmed to be in another durable data store
- Durable data stores should have redundant backups (hot standbys)
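The "confirm before you remove" rule can be sketched as follows. This is a minimal illustration, not a real storage client: `InMemoryStore` and `move` are hypothetical names, and a dict stands in for whatever durable stores (a queue, S3, a database) are actually used.

```python
class InMemoryStore:
    """Hypothetical stand-in for a durable store; a dict is NOT durable."""

    def __init__(self):
        self.items = {}

    def put(self, key, value):
        self.items[key] = value

    def contains(self, key):
        return key in self.items

    def delete(self, key):
        self.items.pop(key, None)


def move(key, source, destination):
    """Copy one record, confirm it landed, and only then delete the original."""
    value = source.items[key]
    destination.put(key, value)
    if not destination.contains(key):
        # Write not confirmed: leave the source untouched so we can retry.
        return False
    source.delete(key)
    return True
```

If the process crashes between the write and the delete, the record exists in both stores, so downstream consumers should tolerate an occasional duplicate.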
Redundancy
- Each stage of your system should have multiple copies
- If one copy goes down, another should take over
- Redundancy ensures availability
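One simple way to picture "another should take over" is a caller that tries each copy in turn. The sketch below assumes each replica is a plain callable that raises `ConnectionError` when it is down; `first_available` is a hypothetical helper, and real systems more often put a load balancer or health checks in front of the copies.

```python
def first_available(replicas, request):
    """Send the request to each replica in order until one answers."""
    last_error = None
    for replica in replicas:
        try:
            return replica(request)
        except ConnectionError as exc:
            # This copy is down; remember why and try the next one.
            last_error = exc
    raise RuntimeError("all replicas failed") from last_error
```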
Scalability
- The rate of data intake can grow or spike
- Your system should be able to add more resources to handle that growth
- Requires that your workload is partitionable
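A workload is partitionable when each record can be routed to a worker or shard using only the record itself. A minimal sketch, assuming records carry a string key such as a device ID (the key choice and the `partition_for` name are illustrative, not from the slides):

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition index in the range [0, num_partitions)."""
    # Use a stable hash: Python's built-in hash() for str is randomized
    # per process, so two nodes could disagree on the partition.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions
```

With plain modulo hashing, changing `num_partitions` reassigns most keys; consistent hashing is a common alternative when partitions are added often.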
Proposed Architecture
Sensors -> Ingestors -> Queue -> Aggregator -> S3, DynamoDB
We Are Not Alone
- This architecture is widely adopted
- Analytics
- Ad Serving/Views
- Log Analysis
- Sensor Data
- Game Events
- Video Events
Ingestors
- A redundant, scalable set of nodes which receive data over HTTP
- Can apply early validation and authentication
- Stateless, low latency
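A stateless ingestor can be reduced to one function: authenticate, validate, hand off to the queue, and return. The sketch below leaves out the HTTP server itself. The API key set, the required field names, and the `enqueue` callback are all assumptions made for illustration.

```python
import json

VALID_API_KEYS = {"demo-key"}  # hypothetical; a real system would look these up
REQUIRED_FIELDS = ("device_id", "timestamp", "pressure")  # hypothetical schema


def ingest(api_key, body, enqueue):
    """Validate one request and hand it to the queue; returns an HTTP status."""
    if api_key not in VALID_API_KEYS:
        return 401  # authentication failed
    try:
        reading = json.loads(body)
    except ValueError:
        return 400  # body is not valid JSON
    if not isinstance(reading, dict):
        return 400
    if any(field not in reading for field in REQUIRED_FIELDS):
        return 400  # early validation: reject incomplete readings
    enqueue(reading)  # no per-request state is kept on the ingestor
    return 202  # accepted for later processing
```

Because the function keeps nothing between calls, any node can serve any request, which is what lets the set of ingestors grow or shrink freely.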
Queue
- A scalable, durable storage mechanism for data ‘in flight’
- Only holds data temporarily
- Typically preserves the order in which data was received
Aggregator
- A scalable, stateless set of workers which consume data from the queue
- Can process data in small batches
- Writes raw or transformed data to persistent storage such as S3 or a database
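The batch-consume loop can be sketched with Python's standard `queue.Queue` standing in for the real queue. `write_batch` is a hypothetical callback representing the write to S3 or a database, and the batch size of 3 is arbitrary.

```python
import queue


def drain_batch(q, max_items):
    """Take up to max_items from the queue without blocking."""
    batch = []
    while len(batch) < max_items:
        try:
            batch.append(q.get_nowait())
        except queue.Empty:
            break
    return batch


def run_aggregator(q, write_batch, batch_size=3):
    """Consume the queue in small batches until it is empty."""
    written = 0
    while True:
        batch = drain_batch(q, batch_size)
        if not batch:
            return written
        write_batch(batch)  # hypothetical: persist to S3, a database, etc.
        written += len(batch)
```

This in-memory version drops a batch if `write_batch` fails. A real aggregator would only acknowledge or delete messages after the write is confirmed, in line with the durability rule.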
