© ALTOROS Systems | CONFIDENTIAL
Andrei Yurkevich
Chief Technology Officer
andrei.yurkevich@altoros.com
• Hadoop/NoSQL performance engineering
• Cluster automation and server templates on Joyent, AWS, SoftLayer, Rackspace, CloudStack, and OpenStack using Chef/Puppet, RightScale, and SCALR
• 300+ employees globally (UK, USA, Denmark, Switzerland, Norway, Belarus, Argentina)
Featured customers Partners
5^6 combinations = 15,625
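The figure 15,625 equals 5 to the 6th power. A minimal Python sketch of how such a count can arise, assuming (hypothetically, since the slide names neither) six architecture layers with five candidate tools for each layer:

```python
from itertools import product

# Hypothetical reading of the slide: six architecture layers and five
# candidate tools per layer. The slide gives only the totals; the layer
# and tool counts here are assumptions for illustration.
LAYERS = 6
CANDIDATES_PER_LAYER = 5


def count_stacks(layers: int, candidates: int) -> int:
    """Number of distinct stacks when every layer picks exactly one candidate."""
    return candidates ** layers


def enumerate_stacks(layers: int, candidates: int):
    """Yield every stack as a tuple of candidate indices, one index per layer."""
    return product(range(candidates), repeat=layers)


total = count_stacks(LAYERS, CANDIDATES_PER_LAYER)
print(total)  # 15625
```

Evaluating every stack by hand is impractical at this size, which is why the later slides narrow the choice one layer at a time.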
No clear business goals
Large amounts of data from many sources
Architecture design
The variety of tools
Compatibility of technologies/platforms
Lack of professionals
All features in one release
Budget
Functional requirements | Value
Amount of data added daily | 2.5 TB
Data types | raw data; processed data
Data storage time, raw data | min. a week
Data storage time, processed data | min. a year
Response time, reports based on a pre-set template | < 30 sec
Response time, reports for a custom period of time | < 6 hours
Uptime | 99%
Fault tolerance | required
Deployment cost per day | < $1,000

Non-functional requirements
• Infrastructure-independent architecture
• Scalability
• Open-source tools
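The volume figures above imply a minimum raw-data footprint and a sustained ingest rate. The sketch below works them out in Python. It assumes decimal units (1 TB = 1,000,000 MB) and ignores replication and compression, neither of which the slide specifies; the function names are illustrative only.

```python
# Figures taken from the requirements; unit convention is an assumption.
DAILY_RAW_TB = 2.5          # amount of data added daily
RAW_RETENTION_DAYS = 7      # raw data is kept for at least a week
SECONDS_PER_DAY = 24 * 60 * 60
MB_PER_TB = 1_000_000       # decimal units, assumed


def raw_storage_tb(daily_tb: float, retention_days: int) -> float:
    """Minimum raw-data footprint before replication or compression."""
    return daily_tb * retention_days


def sustained_ingest_mb_per_s(daily_tb: float) -> float:
    """Average write rate needed to absorb one day's data within that day."""
    return daily_tb * MB_PER_TB / SECONDS_PER_DAY


print(raw_storage_tb(DAILY_RAW_TB, RAW_RETENTION_DAYS))     # 17.5
print(round(sustained_ingest_mb_per_s(DAILY_RAW_TB), 1))    # 28.9
```

Any replication factor multiplies the storage figure, so the real cluster size is likely to be larger than this lower bound.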
a good fit a normal fit a bad fit
Option 2 Option 1
Feature | Amazon AWS | Joyent | Rackspace
Contract types | On Demand, Reserved, Spot | On Demand, Reserved | On Demand
Instance types (classified by compute units) | General Purpose; Compute optimized; Memory optimized; Storage optimized | Standard; High Memory; High CPU; High Storage; High I/O | General Purpose
Storage options | EBS; S3 | Low-cost storage; Network storage based on ZFS | Cloud Block Storage; Cloud Files
Operating systems | Linux, Windows | SmartOS, Linux, Windows | Linux, Windows
Management console | AWS Console | Joyent SmartDataCenter | Cloud Control Panel
Cloud API | Command line interface; Java, .NET, Ruby SDK and API | Command line interface (CLI); Node.js SDK; REST API | REST API
Regions | America, Europe, Asia, Australia | North America, Europe | America, Europe, Asia, Australia
Estimated cost per month | $18,300 | $17,500 | $21,350
Score 1.5 3.5
Features | HBase | Cassandra | MongoDB | MySQL Cluster
License | Apache | Apache | AGPL | GPL
Protocol | HTTP/REST (also Thrift) | Thrift and custom binary CQL3 | Custom, binary (BSON) | JDBC, ODBC
Data model | Column family | Column family | JSON documents | Tables
Queries / query language | JRuby-based (JIRB) shell | Cassandra Query Language | JavaScript expressions | SQL
Partitioning strategy | Ordered partitioning | Random partitioning | Sharding by key | Partition by key
Replication between nodes | yes | yes | yes | yes
Replication between data centers | no | yes | no | yes
Capability to store 2.5 TB daily | yes | yes | yes | yes
Implementation experience | 1+ | 1+ | 2+ | 5+
Score | 2 | 3 | 2 | 5
a good fit a normal fit a bad fit
Features | HBase | Cassandra | MongoDB | MySQL Cluster
License | Apache | Apache | AGPL | GPL
Protocol | HTTP/REST (also Thrift) | Thrift and custom binary CQL3 | Custom, binary (BSON) | JDBC, ODBC
Data model | Column family | Column family | JSON documents | Tables
Queries / query language | JRuby-based (JIRB) shell | Cassandra Query Language | JavaScript expressions | SQL
Partitioning strategy | Ordered partitioning | Random partitioning | Sharding by key | Partition by key
Replication between data centers | no | yes | no | yes
Capability to store 2.5 TB daily | yes | yes | yes | yes
Implementation experience | 1+ | 1+ | 2+ | 5+
Deployment cost per day | $450 | $400 | $500 | $1,500
Score | 2.5 | 4 | 2.5 | 0
a good fit a normal fit a bad fit
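One way to read this comparison is as a two-step selection: drop every candidate that breaks the daily budget, then rank the rest by score. The Python sketch below applies that reading to the cost and score figures from the table. The two-step procedure and all names (`Candidate`, `shortlist`) are assumptions for illustration; the slides do not state how the scores were derived.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    cost_per_day_usd: int
    score: float


# Budget from the requirements: deployment cost per day must stay below $1,000.
DAILY_BUDGET_USD = 1_000

# Cost and score figures copied from the comparison table.
CANDIDATES = [
    Candidate("HBase", 450, 2.5),
    Candidate("Cassandra", 400, 4.0),
    Candidate("MongoDB", 500, 2.5),
    Candidate("MySQL Cluster", 1_500, 0.0),
]


def shortlist(candidates, budget_usd):
    """Keep candidates under budget, highest score first (ties keep input order)."""
    within_budget = [c for c in candidates if c.cost_per_day_usd < budget_usd]
    return sorted(within_budget, key=lambda c: c.score, reverse=True)


ranked = shortlist(CANDIDATES, DAILY_BUDGET_USD)
print([c.name for c in ranked])  # ['Cassandra', 'HBase', 'MongoDB']
```

Treating the budget as a hard filter rather than one weighted criterion matches how MySQL Cluster drops out here despite its strong feature scores on the previous slide.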
Feature | HBase | Cassandra | MongoDB
Replication between data centers | Asynchronous, needs testing | Replicas can span data centers with synchronous replication | Not supported
Cluster admin node | NameNode | Any node | mongos process
Implementation experience | 1+ | 1+ | 2+
Time spent on inserting 30 MB of data | 7 sec | 9 sec | 20 sec
Deployment cost per day | $450 | $400 | $500
Score | 2 | 2.5 | 0
a good fit a normal fit a bad fit
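The insert timings are easier to compare as throughput. The short Python sketch below converts them, assuming the 30 MB payload and the timings from the table describe a single test run each; the slides do not describe the test setup, so the results are only indicative.

```python
# Timings from the table: seconds needed to insert 30 MB of data.
PAYLOAD_MB = 30
INSERT_SECONDS = {"HBase": 7, "Cassandra": 9, "MongoDB": 20}


def throughput_mb_per_s(payload_mb: float, seconds: float) -> float:
    """Average write throughput for one timed insert run."""
    return payload_mb / seconds


rates = {
    db: round(throughput_mb_per_s(PAYLOAD_MB, secs), 2)
    for db, secs in INSERT_SECONDS.items()
}
print(rates)  # {'HBase': 4.29, 'Cassandra': 3.33, 'MongoDB': 1.5}
```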
Requirement | Prototype features
Storing 2.5 TB of daily raw data for a week | Capable
Storing 1.5 TB of processed data for a year | Capable
Response time for building reports based on a pre-set template | ~25 sec
Response time of less than 6 hours for building a custom report | ~7 hours
Scalability | Good
Infrastructure independence | Yes
Using open-source tools | For all components
Fault tolerance | Yes
Deployment cost per day < $1,000 | ~$600
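The measurable rows of this comparison can be checked mechanically against the limits in the requirements. A minimal Python sketch follows, using the approximate prototype figures as exact numbers (an assumption, since the slide marks them with "~"); the dictionary keys are illustrative labels.

```python
# (limit, measured) pairs; each requirement is met when measured < limit.
# Measured values are the approximate prototype figures from the table.
REQUIREMENTS = {
    "pre-set report response (sec)": (30, 25),
    "custom report response (hours)": (6, 7),
    "deployment cost per day (USD)": (1_000, 600),
}


def unmet(requirements):
    """Return the names of requirements whose measured value is not below the limit."""
    return [
        name
        for name, (limit, measured) in requirements.items()
        if not measured < limit
    ]


print(unmet(REQUIREMENTS))  # ['custom report response (hours)']
```

Only the custom-report response time misses its target, at roughly 7 hours against a limit of 6.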
Properly visualize and test the functionality
Detect bottlenecks and change a technology/tool/database before it is implemented in the real system
Get a realistic view of the final solution
Make sure you stick to the budget
Andrei Yurkevich
President/CTO
andrei.yurkevich@altoros.com

Big Data, Big Projects, Big Mistakes: How to Jumpstart and Deliver with Success


Editor's Notes

  1. Volume, Velocity, Variety. Where to start?
  2. Everything seemed to go smoothly. However, there was one detail about MySQL Cluster: its architecture requires keeping all data in RAM, so we needed a cluster with 2.5 TB of RAM. The actual deployment cost was about $500 over the daily budget, so we had to start from scratch again.
  3. HBase was 2 seconds faster than Cassandra, but what about fault tolerance? HBase has an additional node that serves as a coordinator for the entire system. If it fails, the system fails. We could add a secondary management node, but then we might exceed the budget. Cassandra has a decentralized architecture: all nodes of its cluster have equal roles, and every node can serve as a coordinator. This makes the database extremely fault tolerant.
  4. Raw data is all data that comes from sensors. Processed data is the data aggregated over each 10-minute interval; it is used for building reports.
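The step from raw to processed data described in note 4 can be sketched as a windowed aggregation. The Python example below groups sensor readings into 10-minute windows and averages them. The record layout `(sensor_id, timestamp, value)`, the use of the mean, and the sample readings are all assumptions; the notes say only that data is aggregated for each 10 minutes.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

WINDOW_MINUTES = 10  # aggregation interval stated in the notes


def window_start(ts: datetime) -> datetime:
    """Floor a timestamp to the start of its 10-minute window."""
    return ts - timedelta(
        minutes=ts.minute % WINDOW_MINUTES,
        seconds=ts.second,
        microseconds=ts.microsecond,
    )


def aggregate(readings):
    """Average raw readings per (sensor, window); the mean is an assumed aggregate."""
    buckets = defaultdict(list)
    for sensor_id, ts, value in readings:
        buckets[(sensor_id, window_start(ts))].append(value)
    return {key: mean(values) for key, values in buckets.items()}


# Hypothetical raw readings for one sensor.
raw = [
    ("s1", datetime(2014, 1, 1, 12, 1, 0), 10.0),
    ("s1", datetime(2014, 1, 1, 12, 9, 59), 20.0),
    ("s1", datetime(2014, 1, 1, 12, 10, 0), 30.0),
]
processed = aggregate(raw)
print(processed[("s1", datetime(2014, 1, 1, 12, 0))])   # 15.0
print(processed[("s1", datetime(2014, 1, 1, 12, 10))])  # 30.0
```

Reports would then read only the much smaller processed set, which fits the requirement that raw data is kept for a week while processed data is kept for a year.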