© ALTOROS Systems | CONFIDENTIAL
Andrei Yurkevich
Chief Technology Officer
andrei.yurkevich@altoros.com
• Hadoop/NoSQL performance engineering
• Cluster Automation & Server Templates on Joyent, AWS, SoftLayer, Rackspace, CloudStack and OpenStack using Chef/Puppet, RightScale and SCALR
• 300+ employees globally (UK, USA, Denmark, Switzerland, Norway, Belarus, Argentina)
5^6 combinations = 15,625
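The two numbers on these slides fit the reading 5^6 = 15,625: six independent choices with five candidates each. The deck does not say which choices or candidates are meant, so the layer names below are hypothetical placeholders and the sketch only checks the arithmetic by enumeration.

```python
from itertools import product

# Hypothetical placeholders: the slides give only the numbers 5 and 6.
LAYERS = ["cloud", "storage", "processing", "query", "reporting", "automation"]
OPTIONS_PER_LAYER = 5


def count_stacks(layers, options_per_layer):
    """Count candidate stacks by picking one option index per layer."""
    choices = [range(options_per_layer)] * len(layers)
    return sum(1 for _ in product(*choices))


total = count_stacks(LAYERS, OPTIONS_PER_LAYER)
print(total)
```

Enumerating every stack is only feasible for counting; evaluating each one by hand is not, which is the point the slide appears to make.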
No clear business goals
Large amounts of data from many sources
Architecture design
The variety of tools
Compatibility of technologies/platforms
Lack of professionals
All features in one release
Budget
| Functional requirement | Value |
|---|---|
| Amount of data added daily | 2.5 TB |
| Data type | raw data; processed data |
| Data storage time, raw data | min. a week |
| Data storage time, processed data | min. a year |
| Response time for building reports based on a pre-set template | < 30 sec |
| Response time for building reports for a custom period of time | < 6 hours |
| Uptime | 99% |
| Fault-tolerance | required |
| Deployment cost per day | < $1,000 |

Non-functional requirements:
• Infrastructure-independent architecture
• Scalability
• Open-source tools
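As a rough back-of-the-envelope check, the requirements above can be turned into capacity numbers. This is a sketch under stated assumptions: decimal units (1 TB = 1,000,000 MB), a 365-day year, and a replication factor of 3, none of which appear in the slides.

```python
RAW_TB_PER_DAY = 2.5        # from the requirements
RAW_RETENTION_DAYS = 7      # "min a week"
UPTIME = 0.99               # from the requirements
REPLICATION_FACTOR = 3      # assumption, not stated in the slides
MB_PER_TB = 1_000_000       # assumption: decimal units
SECONDS_PER_DAY = 24 * 60 * 60


def raw_storage_tb(tb_per_day, retention_days, replication=1):
    """Raw data kept on disk at any time, optionally counting replicas."""
    return tb_per_day * retention_days * replication


def sustained_ingest_mb_per_s(tb_per_day):
    """Average write rate needed to absorb the daily volume."""
    return tb_per_day * MB_PER_TB / SECONDS_PER_DAY


logical_tb = raw_storage_tb(RAW_TB_PER_DAY, RAW_RETENTION_DAYS)
physical_tb = raw_storage_tb(RAW_TB_PER_DAY, RAW_RETENTION_DAYS, REPLICATION_FACTOR)
ingest_mb_s = sustained_ingest_mb_per_s(RAW_TB_PER_DAY)
allowed_downtime_days = (1 - UPTIME) * 365

print(f"raw data on disk: {logical_tb} TB ({physical_tb} TB with replicas)")
print(f"sustained ingest: {ingest_mb_s:.1f} MB/s")
print(f"allowed downtime: {allowed_downtime_days:.2f} days per year")
```

Peak load is likely higher than the daily average, so the ingest figure is a lower bound rather than a sizing target.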
| Feature | Amazon AWS | Joyent | Rackspace |
|---|---|---|---|
| Types of contract | On Demand, Reserved, Spot | On Demand, Reserved | On Demand |
| Types of instances (classified by compute units) | General Purpose; Compute optimized; Memory optimized; Storage optimized | Standard; High Memory; High CPU; High Storage; High I/O | General Purpose |
| Storage options | EBS; S3 | Low-cost storage; Network storage based on ZFS | Cloud Block Storage; Cloud Files |
| Operating systems | Linux, Windows | SmartOS, Linux, Windows | Linux, Windows |
| Management console | AWS Console | Joyent SmartDataCenter | Cloud Control Panel |
| Cloud API | Command line interface; Java, .NET, Ruby SDK and API | Command line interface (CLI); Node.js SDK; REST API | REST API |
| Regions | America, Europe, Asia, Australia | North America, Europe | America, Europe, Asia, Australia |
| Estimated cost per month | $18,300 | $17,500 | $21,350 |

Legend: a good fit, a normal fit, a bad fit
Options marked on the slide: Option 2, Option 1
Score: 1.5, 3.5
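The monthly estimates can be compared with the daily budget from the requirements (< $1,000 per day). The sketch below assumes a 30-day month and that each monthly estimate covers the whole deployment; the slides confirm neither.

```python
MONTHLY_ESTIMATE_USD = {
    "Amazon AWS": 18_300,
    "Joyent": 17_500,
    "Rackspace": 21_350,
}
DAILY_BUDGET_USD = 1_000    # from the requirements
DAYS_PER_MONTH = 30         # assumption


def daily_cost(monthly_usd, days=DAYS_PER_MONTH):
    """Spread a monthly estimate evenly over the month."""
    return monthly_usd / days


daily = {name: daily_cost(cost) for name, cost in MONTHLY_ESTIMATE_USD.items()}
within_budget = {name: cost < DAILY_BUDGET_USD for name, cost in daily.items()}
cheapest = min(daily, key=daily.get)

for name, cost in daily.items():
    print(f"{name}: ${cost:,.2f} per day, within budget: {within_budget[name]}")
print(f"cheapest: {cheapest}")
```

Under these assumptions all three providers fit the budget, so price alone does not separate them; the other rows of the comparison have to.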
| Features | HBase | Cassandra | MongoDB | MySQL Cluster |
|---|---|---|---|---|
| License | Apache | Apache | AGPL | GPL |
| Protocol | HTTP/REST (also Thrift) | Thrift and custom binary CQL3 | Custom, binary (BSON) | JDBC, ODBC |
| Data model | Column family | Column family | JSON documents | Tables |
| Queries / query language | JRuby-based (JIRB) shell | Cassandra Query Language | JavaScript expressions | SQL |
| Partitioning strategy | Ordered partitioning | Random partitioning | Sharding by key | Partition by key |
| Replication between nodes | yes | yes | yes | yes |
| Replication between data centers | no | yes | no | yes |
| Capability to store 2.5 TB daily | yes | yes | yes | yes |
| Implementation experience | 1+ | 1+ | 2+ | 5+ |
| Score (slide 13) | 2 | 3 | 2 | 5 |
| Deployment cost per day | $450 | $400 | $500 | $1,500 |
| Score (slide 14) | 2.5 | 4 | 2.5 | 0 |

Legend: a good fit, a normal fit, a bad fit
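One way to read the change in scores between the two versions of this comparison is that hard constraints are applied before any scoring. The sketch below is an illustration of that idea, not the authors' scoring method: it filters the candidates by the daily budget from the requirements and, optionally, by cross-data-center replication, using the values from the comparison.

```python
from dataclasses import dataclass

DAILY_BUDGET_USD = 1_000    # from the requirements


@dataclass(frozen=True)
class Candidate:
    name: str
    cost_per_day_usd: int
    cross_dc_replication: bool


CANDIDATES = [
    Candidate("HBase", 450, False),
    Candidate("Cassandra", 400, True),
    Candidate("MongoDB", 500, False),
    Candidate("MySQL Cluster", 1_500, True),
]


def shortlist(candidates, budget, require_cross_dc=False):
    """Keep candidates under budget, optionally requiring cross-DC replication."""
    return [
        c.name
        for c in candidates
        if c.cost_per_day_usd < budget
        and (c.cross_dc_replication or not require_cross_dc)
    ]


affordable = shortlist(CANDIDATES, DAILY_BUDGET_USD)
affordable_cross_dc = shortlist(CANDIDATES, DAILY_BUDGET_USD, require_cross_dc=True)
print(affordable)
print(affordable_cross_dc)
```

MySQL Cluster drops out on cost alone, which matches its score falling to 0 once deployment cost is added to the table.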
| Feature | HBase | Cassandra | MongoDB |
|---|---|---|---|
| Replication between data centers | Asynchronous, needs testing | Replicas can span data centers with synchronous replication | Not supported |
| Cluster admin node | NameNode | Any node | mongos process |
| Implementation experience | 1+ | 1+ | 2+ |
| Time spent on inserting 30 MB of data | 7 sec | 9 sec | 20 sec |
| Deployment cost per day | $450 | $400 | $500 |
| Score | 2 | 2.5 | 0 |

Legend: a good fit, a normal fit, a bad fit
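The insert timings can be related to the 2.5 TB daily volume. The slides do not describe the benchmark setup (cluster size, number of clients, record size), so this sketch rests on two loud assumptions: each timing reflects a single writer, and throughput scales linearly with the number of parallel writers. Decimal units (1 TB = 1,000,000 MB) are also assumed.

```python
import math

INSERT_SECONDS_FOR_30_MB = {"HBase": 7, "Cassandra": 9, "MongoDB": 20}
SAMPLE_MB = 30
DAILY_VOLUME_MB = 2.5 * 1_000_000   # 2.5 TB, decimal units (assumption)
SECONDS_PER_DAY = 24 * 60 * 60

required_mb_s = DAILY_VOLUME_MB / SECONDS_PER_DAY


def writer_throughput_mb_s(seconds_for_sample):
    """Throughput implied by one timed insert of the 30 MB sample."""
    return SAMPLE_MB / seconds_for_sample


def writers_needed(seconds_for_sample):
    """Parallel writers needed, assuming linear scaling (an assumption)."""
    return math.ceil(required_mb_s / writer_throughput_mb_s(seconds_for_sample))


writers = {db: writers_needed(sec) for db, sec in INSERT_SECONDS_FOR_30_MB.items()}
for db, count in writers.items():
    rate = writer_throughput_mb_s(INSERT_SECONDS_FOR_30_MB[db])
    print(f"{db}: {rate:.2f} MB/s per writer, about {count} writers for the daily volume")
```

The absolute numbers matter less than the ratio: under the same assumptions MongoDB needs roughly three times the write parallelism of HBase.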
| Requirement | The prototype features |
|---|---|
| Storing of 2.5 TB of daily raw data for a week | Capable |
| Storing of 1.5 TB of processed data for a year | Capable |
| Response time for building reports based on a pre-set template | ~25 sec |
| Response time of less than 6 hours for building a custom report | ~7 hours |
| Scalability | Good |
| Infrastructure independence | Yes |
| Using open-source tools | For all components |
| Fault-tolerance | Yes |
| Deployment cost per day < $1,000 | ~$600 |
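The measurable rows of the prototype results can be checked mechanically against their limits. This is a minimal sketch: the limits come from the requirements, the measured values are the approximate figures reported for the prototype, and the `Check` structure is a hypothetical helper introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    name: str
    limit: float      # requirement: measured value must stay below this
    measured: float   # approximate prototype result

    @property
    def passed(self) -> bool:
        return self.measured < self.limit


CHECKS = [
    Check("pre-set template report (seconds)", 30, 25),
    Check("custom report (hours)", 6, 7),
    Check("deployment cost per day (USD)", 1_000, 600),
]

failures = [check.name for check in CHECKS if not check.passed]
for check in CHECKS:
    status = "ok" if check.passed else "MISSED"
    print(f"{check.name}: {check.measured} vs limit {check.limit} -> {status}")
```

Only the custom report misses its limit (about 7 hours against 6), which is the kind of gap a prototype is meant to expose before the real system is built.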
Properly visualize and test the functionality
Detect bottlenecks and change a technology/tool/database before it is implemented in the real system
Get a realistic vision of the final solution
Make sure you stick to the budget
Andrei Yurkevich
President/CTO
andrei.yurkevich@altoros.com

Big Data, Big Projects, Big Mistakes: How to Jumpstart and Deliver with Success


Editor's Notes

  1. Volume, Velocity, Variety. Where to start?
  2. Everything seemed to go smoothly. However, there was one detail about MySQL Cluster: its architecture requires keeping all data in RAM, so we needed a cluster with 2.5 TB of RAM. The actual deployment cost was about $500 over the budget, so we had to start from scratch again.
  3. HBase was 2 seconds faster than Cassandra, but what about fault tolerance? HBase has an additional node that serves as a coordinator for the entire system. If it fails, the system fails. We could add a secondary management node, but then we might exceed the budget. Cassandra has a decentralized architecture: all nodes of its cluster have equal roles, and every node can serve as a coordinator. This makes the database extremely fault tolerant.
  4. Raw data is all data that comes from sensors. Processed data is data that was aggregated over each 10-minute interval; it is used for building reports.
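The step from raw to processed data described in note 4 can be sketched as a 10-minute bucketing of sensor readings. The notes say only that data was "aggregated"; the use of a mean, the `(sensor_id, timestamp, value)` record shape, and the function names are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW_MINUTES = 10   # from the notes


def window_start(ts):
    """Round a timestamp down to the start of its 10-minute window."""
    return ts - timedelta(
        minutes=ts.minute % WINDOW_MINUTES,
        seconds=ts.second,
        microseconds=ts.microsecond,
    )


def aggregate(readings):
    """Map (sensor_id, window start) to the mean value in that window.

    readings: iterable of (sensor_id, timestamp, value) tuples (assumed shape).
    """
    sums = defaultdict(lambda: [0.0, 0])
    for sensor_id, ts, value in readings:
        acc = sums[(sensor_id, window_start(ts))]
        acc[0] += value
        acc[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


raw = [
    ("s1", datetime(2014, 1, 1, 12, 1, 30), 10.0),
    ("s1", datetime(2014, 1, 1, 12, 9, 59), 20.0),
    ("s1", datetime(2014, 1, 1, 12, 10, 0), 30.0),
]
processed = aggregate(raw)
for (sensor_id, start), mean in sorted(processed.items()):
    print(sensor_id, start.isoformat(), mean)
```

Reports built from the processed rows touch far fewer records than reports over raw readings, which is consistent with raw data being kept for a week and processed data for a year.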