Scaling Data On Public Clouds

        Liran Zelkha, Founder
    Liran.zelkha@scalebase.com
About Us
• ScaleBase is a new startup targeting the
  database-as-a-service market (DBaaS)
• We offer unlimited database scalability and
  availability using our Database Load Balancer
• We currently run in beta mode – contact me if
  you want to join
Problem Of Data
• Flickr just hit 5B pictures
• Facebook > 0.5B users
• FarmVille has more monthly players than the
  population of France
Monday's Keynote
•   More data
•   More users
•   More complex actions
•   Shorter response times
Scalability Pain
[Figure: infrastructure cost ($) over time. Curves: predicted demand, traditional hardware, actual demand, automated virtualization. Annotations: large capital expenditure, opportunity cost, "You just lost customers".]
Source: http://media.amazonwebservices.com/pdf/IBMWebinarDeck_Final.pdf
CAP vs. ACID
• CAP = Consistency, Availability, Partition
  Tolerance
• ACID = Atomicity, Consistency, Isolation,
  Durability

• Atomicity – A chain of actions is treated as one
  whole, inseparable action
• Isolation – Consistent query snapshots, reads
  across concurrent writes; 4 isolation levels are supported
ScaleBase
Database Scaling In A Box
• The first truly elastic, fault-tolerant SQL-based
  data layer
• Enables linear scaling of any SQL-based
  database
[Diagram: applications and legacy clients connect to ScaleBase, which sits in front of the database instances.]
ScaleBase
[Diagram: application/web servers connect through ScaleBase to shared-nothing DB machines (MySQL? Oracle?) running on commodity hardware. Scalable and highly available.]
THE REQUIREMENTS FOR DATA SLA
IN PUBLIC CLOUD ENVIRONMENTS
What We Need
• Availability
• Consistency
• Scalability
Brewer's (CAP) Theorem
• It is impossible for a distributed computer
  system to simultaneously provide all three of
  the following guarantees:
  – Consistency (all nodes see the same data at the
    same time)
  – Availability (node failures do not prevent survivors
    from continuing to operate)
  – Partition Tolerance (the system continues to
    operate despite arbitrary message loss)

                            http://en.wikipedia.org/wiki/CAP_theorem
What It Means
[Figure]
Source: http://guyharrison.squarespace.com/blog/2010/6/13/consistency-models-in-non-relational-databases.html
Reading More On CAP
• This is an excellent read, and some of my
  examples come from this blog
  – http://www.julianbrowne.com/article/viewer/brewers-cap-theorem
ACHIEVING DATA SCALABILITY
WITH RELATIONAL DATABASES
Databases And CAP
• Consistency – provided by ACID
• Availability – tons of solutions, most of them
  not cloud-oriented
  – Oracle RAC
  – MySQL Proxy
  – Etc.
  – Replication based solutions can solve at least read
    availability and scalability (see Azure SQL)
Database Cloud Solutions
• Amazon RDS
• NaviSite Oracle RAC
• Joyent + Zeus
So Where Is The Problem?
• Scaling problems (usually write but also read)
• Schema change problems
• BigData problems
Scaling Up
• Scaling up runs into trouble when the dataset is
  just too big
• RDBMSs were not designed to be distributed
• So people began to look at multi-node database
  solutions
• This is known as ‘scaling out’ or ‘horizontal scaling’
• Different approaches include:
  – Master-slave
  – Sharding
Scaling RDBMS – Master/Slave
• Master-Slave
  – All writes go to the master. All reads are
    performed against the replicated slave databases
  – Critical reads may return stale data, because writes
    may not yet have propagated to the slaves
  – Large data sets can pose problems, as the master
    needs to duplicate its data to every slave
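The read/write split described above can be sketched in a few lines of Java. This is only an illustration under assumptions: the class name ReadWriteRouter, the node names and the criticalRead flag are hypothetical, and statements are classified by a naive SELECT prefix check rather than by real SQL parsing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical router: writes go to the master, ordinary reads to the slaves. */
final class ReadWriteRouter {
    private final String master;
    private final List<String> slaves;
    private final AtomicInteger next = new AtomicInteger();

    ReadWriteRouter(String master, List<String> slaves) {
        this.master = master;
        this.slaves = new ArrayList<>(slaves);
    }

    /**
     * Picks the node for a statement. Critical reads are sent to the master
     * because a slave may not have received the latest writes yet.
     */
    String route(String sql, boolean criticalRead) {
        // Naive classification (assumption): anything that starts with SELECT is a read.
        boolean isRead = sql.trim().toUpperCase(Locale.ROOT).startsWith("SELECT");
        if (!isRead || criticalRead || slaves.isEmpty()) {
            return master;
        }
        // Round-robin over the slaves; floorMod keeps the index valid after overflow.
        int index = Math.floorMod(next.getAndIncrement(), slaves.size());
        return slaves.get(index);
    }

    public static void main(String[] args) {
        List<String> slaves = new ArrayList<>();
        slaves.add("slave1");
        slaves.add("slave2");
        ReadWriteRouter router = new ReadWriteRouter("master", slaves);
        System.out.println(router.route("INSERT INTO emp VALUES (1)", false));
        System.out.println(router.route("SELECT * FROM emp", false));
    }
}
```

The criticalRead flag is the application's way of trading read scalability for freshness: only the master is guaranteed to have seen every write.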
Scaling RDBMS - Sharding
• Partitioning, or sharding
  – Scales well for both reads and writes
  – Not transparent: the application needs to be
    partition-aware
  – Can no longer have relationships/joins across
    partitions
  – Loss of referential integrity across shards
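What "partition-aware" means for application code can be shown with a minimal sketch. The class HashShardRouter, the shard names and the choice of a numeric user ID as partition key are all assumptions made for illustration; real deployments often use other keys and hash functions.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical partition-aware routing: the application picks the shard per key. */
final class HashShardRouter {
    private final List<String> shards;

    HashShardRouter(List<String> shards) {
        if (shards.isEmpty()) {
            throw new IllegalArgumentException("need at least one shard");
        }
        this.shards = new ArrayList<>(shards);
    }

    /** Maps a key to a shard; the same key always lands on the same shard. */
    String shardFor(long userId) {
        int index = (int) Math.floorMod(userId, (long) shards.size());
        return shards.get(index);
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        names.add("db0");
        names.add("db1");
        names.add("db2");
        HashShardRouter router = new HashShardRouter(names);
        // Users 100 and 102 live on different databases, so a join
        // between their rows cannot be executed by a single database.
        System.out.println(router.shardFor(100));
        System.out.println(router.shardFor(102));
    }
}
```

Every query now has to go through shardFor first, which is exactly the code change the bullets above warn about.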
Other ways to scale RDBMS
• Multi-Master replication
• INSERT only, no UPDATEs or DELETEs
• No JOINs, thereby reducing query time
  – This involves de-normalizing data
• In-memory databases
ACHIEVING DATA SLA WITH NOSQL
NoSQL
• A term used to designate databases which
  differ from classic relational databases in
  some way. These data stores may not require
  fixed table schemas, and usually
  avoid join operations and typically scale
  horizontally. Academics and papers typically
  refer to these databases as structured storage,
  a term which would include classic relational
  databases as a subset.
                             http://en.wikipedia.org/wiki/NoSQL
NoSQL Types
• Key/Value
  – A big hash table
  – Examples: Voldemort, Amazon Dynamo
• Big Table
  – Big table, column families
  – Examples: HBase, Cassandra
• Document based
  – Collections of collections
  – Examples: CouchDB, MongoDB
• Each solves a different problem
NO-SQL
[Figure]
Source: http://browsertoolkit.com/fault-tolerance.png
Pros/Cons
• Pros:
   –   Performance
   –   BigData
   –   Most solutions are open source
   –   Data is replicated to nodes and is therefore fault-tolerant (partitioning)
   –   Don't require a schema
   –   Can scale up and down
• Cons:
   –   Requires code changes
   –   Limited framework support
   –   Not ACID
   –   Ecosystem (BI, backup) is immature
   –   There is always a database at the backend
   –   Some APIs are just too simple
Amazon S3 Code Sample
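// Open a connection with the account credentials
AWSAuthConnection conn = new AWSAuthConnection(awsAccessKeyId, awsSecretAccessKey, secure, server, format);

// Create the bucket that will hold the object
Response response = conn.createBucket(bucketName, location, null);

final String text = "this is a test";

// Store the text as an object under the given key
response = conn.put(bucketName, key, new S3Object(text.getBytes(), null), null);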
Cassandra Code Sample
CassandraClient cl = pool.getClient();
KeySpace ks = cl.getKeySpace("Keyspace1");

// insert value
ColumnPath cp = new ColumnPath("Standard1", null,
        "testInsertAndGetAndRemove".getBytes("utf-8"));
for (int i = 0; i < 100; i++) {
    ks.insert("testInsertAndGetAndRemove_" + i, cp,
            ("testInsertAndGetAndRemove_value_" + i).getBytes("utf-8"));
}

// get value
for (int i = 0; i < 100; i++) {
    Column col = ks.getColumn("testInsertAndGetAndRemove_" + i, cp);
    String value = new String(col.getValue(), "utf-8");
}

// remove value
for (int i = 0; i < 100; i++) {
    ks.remove("testInsertAndGetAndRemove_" + i, cp);
}
Cassandra Code Sample – Cont’
// removing a row that does not exist must not fail
try {
    ks.remove("testInsertAndGetAndRemove_not_exist", cp);
} catch (Exception e) {
    fail("remove not exist row should not throw exceptions");
}

// get already removed value
for (int i = 0; i < 100; i++) {
    try {
        Column col = ks.getColumn("testInsertAndGetAndRemove_" + i, cp);
        fail("the value should already have been deleted");
    } catch (NotFoundException e) {
        // expected: the row was removed
    } catch (Exception e) {
        fail("throw out other exception, should be NotFoundException." + e.toString());
    }
}

// return the client and shut the pool down
pool.releaseClient(cl);
pool.close();
Cassandra Statistics
• Facebook Search
• MySQL > 50 GB Data
  – Writes Average : ~300 ms
  – Reads Average : ~350 ms
• Rewritten with Cassandra > 50 GB Data
  – Writes Average : 0.12 ms
  – Reads Average : 15 ms
MongoDB
Mongo m = new Mongo();

DB db = m.getDB( "mydb" );
Set<String> colls = db.getCollectionNames();

for (String s : colls) {
       System.out.println(s);
}
MongoDB – Cont’
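// Build a document with three simple fields
BasicDBObject doc = new BasicDBObject();

doc.put("name", "MongoDB");
doc.put("type", "database");
doc.put("count", 1);

// Nested document stored under the "info" field
BasicDBObject info = new BasicDBObject();

info.put("x", 203);
info.put("y", 102);

doc.put("info", info);

// The collection name is an example; db comes from the previous slide
DBCollection coll = db.getCollection("testCollection");
coll.insert(doc);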
THE BOTTOM LINE
Data SLA
• There is no golden hammer
   – See http://sourcemaking.com/antipatterns/golden-hammer
• Choose your tool wisely, based on what you need
• Usually
   – Start with an RDBMS (shortest time to market, which
     is what we really care about)
   – When scale issues occur – start moving to NoSQL
     based on your needs
• You can get Data Scalability in the cloud – just
  think before you code!!!
A BIT MORE ON SCALEBASE
How ScaleBase Works
• ScaleBase takes an application
  database and splits its data
  across multiple, separate
  instances (a technique called
  sharding)
• Queries and DML are either:
   – Directed to the correct instance, or
   – Executed simultaneously across
     several instances
• Results are aggregated and
  returned to the original
  application
[Diagram: the application talks to ScaleBase, which talks to the database instances.]
Example
Original table:
ID    First name   Last name
100   Steven       King
101   Neena        Kochhar
102   Lex          De Haan
103   Alexander    Hunold
104   Bruce        Ernst
105   David        Austin
106   Valli        Pataballa

Split across three database instances:

Instance 1:
ID    First name   Last name
102   Lex          De Haan
105   David        Austin

Instance 2:
ID    First name   Last name
100   Steven       King
103   Alexander    Hunold
106   Valli        Pataballa

Instance 3:
ID    First name   Last name
101   Neena        Kochhar
104   Bruce        Ernst
ScaleBase Supports
• 3 table types
    – Master
    – Global
    – Split
•   Splitting according to Hash, List, Range
•   Rebalance, addition and removal of machines
•   Instance replication backup: Shadow and Master
•   Fully consistent 2-Phase Commit
•   Joins, Foreign Keys, Subqueries
•   DML, DDL, Batch updates, Prepared Statements
•   Aggregations, Group By, Order By, Auto Numbering,
    Timestamps
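Two of the split types listed above, Range and List, can be sketched as simple lookups. This is not ScaleBase's implementation: the class SplitPolicies, its method names and the instance names are hypothetical, and the sketch only shows what each policy decides.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical split policies: map a split-key value to a database instance. */
final class SplitPolicies {

    /** Range split: each entry maps the lowest key of a range to an instance. */
    static String byRange(TreeMap<Integer, String> lowerBounds, int key) {
        Map.Entry<Integer, String> entry = lowerBounds.floorEntry(key);
        if (entry == null) {
            throw new IllegalArgumentException("key below the first range: " + key);
        }
        return entry.getValue();
    }

    /** List split: explicit value-to-instance assignments, with a fallback instance. */
    static String byList(Map<String, String> assignments, String value, String fallback) {
        return assignments.getOrDefault(value, fallback);
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> ranges = new TreeMap<>();
        ranges.put(0, "db0");      // keys 0..999
        ranges.put(1000, "db1");   // keys 1000 and above
        System.out.println(byRange(ranges, 1500));

        Map<String, String> countries = new HashMap<>();
        countries.put("BRAZIL", "db0");
        System.out.println(byList(countries, "FRANCE", "db1"));
    }
}
```

Range splits keep neighbouring keys together, which helps range scans but can create hot spots; list splits give explicit control over placement.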
Sample Code
SELECT site_owner_id, count(*) FROM google.user_clicks
WHERE country = 'BRAZIL'
GROUP BY site_owner_id


• site_owner_id is the split key
• Perform the query on all DBs
• Simple aggregation of results

• No Code Change
Sample Code
SELECT country, count(*) FROM google.user_clicks
GROUP BY country



• Perform the query on all DBs
• Aggregation of the aggregations

• No Code Change
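The "aggregation of the aggregations" step can be sketched as follows, assuming each database has already returned its own country counts. The class CountMerger and the sample numbers are made up for illustration; this is not how ScaleBase is known to implement it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical merge step for a GROUP BY ... count(*) that ran on every database. */
final class CountMerger {

    /** Sums the per-group counts returned by each database into one result. */
    static Map<String, Long> merge(List<Map<String, Long>> perDatabase) {
        Map<String, Long> total = new TreeMap<>();
        for (Map<String, Long> partial : perDatabase) {
            for (Map.Entry<String, Long> e : partial.entrySet()) {
                // A country can appear on several databases, so counts are added up.
                total.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Long> db1 = new TreeMap<>();
        db1.put("BRAZIL", 3L);
        db1.put("FRANCE", 1L);
        Map<String, Long> db2 = new TreeMap<>();
        db2.put("BRAZIL", 2L);
        System.out.println(merge(Arrays.asList(db1, db2))); // prints {BRAZIL=5, FRANCE=1}
    }
}
```

Counts and sums can be merged by simple addition. An average cannot: it has to be rebuilt from the per-database sums and counts.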
Sample Code
PreparedStatement pstmt = conn.prepareStatement("INSERT INTO emp VALUES(?,?,?,?,?)");
for (int i = 0; i < 10; i++) {
    pstmt.setInt(1, 300 + i);
    pstmt.setString(2, "Something" + i);
    // setDate expects java.sql.Date, not java.util.Date
    pstmt.setDate(3, new java.sql.Date(System.currentTimeMillis()));
    pstmt.setInt(4, i);
    pstmt.setLong(5, i);
    pstmt.addBatch();   // queue the row instead of executing it now
}
int[] result = pstmt.executeBatch();   // one update count per queued row



• Is the split key dynamic or static?
• Each command is added to the batch of the correct DB;
  the batch is then executed on all relevant DBs

• No Code Change
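How a single batch might be divided between databases can be sketched as below. The assumptions are loud ones: the first column is taken to be the split key, routing is by modulo, and the class BatchSplitter is hypothetical, so treat this as an illustration rather than ScaleBase's actual logic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical grouping of batched rows by target database. */
final class BatchSplitter {

    /**
     * Groups rows by database index. Assumption: row[0] is the split key
     * and the target database is key modulo the number of databases.
     */
    static Map<Integer, List<int[]>> groupByDatabase(List<int[]> rows, int databaseCount) {
        Map<Integer, List<int[]>> batches = new TreeMap<>();
        for (int[] row : rows) {
            int db = Math.floorMod(row[0], databaseCount);
            batches.computeIfAbsent(db, k -> new ArrayList<>()).add(row);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<int[]> rows = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            rows.add(new int[] {300 + i, i});   // same keys as the INSERT loop: 300..309
        }
        Map<Integer, List<int[]>> batches = groupByDatabase(rows, 4);
        for (Map.Entry<Integer, List<int[]>> e : batches.entrySet()) {
            System.out.println("db" + e.getKey() + " gets " + e.getValue().size() + " rows");
        }
    }
}
```

Only databases that actually received rows appear in the map, which matches "execution is on all relevant DBs".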
ScaleBase Solution
• Elastic SQL database scaling and high-availability
  solution
  – Complete
  – Transparent
  – Super scalable
  – Out of the box
  – Non-intrusive
  – Flexible
  – Manageable
With ScaleBase
• Pay much less for hardware and Database licenses
• Get more power, better data spreading and better
  availability
• Real linear scalability
• Go for real grid, cloud and virtualization

• ScaleBase:
   – Is NOT an RDBMS. It works with any secure, highly available,
     archivable RDBMS (Oracle, DB2, MySQL, any!)
   – Does NOT require schema structure modifications
   – Does NOT require additional sophisticated hardware
Moving To ScaleBase
• Implementing ScaleBase is done in minutes
• Just direct your application to your ScaleBase
  instance
• Point ScaleBase at your original database and at the
  target database instances
• ScaleBase will automatically migrate your schema
  and data to the new servers
• After all data is transferred ScaleBase will start
  working with target database instances, giving
  you linear scalability – with no down time!
Where ScaleBase Fits In
• Cloud databases
  – Use SQL databases in the cloud, and get Scale Out
    features and high availability
• High scale applications
  – Use your existing code, without migrating to
    NoSQL, and get unlimited SQL scalability
CUSTOMER CASE STUDIES
Public Cloud
• Scenario:
  – A startup developing a complex iPhone application
    running on EC2
  – Requires a high-scale option for its SQL-based database
• Problem:
  – Amazon RDS offers only a Scale Up option, no Scale Out
• Solution:
  – Customer switched their Amazon RDS-based
    application to ScaleBase on EC2
  – Gained transparent, linear scaling
  – Running 4 RDS instances behind ScaleBase
Private Cloud
• Scenario:
   – A company selling devices that ‘ping’ home every 5 minutes
   – An 8-digit number of devices sold
• Problem:
   – Evaluated MySQL and Oracle - no single machine can support
     the required number of devices
   – Clustering options are too expensive, with limited scalability
• Solution:
   – Customer moved to ScaleBase with no code changes
   – Gained linear scaling. Runs 4 MySQL databases behind
     ScaleBase

Mais conteúdo relacionado

Mais procurados

Geek Sync | Data in the Cloud: Understanding Amazon Database Services with Vi...
Geek Sync | Data in the Cloud: Understanding Amazon Database Services with Vi...Geek Sync | Data in the Cloud: Understanding Amazon Database Services with Vi...
Geek Sync | Data in the Cloud: Understanding Amazon Database Services with Vi...
IDERA Software
 
Toronto jaspersoft meetup
Toronto jaspersoft meetupToronto jaspersoft meetup
Toronto jaspersoft meetup
Patrick McFadin
 
State of Cassandra 2012
State of Cassandra 2012State of Cassandra 2012
State of Cassandra 2012
jbellis
 
Database & Technology 2 _ Marting Lambert _ Mixed Workloads Why and How.pdf
Database & Technology 2 _ Marting Lambert _ Mixed Workloads Why and How.pdfDatabase & Technology 2 _ Marting Lambert _ Mixed Workloads Why and How.pdf
Database & Technology 2 _ Marting Lambert _ Mixed Workloads Why and How.pdf
InSync2011
 
C*ollege Credit: Is My App a Good Fit for Cassandra?
C*ollege Credit: Is My App a Good Fit for Cassandra?C*ollege Credit: Is My App a Good Fit for Cassandra?
C*ollege Credit: Is My App a Good Fit for Cassandra?
DataStax
 

Mais procurados (20)

Summit 2011 infra_dbms
Summit 2011 infra_dbmsSummit 2011 infra_dbms
Summit 2011 infra_dbms
 
Geek Sync | Data in the Cloud: Understanding Amazon Database Services with Vi...
Geek Sync | Data in the Cloud: Understanding Amazon Database Services with Vi...Geek Sync | Data in the Cloud: Understanding Amazon Database Services with Vi...
Geek Sync | Data in the Cloud: Understanding Amazon Database Services with Vi...
 
Apache Hadoop on Virtual Machines
Apache Hadoop on Virtual MachinesApache Hadoop on Virtual Machines
Apache Hadoop on Virtual Machines
 
Toronto jaspersoft meetup
Toronto jaspersoft meetupToronto jaspersoft meetup
Toronto jaspersoft meetup
 
HBaseCon 2012 | HBase Security for the Enterprise - Andrew Purtell, Trend Micro
HBaseCon 2012 | HBase Security for the Enterprise - Andrew Purtell, Trend MicroHBaseCon 2012 | HBase Security for the Enterprise - Andrew Purtell, Trend Micro
HBaseCon 2012 | HBase Security for the Enterprise - Andrew Purtell, Trend Micro
 
cloud conference 2013 - Infrastructure as a Service in Amazon Web Services
cloud conference 2013 - Infrastructure as a Service in Amazon Web Servicescloud conference 2013 - Infrastructure as a Service in Amazon Web Services
cloud conference 2013 - Infrastructure as a Service in Amazon Web Services
 
Database backed coherence cache
Database backed coherence cacheDatabase backed coherence cache
Database backed coherence cache
 
Acquia Managed Cloud: Highly Available Architecture for Highly Unpredictable ...
Acquia Managed Cloud: Highly Available Architecture for Highly Unpredictable ...Acquia Managed Cloud: Highly Available Architecture for Highly Unpredictable ...
Acquia Managed Cloud: Highly Available Architecture for Highly Unpredictable ...
 
Scaling the Platform for Your Startup
Scaling the Platform for Your StartupScaling the Platform for Your Startup
Scaling the Platform for Your Startup
 
Leveraging Azure Databricks to minimize time to insight by combining Batch an...
Leveraging Azure Databricks to minimize time to insight by combining Batch an...Leveraging Azure Databricks to minimize time to insight by combining Batch an...
Leveraging Azure Databricks to minimize time to insight by combining Batch an...
 
Using Distributed In-Memory Computing for Fast Data Analysis
Using Distributed In-Memory Computing for Fast Data AnalysisUsing Distributed In-Memory Computing for Fast Data Analysis
Using Distributed In-Memory Computing for Fast Data Analysis
 
Power BI with Essbase in the Oracle Cloud
Power BI with Essbase in the Oracle CloudPower BI with Essbase in the Oracle Cloud
Power BI with Essbase in the Oracle Cloud
 
Hadoop on Virtual Machines
Hadoop on Virtual MachinesHadoop on Virtual Machines
Hadoop on Virtual Machines
 
Minnebar 2013 - Scaling with Cassandra
Minnebar 2013 - Scaling with CassandraMinnebar 2013 - Scaling with Cassandra
Minnebar 2013 - Scaling with Cassandra
 
State of Cassandra 2012
State of Cassandra 2012State of Cassandra 2012
State of Cassandra 2012
 
Concevoir une application scalable dans le Cloud
Concevoir une application scalable dans le CloudConcevoir une application scalable dans le Cloud
Concevoir une application scalable dans le Cloud
 
NoSQL and AWS Dynamodb
NoSQL and AWS DynamodbNoSQL and AWS Dynamodb
NoSQL and AWS Dynamodb
 
Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)
Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)
Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)
 
Database & Technology 2 _ Marting Lambert _ Mixed Workloads Why and How.pdf
Database & Technology 2 _ Marting Lambert _ Mixed Workloads Why and How.pdfDatabase & Technology 2 _ Marting Lambert _ Mixed Workloads Why and How.pdf
Database & Technology 2 _ Marting Lambert _ Mixed Workloads Why and How.pdf
 
C*ollege Credit: Is My App a Good Fit for Cassandra?
C*ollege Credit: Is My App a Good Fit for Cassandra?C*ollege Credit: Is My App a Good Fit for Cassandra?
C*ollege Credit: Is My App a Good Fit for Cassandra?
 

Destaque

OC4J to WebLogic Server Migration5
OC4J to WebLogic Server Migration5OC4J to WebLogic Server Migration5
OC4J to WebLogic Server Migration5
Liran Zelkha
 
Data SLA in the public cloud
Data SLA in the public cloudData SLA in the public cloud
Data SLA in the public cloud
Liran Zelkha
 

Destaque (8)

Building Eclipse Plugins
Building Eclipse PluginsBuilding Eclipse Plugins
Building Eclipse Plugins
 
שטפונות בנגב
שטפונות בנגבשטפונות בנגב
שטפונות בנגב
 
PDE builds or Maven
PDE builds or MavenPDE builds or Maven
PDE builds or Maven
 
Eclipse Plug-in Develompent Tips And Tricks
Eclipse Plug-in Develompent Tips And TricksEclipse Plug-in Develompent Tips And Tricks
Eclipse Plug-in Develompent Tips And Tricks
 
OC4J to WebLogic Server Migration5
OC4J to WebLogic Server Migration5OC4J to WebLogic Server Migration5
OC4J to WebLogic Server Migration5
 
L0016 - The Structure of an Eclipse Plug-in
L0016 - The Structure of an Eclipse Plug-inL0016 - The Structure of an Eclipse Plug-in
L0016 - The Structure of an Eclipse Plug-in
 
Creating a Plug-In Architecture
Creating a Plug-In ArchitectureCreating a Plug-In Architecture
Creating a Plug-In Architecture
 
Data SLA in the public cloud
Data SLA in the public cloudData SLA in the public cloud
Data SLA in the public cloud
 

Semelhante a Scaling data on public clouds

Running your database in the cloud presentation
Running your database in the cloud presentationRunning your database in the cloud presentation
Running your database in the cloud presentation
Aravindharamanan S
 
Running your database in the cloud presentation
Running your database in the cloud presentationRunning your database in the cloud presentation
Running your database in the cloud presentation
Aravindharamanan S
 
No sql solutions - 공개용
No sql solutions - 공개용No sql solutions - 공개용
No sql solutions - 공개용
Byeongweon Moon
 
Storage Systems For Scalable systems
Storage Systems For Scalable systemsStorage Systems For Scalable systems
Storage Systems For Scalable systems
elliando dias
 
Migrating enterprise workloads to AWS
Migrating enterprise workloads to AWSMigrating enterprise workloads to AWS
Migrating enterprise workloads to AWS
Tom Laszewski
 
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
Qian Lin
 
Aws for Startups Building Cloud Enabled Apps
Aws for Startups Building Cloud Enabled AppsAws for Startups Building Cloud Enabled Apps
Aws for Startups Building Cloud Enabled Apps
Amazon Web Services
 
Building a highly scalable and available cloud application
Building a highly scalable and available cloud applicationBuilding a highly scalable and available cloud application
Building a highly scalable and available cloud application
Noam Sheffer
 
Building Scalable Databases on AWS - AWS Summit 2012 - NYC
Building Scalable Databases on AWS - AWS Summit 2012 - NYCBuilding Scalable Databases on AWS - AWS Summit 2012 - NYC
Building Scalable Databases on AWS - AWS Summit 2012 - NYC
Amazon Web Services
 

Semelhante a Scaling data on public clouds (20)

Running your database in the cloud presentation
Running your database in the cloud presentationRunning your database in the cloud presentation
Running your database in the cloud presentation
 
Running your database in the cloud presentation
Running your database in the cloud presentationRunning your database in the cloud presentation
Running your database in the cloud presentation
 
Running your database in the cloud presentation
Running your database in the cloud presentationRunning your database in the cloud presentation
Running your database in the cloud presentation
 
Navigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skiesNavigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skies
 
Cloud Computing & Scaling Web Apps
Cloud Computing & Scaling Web AppsCloud Computing & Scaling Web Apps
Cloud Computing & Scaling Web Apps
 
Scaling the Platform for Your Startup
Scaling the Platform for Your StartupScaling the Platform for Your Startup
Scaling the Platform for Your Startup
 
A Tour of Azure SQL Databases (NOVA SQL UG 2020)
A Tour of Azure SQL Databases  (NOVA SQL UG 2020)A Tour of Azure SQL Databases  (NOVA SQL UG 2020)
A Tour of Azure SQL Databases (NOVA SQL UG 2020)
 
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
 
SQL and NoSQL in SQL Server
SQL and NoSQL in SQL ServerSQL and NoSQL in SQL Server
SQL and NoSQL in SQL Server
 
No sql solutions - 공개용
No sql solutions - 공개용No sql solutions - 공개용
No sql solutions - 공개용
 
Storage Systems For Scalable systems
Storage Systems For Scalable systemsStorage Systems For Scalable systems
Storage Systems For Scalable systems
 
Migrating enterprise workloads to AWS
Migrating enterprise workloads to AWSMigrating enterprise workloads to AWS
Migrating enterprise workloads to AWS
 
ScaleBase Webinar: Scaling MySQL - Sharding Made Easy!
ScaleBase Webinar: Scaling MySQL - Sharding Made Easy!ScaleBase Webinar: Scaling MySQL - Sharding Made Easy!
ScaleBase Webinar: Scaling MySQL - Sharding Made Easy!
 
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
 
Aws for Startups Building Cloud Enabled Apps
Aws for Startups Building Cloud Enabled AppsAws for Startups Building Cloud Enabled Apps
Aws for Startups Building Cloud Enabled Apps
 
MySQL Cluster Scaling to a Billion Queries
MySQL Cluster Scaling to a Billion QueriesMySQL Cluster Scaling to a Billion Queries
MySQL Cluster Scaling to a Billion Queries
 
Building a highly scalable and available cloud application
Building a highly scalable and available cloud applicationBuilding a highly scalable and available cloud application
Building a highly scalable and available cloud application
 
Building Scalable Databases on AWS - AWS Summit 2012 - NYC
Building Scalable Databases on AWS - AWS Summit 2012 - NYCBuilding Scalable Databases on AWS - AWS Summit 2012 - NYC
Building Scalable Databases on AWS - AWS Summit 2012 - NYC
 
Azure data platform overview
Azure data platform overviewAzure data platform overview
Azure data platform overview
 
Scaling the Platform for Your Startup - Startup Talks June 2015
Scaling the Platform for Your Startup - Startup Talks June 2015Scaling the Platform for Your Startup - Startup Talks June 2015
Scaling the Platform for Your Startup - Startup Talks June 2015
 

Último

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Último (20)

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...

Scaling data on public clouds

  • 1. Scaling Data On Public Clouds Liran Zelkha, Founder Liran.zelkha@scalebase.com
  • 2. About Us • ScaleBase is a new startup targeting the database-as-a-service market (DBaaS) • We offer unlimited database scalability and availability using our Database Load Balancer • We currently run in beta mode – contact me if you want to join
  • 3. Problem Of Data • Flickr just hit 5B pictures • Facebook > 0.5B users • FarmVille has more monthly players than the population of France
  • 4. Monday's Keynote • More data • More users • More complex actions • Shorter response times
  • 5. Scalability Pain [Chart: infrastructure cost ($) over time, comparing predicted demand, actual demand, traditional hardware capacity and automated virtualization; annotations: "Large capital expenditure", "Opportunity cost", "You just lost customers"] http://media.amazonwebservices.com/pdf/IBMWebinarDeck_Final.pdf
  • 6. CAP vs. ACID • CAP = Consistency, Availability, Partition Tolerance • ACID = Atomicity, Consistency, Isolation, Durability • Atomicity – Chain of actions treated as one whole, inseparable action • Isolation – Consistent query snapshots, read across writes, 4 levels are supported
  • 7. ScaleBase: Database Scaling In A Box • The first truly elastic, fault tolerant SQL based data layer • Enables linear scaling of any SQL based database [Diagram: applications and legacy clients connect to ScaleBase, which sits in front of the database instances]
  • 8. ScaleBase [Diagram: application/web servers connect through ScaleBase to shared-nothing DB machines on commodity hardware] MySQL? Oracle? Scalable and highly available
  • 9. THE REQUIREMENTS FOR DATA SLA IN PUBLIC CLOUD ENVIRONMENTS
  • 10. What We Need • Availability • Consistency • Scalability
  • 11. Brewer's (CAP) Theorem • It is impossible for a distributed computer system to simultaneously provide all three of the following guarantees: – Consistency (all nodes see the same data at the same time) – Availability (node failures do not prevent survivors from continuing to operate) – Partition Tolerance (the system continues to operate despite arbitrary message loss) http://en.wikipedia.org/wiki/CAP_theorem
  • 13. Reading More On CAP • This is an excellent read, and some of my samples are from this blog – http://www.julianbrowne.com/article/viewer/brewers-cap-theorem
  • 14. ACHIEVING DATA SCALABILITY WITH RELATIONAL DATABASES
  • 15. Databases And CAP • ACID – Consistency • Availability – tons of solutions, most of them not cloud oriented – Oracle RAC – MySQL Proxy – Etc. – Replication based solutions can solve at least read availability and scalability (see Azure SQL)
  • 16. Database Cloud Solutions • Amazon RDS • NaviSite Oracle RAC • Joyent + Zeus
  • 17. So Where Is The Problem? • Scaling problems (usually write but also read) • Schema change problems • BigData problems
  • 18. Scaling Up • Issues with scaling up when the dataset is just too big • RDBMS were not designed to be distributed • Began to look at multi-node database solutions • Known as ‘scaling out’ or ‘horizontal scaling’ • Different approaches include: – Master-slave – Sharding
  • 19. Scaling RDBMS – Master/Slave • Master-Slave – All writes are written to the master. All reads performed against the replicated slave databases – Critical reads may be incorrect as writes may not have been propagated down – Large data sets can pose problems as master needs to duplicate data to slaves
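The master/slave trade-off above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: `ReadWriteRouter`, the server names and the `criticalRead` flag are hypothetical and do not come from the deck or from any specific product. Writes always go to the master, ordinary reads rotate across the slaves, and a read that cannot tolerate replication lag is sent to the master.

```java
import java.util.List;
import java.util.Locale;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: writes go to the master, reads rotate across the slaves. */
final class ReadWriteRouter {
    private final String master;
    private final List<String> slaves;
    private final AtomicInteger next = new AtomicInteger();

    ReadWriteRouter(String master, List<String> slaves) {
        this.master = master;
        this.slaves = List.copyOf(slaves);
    }

    /**
     * Picks the server for one statement. A "critical" read is sent to the
     * master because a slave may not have received the latest writes yet.
     */
    String route(String sql, boolean criticalRead) {
        String head = sql.trim().toUpperCase(Locale.ROOT);
        boolean isRead = head.startsWith("SELECT");
        if (!isRead || criticalRead || slaves.isEmpty()) {
            return master;
        }
        // Round-robin over the slaves; floorMod keeps the index valid after overflow.
        int i = Math.floorMod(next.getAndIncrement(), slaves.size());
        return slaves.get(i);
    }

    public static void main(String[] args) {
        ReadWriteRouter r = new ReadWriteRouter("master", List.of("slave-1", "slave-2"));
        System.out.println(r.route("INSERT INTO emp VALUES (1)", false));   // master
        System.out.println(r.route("SELECT * FROM emp", false));            // slave-1
        System.out.println(r.route("SELECT * FROM emp", false));            // slave-2
        System.out.println(r.route("SELECT balance FROM accounts", true));  // master
    }
}
```

Classifying statements by their first keyword is a simplification; a real proxy would also have to handle transactions and statements such as SELECT ... FOR UPDATE.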
  • 20. Scaling RDBMS - Sharding • Partition or sharding – Scales well for both reads and writes – Not transparent, application needs to be partition-aware – Can no longer have relationships/joins across partitions – Loss of referential integrity across shards
  • 21. Other ways to scale RDBMS • Multi-Master replication • INSERT only, not UPDATES/DELETES • No JOINs, thereby reducing query time – This involves de-normalizing data • In-memory databases
  • 22. ACHIEVING DATA SLA WITH NOSQL
  • 23. NoSQL • A term used to designate databases which differ from classic relational databases in some way. These data stores may not require fixed table schemas, and usually avoid join operations and typically scale horizontally. Academics and papers typically refer to these databases as structured storage, a term which would include classic relational databases as a subset. http://en.wikipedia.org/wiki/NoSQL
  • 24. NoSQL Types • Key/Value – A big hash table – Examples: Voldemort, Amazon Dynamo • Big Table – Big table, column families – Examples: HBase, Cassandra • Document based – Collections of collections – Examples: CouchDB, MongoDB • Each solves a different problem
  • 26. Pros/Cons • Pros: – Performance – BigData – Most solutions are open source – Data is replicated to nodes and is therefore fault-tolerant (partitioning) – Don't require a schema – Can scale up and down • Cons: – Code change – Limited framework support – Not ACID – Ecosystem (BI, Backup) – There is always a database at the backend – Some APIs are just too simple
  • 27. Amazon S3 Code Sample AWSAuthConnection conn = new AWSAuthConnection(awsAccessKeyId, awsSecretAccessKey, secure, server, format); Response response = conn.createBucket(bucketName, location, null); final String text = "this is a test"; response = conn.put(bucketName, key, new S3Object(text.getBytes(), null), null);
  • 28. Cassandra Code Sample CassandraClient cl = pool.getClient() ; KeySpace ks = cl.getKeySpace("Keyspace1") ; // insert value ColumnPath cp = new ColumnPath("Standard1" , null, "testInsertAndGetAndRemove".getBytes("utf-8")); for(int i = 0 ; i < 100 ; i++){ ks.insert("testInsertAndGetAndRemove_"+i, cp , ("testInsertAndGetAndRemove_value_"+i).getBytes("utf-8")); } //get value for(int i = 0 ; i < 100 ; i++){ Column col = ks.getColumn("testInsertAndGetAndRemove_"+i, cp); String value = new String(col.getValue(),"utf-8") ; } //remove value for(int i = 0 ; i < 100 ; i++){ ks.remove("testInsertAndGetAndRemove_"+i, cp); }
  • 29. Cassandra Code Sample – Cont’ try{ ks.remove("testInsertAndGetAndRemove_not_exist", cp); }catch(Exception e){ fail("remove not exist row should not throw exceptions"); } //get already removed value for(int i = 0 ; i < 100 ; i++){ try{ Column col = ks.getColumn("testInsertAndGetAndRemove_"+i, cp); fail("the value should already being deleted"); }catch(NotFoundException e){ }catch(Exception e){ fail("throw out other exception, should be NotFoundException." + e.toString() ); } } pool.releaseClient(cl) ; pool.close() ;
  • 30. Cassandra Statistics • Facebook Search • MySQL > 50 GB Data – Writes Average : ~300 ms – Reads Average : ~350 ms • Rewritten with Cassandra > 50 GB Data – Writes Average : 0.12 ms – Reads Average : 15 ms
  • 31. MongoDB Mongo m = new Mongo(); DB db = m.getDB( "mydb" ); Set<String> colls = db.getCollectionNames(); for (String s : colls) { System.out.println(s); }
  • 32. MongoDB – Cont’ BasicDBObject doc = new BasicDBObject(); doc.put("name", "MongoDB"); doc.put("type", "database"); doc.put("count", 1); BasicDBObject info = new BasicDBObject(); info.put("x", 203); info.put("y", 102); doc.put("info", info); coll.insert(doc);
  • 34. Data SLA • There is no golden hammer – See http://sourcemaking.com/antipatterns/golden-hammer • Choose your tool wisely, based on what you need • Usually – Start with RDBMS (shortest TTM, which is what we really care about) – When scale issues occur – start moving to NoSQL based on your needs • You can get Data Scalability in the cloud – just think before you code!!!
  • 35. A BIT MORE ON SCALEBASE
  • 36. How ScaleBase Works • ScaleBase takes an application database and splits its data across multiple, separated instances (a technique called Sharding) • Queries and DML are either: – Directed to correct instance, or – Executed simultaneously across several instances • Results are aggregated and returned to the original application [Diagram: application, ScaleBase, database instances]
  • 37. Example • Original table (ID, First name, Last name): 100 Steven King; 101 Neena Kochhar; 102 Lex De Haan; 103 Alexander Hunold; 104 Bruce Ernst; 105 David Austin; 106 Valli Pataballa • Split across three database instances: – 102 Lex De Haan; 105 David Austin – 100 Steven King; 103 Alexander Hunold; 106 Valli Pataballa – 101 Neena Kochhar; 104 Bruce Ernst
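The seven employee rows in the example on slide 37 land on the three instances in a way that is consistent with splitting on ID modulo 3. The deck does not say which function was actually used, so the Java sketch below is an assumption for illustration only, and `HashSplit` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical hash split: a row lives on instance (id mod instanceCount). */
final class HashSplit {

    /** Maps a split key to an instance index in [0, instanceCount). */
    static int shardFor(long id, int instanceCount) {
        return (int) Math.floorMod(id, (long) instanceCount);
    }

    /** Groups ids by instance index, keeping input order inside each group. */
    static Map<Integer, List<Long>> distribute(List<Long> ids, int instanceCount) {
        Map<Integer, List<Long>> byShard = new TreeMap<>();
        for (long id : ids) {
            byShard.computeIfAbsent(shardFor(id, instanceCount), k -> new ArrayList<>()).add(id);
        }
        return byShard;
    }

    public static void main(String[] args) {
        List<Long> ids = List.of(100L, 101L, 102L, 103L, 104L, 105L, 106L);
        // prints {0=[102, 105], 1=[100, 103, 106], 2=[101, 104]}
        System.out.println(distribute(ids, 3));
    }
}
```

The three groups match the three split tables on the slide. A plain modulo is easy to reason about, but adding an instance changes the target of most keys, which is presumably why a rebalance feature matters.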
  • 38. ScaleBase Supports • 3 table types – Master – Global – Split • Splitting according to Hash, List, Range • Rebalance, addition and removal of machines • Instance replication backup: Shadow and Master • Full consistent 2-Phase Commit • Joins, Foreign Keys, Subqueries • DML, DDL, Batch updates, Prepared Statements • Aggregations, Group By, Order By, Auto Numbering, Timestamps
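Two of the split policies named above, Range and List, can be sketched as simple lookups. The Java code below is a hedged illustration, not ScaleBase's implementation: `SplitPolicies`, the database names and the boundaries are all hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical lookups for range-based and list-based splitting. */
final class SplitPolicies {

    /**
     * Range split: each entry maps the lowest key of a range to a database.
     * floorEntry finds the range whose lower bound is closest below the key.
     */
    static String byRange(TreeMap<Long, String> lowerBounds, long key) {
        Map.Entry<Long, String> e = lowerBounds.floorEntry(key);
        if (e == null) {
            throw new IllegalArgumentException("key below first range: " + key);
        }
        return e.getValue();
    }

    /** List split: explicit value-to-database assignments with a fallback. */
    static String byList(Map<String, String> assignments, String value, String fallback) {
        return assignments.getOrDefault(value, fallback);
    }

    public static void main(String[] args) {
        TreeMap<Long, String> ranges = new TreeMap<>();
        ranges.put(0L, "db-a");      // keys 0..999
        ranges.put(1000L, "db-b");   // keys 1000..4999
        ranges.put(5000L, "db-c");   // keys 5000 and above
        System.out.println(byRange(ranges, 999L));   // db-a
        System.out.println(byRange(ranges, 1000L));  // db-b

        Map<String, String> countries = Map.of("BRAZIL", "db-a", "FRANCE", "db-b");
        System.out.println(byList(countries, "JAPAN", "db-c"));  // db-c
    }
}
```

Range splits keep neighbouring keys together, which helps range scans but can create hot spots; list splits give explicit control when the split key has few distinct values.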
  • 39. Sample Code SELECT site_owner_id, count(*) FROM google.user_clicks WHERE country = 'BRAZIL' GROUP BY site_owner_id • site_owner_id is the split key • Perform the query on all DBs • Simple aggregation of results • No Code Change
  • 40. Sample Code SELECT country, count(*) FROM google.user_clicks GROUP BY country • Perform the query on all DBs • Aggregation of the aggregations • No Code Change
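"Aggregation of the aggregations" for a COUNT(*) ... GROUP BY query can be pictured as summing the partial counts that each database returns for the same group. The Java sketch below assumes each partial result has already been read into a map; `ShardAggregator` and the sample numbers are made up for illustration and say nothing about how ScaleBase does this internally.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical merge step for per-database COUNT(*) ... GROUP BY results. */
final class ShardAggregator {

    /** Sums the partial counts that share the same GROUP BY key. */
    static Map<String, Long> mergeCounts(List<Map<String, Long>> perDatabase) {
        Map<String, Long> total = new TreeMap<>();
        for (Map<String, Long> partial : perDatabase) {
            partial.forEach((group, count) -> total.merge(group, count, Long::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        // Illustrative partial results from three databases.
        List<Map<String, Long>> partials = List.of(
                Map.of("BRAZIL", 10L, "FRANCE", 4L),
                Map.of("BRAZIL", 7L),
                Map.of("FRANCE", 1L, "ISRAEL", 3L));
        // prints {BRAZIL=17, FRANCE=5, ISRAEL=3}
        System.out.println(mergeCounts(partials));
    }
}
```

When the GROUP BY column is the split key, as with site_owner_id on slide 39, each group lives on one database only, so the same merge reduces to concatenating the partial results. Counts and sums merge by addition; an average would need the sum and the count from each database instead.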
  • 41. Sample Code PreparedStatement pstmt = conn.prepareStatement("INSERT INTO emp VALUES(?,?,?,?,?)"); for (int i = 0; i < 10; i++) { pstmt.setInt(1, 300 + i); pstmt.setString(2, "Something" + i); pstmt.setDate(3, new Date(System.currentTimeMillis())); pstmt.setInt(4, i); pstmt.setLong(5, i); pstmt.addBatch(); } int[] result = pstmt.executeBatch(); • Is split key dynamic or static? • Each command is added to the correct DB, execution is on all relevant DBs • No Code Change
  • 42. ScaleBase Solution • Elastic SQL database scaling and high-availability solution – Complete – Transparent – Super scalable – Out of the box – Non-intrusive – Flexible – Manageable
  • 43. With ScaleBase • Pay much less for hardware and database licenses • Get more power, better data spreading and better availability • Real linear scalability • Go for real grid, cloud and virtualization • ScaleBase: – Is NOT an RDBMS. It works with any secure, highly available, archivable RDBMS (Oracle, DB2, MySQL, any!) – Does NOT require schema structure modifications – Does NOT require additional sophisticated hardware
  • 44. Moving To ScaleBase • Implementing ScaleBase is done in minutes • Just direct your application to your ScaleBase instance • Point ScaleBase at your original database and at the target database instances • ScaleBase will automatically migrate your schema and data to the new servers • After all data is transferred, ScaleBase will start working with the target database instances, giving you linear scalability – with no downtime!
  • 45. Where ScaleBase Fits In • Cloud databases – Use SQL databases in the cloud, and get Scale Out features and high availability • High scale applications – Use your existing code, without migrating to NoSQL, and get unlimited SQL scalability
  • 47. Public Cloud • Scenario: – A startup developing a complex iPhone application running on EC2 – Requires a high-scale option for its SQL based database • Problem: – Amazon RDS offers only a Scale Up option, no Scale Out • Solution: – Customer switched their Amazon RDS-based application to ScaleBase on EC2 – Gained transparent, linear scaling – Running 4 RDS instances behind ScaleBase
  • 48. Private Cloud • Scenario: – A company selling devices that ‘ping’ home every 5 minutes – 8-digit number of devices sold • Problem: – Evaluated MySQL, Oracle - no single machine can support the required number of devices – Clustering options too expensive, limited scalability • Solution: – Customer moved to ScaleBase with no code changes – Gained linear scaling. Runs 4 MySQL databases behind ScaleBase