Data Modeling and Scale Out
Jason Stamper, 451 Research
Vladi Vexler and Paul Campaniello, ScaleBase
2
Agenda
Data Modeling and Scale Out
1. 451 Research
• Key challenges in the data landscape
• Evolution of distributed database environments
2. ScaleBase
• Pros and cons of abstracting complex database topology
• Top strategies of distributed data modeling
• Advanced data modeling and “what-if” simulations with Analysis Genie
• Scaling real apps – From need to deployment
• Demo
3. Q & A (please type questions directly into the GoToWebinar side panel)
3
Today’s Presenters
Jason Stamper
Analyst, Data Management and Analytics
- 451 Research
• Over 20 years of experience in IT
• Formerly Editor of Computer Business Review & Technology Editor at The New Statesman
Vladi Vexler
Vice President, Tech. & Product Marketing
- ScaleBase
• Over 15 years of experience in software development and product management
• Author of patents in the fields of database innovation, dynamic data caching and machine learning analytics
Paul Campaniello
Vice President, Worldwide Marketing
- ScaleBase
• Over 25 years of software marketing & sales experience
• Held senior marketing and sales positions at Mendix, Lumigent, ESI, ComBrio, Savantis and Precise Software
4
About 451 Research
Founded in 2000
210+ employees, including over 100 analysts
1,000+ clients: technology & service providers, corporate advisory, finance, professional services, and IT decision makers
10,000+ senior IT professionals in our research community
Over 52 million data points each quarter
Headquartered in New York with offices in
Boston, San Francisco, Washington, London…
Research & Data
Advisory Services
Events
5
The Challenge
Businesses and their users are facing what one might call a
perfect storm – decision-makers need insight faster than ever,
and yet IT is struggling to avoid becoming a bottleneck.
6
The Facts Speak for Themselves…
A recent survey by the trade magazine Computer Business Review found that 98% of 200 UK CIOs admit a “significant gap” between what the business expects and what IT can deliver.
7
So What Does the Business Want?
Speed
Information, not data
Flexibility
Ease-of-use
Mobility
New ways of working
Self-service
Scale
Collaboration
8
What Causes IT to Become a Bottleneck?
Governance
Control
Security
Budget
Legacy
Staff
9
What Have We Learned So Far?
• So far, the emergence of so-called ‘hot’ data platform and analytics technologies has not solved the IT information bottleneck.
• Hadoop isn’t going to save the world (and neither is NoSQL).
• The importance of analyzing large data sets in real time or near real time is only set to grow in the era of the Internet of Things.
• IT is still critical, but it needs to enable the business to
help itself. The question is how to achieve the right blend
of usability, value-for-money and scalability.
10
A Word or Two on Hadoop Adoption
[Chart: average total storage capacity (TB) by workload, 2012 vs. 2013; workloads: DW and DBMS, unstructured file, virtualized server/OS, backup, archive, other, big data/Hadoop]
Average total storage capacity (TB) and total storage footprint by workload illustrate the low level of Hadoop adoption today.
11
451 Research’s View of the ‘Total Data Approach’
12
What is Driving the Change?
[Diagram: a cycle linking developers, applications and architecture]
Developers: Agile, REST, JSON, Schemaless, Schema-on-read, Flexible
Applications: Web, Social, Mobile, Always-on, Interactive, Local
Architecture: Cloud, Scalable, Elastic, Virtual, Distributed, Flexible
• New applications require distributed architecture
• Distributed architecture encourages new development approaches
• New development approaches demand new architecture
• Distributed architecture enables new applications
• New app requirements demand new development approaches
• New dev approaches enable new lightweight apps
13
The Database Challenge
– The traditional relational database has been stretched beyond its normal capacity limits by the needs of high-volume, highly distributed or highly complex applications.
– There are workarounds, such as DIY sharding, but manual, homegrown efforts can stretch database administrators beyond their capacity to manage the resulting complexity.
Increased willingness to look for emerging alternatives, driven by:
– Scalability
– Performance
– Relaxed consistency
– Agility
– Intricacy
– Necessity
14
Scalability, and Other Challenges
• As MySQL and MariaDB usage has grown, so has the number of applications that depend on them:
– Games, social, customer-facing and web applications, and business apps such as ad networks
• This has highlighted a number of challenges
– Scalability of master-slave architecture
– Performance and predictability at scale
– Lower latency; greater throughput; richer apps
– User expectations rising
– Manageability of increasing database/app sprawl
• External factors driving greater complexity:
– Distributed computing architectures
– Proliferation of cloud and elasticity requirements
– Geo-distributed application requirements
– Viral success means growth can come very quickly
15
Conclusions
• The success of MySQL and MariaDB has led to complications
in terms of scalability concerns
• Distributed computing, proliferation of cloud, and geo-
distributed applications are adding to the complexity
• Manual sharding techniques transfer the strain from the
database to the database administrator
• Neither MySQL nor MySQL administrators have ever been under so much strain
• Database scalability software enables users to move beyond the limitations and complexity of DIY sharding; precisely how data is managed within a distributed database, in the cloud or on premises, is key.
Scale Out Designs
17
About ScaleBase
Distributed Database Management System
Architected for the Cloud
Simple. Reliable. Powerful.
18
Designs of Distributed MySQL Environments
Quick Scale Out
Medium scale needs
Multiple database replicas performing load balancing with read/write splitting
Massive Scale Out
High scale needs
Complete distributed database environment, with policy-based data sharding/distribution
19
Quick Scale-Out
Read/Write Splitting and
Continuous Availability
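[Diagram: the application connects through a redirection layer (IP/port); read/write traffic (R/W) goes to the MySQL master, and read traffic (R) is spread across the MySQL replicas]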
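The read/write splitting behind Quick Scale-Out can be sketched as a small router. This is a minimal illustration, not ScaleBase's implementation: the class `ReadWriteSplitter`, the host strings and the verb-based classification rule are all assumptions. Writes and locking reads go to the master; plain reads rotate across the replicas.

```python
import itertools

# Statements assumed to modify data or schema; they must run on the master.
WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "REPLACE",
               "CREATE", "ALTER", "DROP", "TRUNCATE"}


class ReadWriteSplitter:
    """Hypothetical router: writes go to the master, reads rotate over replicas."""

    def __init__(self, master, replicas):
        self.master = master
        # itertools.cycle gives simple round-robin load balancing
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql):
        words = sql.split()
        verb = words[0].upper() if words else ""
        # Locking reads (SELECT ... FOR UPDATE) must also run on the master
        locking = "FOR UPDATE" in " ".join(words).upper()
        if verb in WRITE_VERBS or locking or self._replicas is None:
            return self.master
        return next(self._replicas)


router = ReadWriteSplitter("master:3306", ["replica1:3306", "replica2:3306"])
print(router.route("UPDATE users SET name = 'x' WHERE id = 1"))  # master:3306
print(router.route("SELECT * FROM users WHERE id = 1"))          # replica1:3306
print(router.route("SELECT * FROM users WHERE id = 2"))          # replica2:3306
```

A production proxy would also pin statements inside an explicit transaction to the master and account for replication lag; both are omitted here for brevity.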
20
Massive Scale-Out
[Diagram: shards 0, 1, 2, etc., each made up of a master and its own replicas]
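Massive scale-out needs a rule that maps each row's distribution key to exactly one shard. The deck describes the distribution as policy-based without specifying the policy, so this sketch assumes a simple hash-modulo rule; `Shard`, `shard_index` and `route` are hypothetical names.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Shard:
    master: str
    replicas: list = field(default_factory=list)


def shard_index(key, n_shards):
    """Map a distribution-key value to a shard number (hypothetical hash policy)."""
    # zlib.crc32 is stable across processes, unlike Python's salted hash() for str
    return zlib.crc32(str(key).encode("utf-8")) % n_shards


def route(key, shards, write):
    """Pick the node for a single-key statement: the shard's master for writes,
    otherwise its first replica (a real router would load-balance replicas)."""
    shard = shards[shard_index(key, len(shards))]
    if write or not shard.replicas:
        return shard.master
    return shard.replicas[0]


shards = [Shard(f"shard{i}-master", [f"shard{i}-replica"]) for i in range(3)]
print(route("user-42", shards, write=True))
print(route("user-42", shards, write=False))
```

With plain modulo hashing, changing the number of shards remaps most keys, which is why schemes such as consistent hashing or a key-to-shard lookup directory are often preferred when shards are added over time.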
21
The Right Solution for You Depends on Your Goals
• Scale (mostly) reads
• Scale (mostly) writes
• Performance of reads
– Affected by joins and full scans of big tables
• Performance of writes
– Affected by I/O reads/writes, CPU and table indexes (a growing overhead)
• Locks
• CPU/IO/RAM issues
• Load peaks
• Data growth
• Geo-distribution, special data distribution needs
Pros and Cons of
Abstracting Complex Database Topology
23
Pros of Abstracting Complex Database Topology
• Development agility – Accelerates your innovation speed
• Simplifies application code
• Reduces and simplifies maintenance
• Operations efficiency – Zero downtime for applications
• Reduces operation costs
• Better monitoring, analytics, HA, scale, elasticity, etc.
24
Cons of Abstracting Complex Database Topology
• Additional technology component may increase complexity
• Additional layer to monitor and manage
• Additional machines to monitor and manage (possibly increased opex)
• Less control at the application-code level (the layer is transparent)
25
Scale Out Methodologies Comparison
Characteristics & Modeling in a
Distributed Database System
27
Characteristics of Distributed Table Types
• MASTER – On master shard (0) only
Site settings, Admin data tables
• GLOBAL – Full copy on all shards
Lookups, Frequently joined tables, Slow growing tables
• DISTRIBUTED-ROOT – Distribution based on a key column
User.Id
• DISTRIBUTED-CASCADED (child) – Based on parent row
User_Photos, User_Photos_Likes – depend on Users
[Diagram: across shards 0 to 3, a MASTER table is stored in full on shard 0 only; a GLOBAL table is stored in full on every shard; a DISTRIBUTED table holds ¼ of its rows on each shard]
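To make the four table types concrete, here is a minimal sketch of where a row would be stored under each one. Assumptions: integer keys, a modulo placement rule, and hypothetical column names (`id`, `user_id`); in particular, it assumes `User_Photos_Likes` carries a `user_id` column so that it can follow its root `Users` row.

```python
from dataclasses import dataclass
from typing import Optional

N_SHARDS = 4  # matches the four shards (0-3) in the diagram


@dataclass
class TableDef:
    kind: str                          # MASTER, GLOBAL, DISTRIBUTED-ROOT or DISTRIBUTED-CASCADED
    key_column: Optional[str] = None   # distribution column of a root table
    parent_key: Optional[str] = None   # column holding the root's key in a cascaded table


def shard_for_key(value):
    # Hypothetical policy: integer key modulo shard count
    return value % N_SHARDS


def shards_for_row(table, row):
    """Return the shard numbers that must store `row`."""
    if table.kind == "MASTER":
        return [0]                                # master shard only
    if table.kind == "GLOBAL":
        return list(range(N_SHARDS))              # full copy on every shard
    if table.kind == "DISTRIBUTED-ROOT":
        return [shard_for_key(row[table.key_column])]
    if table.kind == "DISTRIBUTED-CASCADED":
        # Child rows follow their root row, so root/child joins stay on one shard
        return [shard_for_key(row[table.parent_key])]
    raise ValueError(f"unknown table kind: {table.kind}")


users = TableDef("DISTRIBUTED-ROOT", key_column="id")
user_photos = TableDef("DISTRIBUTED-CASCADED", parent_key="user_id")
user_photos_likes = TableDef("DISTRIBUTED-CASCADED", parent_key="user_id")

print(shards_for_row(users, {"id": 6}))                                   # [2]
print(shards_for_row(user_photos, {"id": 91, "user_id": 6}))              # [2]
print(shards_for_row(user_photos_likes, {"photo_id": 91, "user_id": 6}))  # [2]
```

Because cascaded rows inherit their root's shard, a query for one user's photos and likes can stay on a single shard.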
28
Characteristics of Distributed Queries
• ONE-DB – 1 shard, 1 node. Most optimal.
1) Any call when data known to be in one shard (Distributed/Master)
2) Call to Global table (load balance)
• ALL-DB – All shards, 1 node.
1) AGGREGATED READs (like map-reduce)
2) DML (writes) on Global tables
3) DDL (create, drop, alter schema)
• FULL-DB – All shards, all nodes.
Session calls (USE, SET)
• CROSS-DB – #n shards, 1 node. Least optimal, but critical
Cross-shard conflict resolution.
Note: Not all sharding platforms support all distributed query types.
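The query categories can be approximated by a simple classifier. This is a sketch, not any product's planner: the schema, the `QueryInfo` fields and the modulo placement are assumptions, and it treats a query whose known key values land on more than one shard as CROSS-DB, which is only one reading of that category.

```python
from dataclasses import dataclass, field

N_SHARDS = 4

# Hypothetical schema: table name -> table type (root and cascaded both "DISTRIBUTED")
TABLE_TYPES = {
    "site_settings": "MASTER",
    "countries": "GLOBAL",
    "users": "DISTRIBUTED",
    "user_photos": "DISTRIBUTED",
}


@dataclass
class QueryInfo:
    verb: str                                       # first keyword, e.g. "SELECT"
    tables: list = field(default_factory=list)      # tables the statement touches
    key_values: list = field(default_factory=list)  # distribution-key values found in it


def classify(q):
    verb = q.verb.upper()
    if verb in ("USE", "SET"):
        return "FULL-DB"      # session state must be set on every node
    if verb in ("CREATE", "DROP", "ALTER"):
        return "ALL-DB"       # DDL must reach every shard
    types = {TABLE_TYPES[t] for t in q.tables}
    if "DISTRIBUTED" not in types:
        if verb != "SELECT" and "GLOBAL" in types:
            return "ALL-DB"   # every copy of a global table must be updated
        return "ONE-DB"       # global reads hit any one shard; master tables live on shard 0
    if not q.key_values:
        return "ALL-DB"       # no key: scatter to all shards, then merge (map-reduce style)
    shards = {k % N_SHARDS for k in q.key_values}
    if "MASTER" in types:
        shards.add(0)         # master-only tables are on shard 0
    return "ONE-DB" if len(shards) == 1 else "CROSS-DB"


print(classify(QueryInfo("SET")))                                       # FULL-DB
print(classify(QueryInfo("SELECT", ["users"], [5])))                    # ONE-DB
print(classify(QueryInfo("SELECT", ["users"])))                         # ALL-DB
print(classify(QueryInfo("SELECT", ["users", "user_photos"], [1, 2])))  # CROSS-DB
```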
29
Why Is Data Modeling Important?
• DATA and LOAD – Efficient distribution of:
– DATA - all / main tables and data
– READS
– WRITES
• QUERIES
– Handle ALL-DB Queries (Map-reduce concept)
– Minimize (but support!) CROSS-DB Queries – higher performance and scale
• OPTIMIZE DEVELOPMENT with SQL ANALYTICS
– Insight into the real database usage
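The map-reduce concept behind ALL-DB aggregated reads matters for correctness as well as speed: each shard must return partial results that can be merged. A minimal sketch for `AVG` with hypothetical data, where each shard returns `(sum, count)` and a coordinator combines them:

```python
def map_shard(amounts):
    """Runs on each shard: return partial aggregates, not a local average."""
    return sum(amounts), len(amounts)


def reduce_partials(partials):
    """Runs on the coordinator: combine per-shard (sum, count) pairs."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


# Hypothetical order amounts stored on three shards
shard_rows = [[10.0, 20.0], [30.0], [40.0, 50.0, 60.0]]

partials = [map_shard(rows) for rows in shard_rows]
print(reduce_partials(partials))  # 35.0 (210.0 / 6)

# Averaging the per-shard averages is wrong when shards hold different row counts
naive = sum(sum(r) / len(r) for r in shard_rows) / len(shard_rows)
print(round(naive, 2))  # 31.67
```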
30
Data Relationships Can be Extremely Complex
Scale-out is usually applied to growing or mature apps.
How do you define an optimal data distribution policy?
Analysis Genie:
MySQL Visual Analysis &
Optimal Distribution Policy Configuration
32
ScaleBase Analysis Genie
• A tool for MySQL visual analysis and for building an optimal data distribution policy; designed for DBAs, architects and development managers
• Two-step process:
– Analysis Assistant
– An agent captures app/DB information, including SQL traffic and database metrics
– Obfuscates, summarizes and packages the app/DB data
– Analysis Genie
– A SaaS application that receives the Analysis Assistant (AA) package, presents the visual analysis and details the policy configuration
[Diagram: Analysis Assistant feeding Analysis Genie]
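The deck does not document how the Analysis Assistant obfuscates and summarizes traffic. One common technique with a similar effect is to strip literal values from captured SQL and count the resulting statement patterns. A minimal sketch under that assumption; the regexes are simplified and ignore comments, double-quoted strings, doubled-quote escapes and hex literals:

```python
import re
from collections import Counter

_STRING = re.compile(r"'(?:[^'\\]|\\.)*'")   # single-quoted string literals
_NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")  # integer and decimal literals
_SPACES = re.compile(r"\s+")


def normalize(sql):
    """Replace literal values with '?' so captured data values are not exported."""
    sql = _STRING.sub("?", sql)
    sql = _NUMBER.sub("?", sql)
    return _SPACES.sub(" ", sql).strip()


def summarize(statements):
    """Count how often each normalized statement pattern occurs."""
    return Counter(normalize(s) for s in statements)


captured = [
    "SELECT * FROM users WHERE id = 42",
    "SELECT * FROM users WHERE id = 7",
    "UPDATE users SET name = 'Ann' WHERE id = 7",
]
for pattern, count in summarize(captured).most_common():
    print(count, pattern)
```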
33
ScaleBase Analysis Genie
• Advanced analytics
– Schemas, data & queries
– Semantic structure analysis
– Usage, Load and Scale analytics
• Data modeling and scale-out planning
– Customized for the most complex applications
– Automatic identification of the optimal data distribution policy
– Complete policy control
• Quality assurance
– Review before production
• Simulation of results
– “What-if” analysis
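A “what-if” comparison of candidate distribution policies can be approximated by replaying a workload summary against each candidate and measuring how much traffic would become ONE-DB. This is a hypothetical sketch (the workload tuples and policy dictionaries are assumptions), not the Analysis Genie's simulation:

```python
def single_shard_ratio(workload, policy):
    """Share of captured queries a candidate policy could route to one shard.

    workload: list of (table, filter_column, count) tuples from captured traffic
    policy:   {table: distribution_column} for the candidate being evaluated
    """
    total = sum(count for _, _, count in workload)
    routed = sum(count for table, column, count in workload
                 if policy.get(table) == column)
    return routed / total if total else 0.0


# Hypothetical workload summary
workload = [
    ("users", "id", 700),
    ("users", "email", 100),
    ("orders", "user_id", 150),
    ("orders", "id", 50),
]
candidates = {
    "by user": {"users": "id", "orders": "user_id"},
    "by own id": {"users": "id", "orders": "id"},
}
for name, policy in candidates.items():
    print(name, single_shard_ratio(workload, policy))  # by user 0.85, by own id 0.75
```

This metric ignores joins; a fuller simulation would also check that frequently joined tables are co-located, as cascaded child tables are.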
34
Relationship Identification
Mapping includes:
• Schema structures
• Table and column name matching
• Query parsing and identification of joined tables and columns
• Statistics on every object’s size and access
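Query parsing reveals which columns tables are actually joined on, a strong hint for choosing distribution keys and cascaded relationships. A minimal sketch, assuming only explicit `alias.column = alias.column` predicates and a caller-supplied alias map (a real tool would use a full SQL parser); it is not the Analysis Genie's algorithm:

```python
import re
from collections import Counter

# Matches equality predicates such as "p.user_id = u.id"
_EQ_JOIN = re.compile(r"\b(\w+)\.(\w+)\s*=\s*(\w+)\.(\w+)\b")


def join_edges(statements, aliases=None):
    """Count how often each pair of (table, column) is joined by equality."""
    aliases = aliases or {}
    edges = Counter()
    for sql in statements:
        for a, col_a, b, col_b in _EQ_JOIN.findall(sql):
            left = (aliases.get(a, a), col_a)
            right = (aliases.get(b, b), col_b)
            edges[tuple(sorted((left, right)))] += 1  # order-independent key
    return edges


queries = [
    "SELECT * FROM users u JOIN user_photos p ON p.user_id = u.id WHERE u.id = 5",
    "SELECT COUNT(*) FROM user_photos p JOIN users u ON u.id = p.user_id",
]
aliases = {"u": "users", "p": "user_photos"}
for (left, right), n in join_edges(queries, aliases).most_common():
    print(f"{left[0]}.{left[1]} = {right[0]}.{right[1]}: {n}")
```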
35
Analyzing Relationships: From Chaos to Order
Understanding and mapping complex relationships
ScaleBase Genie Demo
37
MySQL Visual Analysis Demo
• Visual analysis
• Distribution policy identification and configuration
• Scale out load via data sharding (massive scale out)
[Diagram: Analysis Genie and ScaleBase Enterprise]
Summary
39
Reading Plus
Who:
• Online education company
Problem:
• The busy back-to-school season was approaching, and they needed a solution that could be implemented quickly while guaranteeing uptime
• With increasing growth, they needed to implement a scale-out solution quickly
Alternatives Considered:
• A clustering technology, which proved to be infeasible due to schema
complexity and a lengthy re-architecture requirement
Solution:
• Used visual analysis to determine best scale out plan
• ScaleBase Lite for instant scale out and continuous availability
• 35 Tomcat application servers were connected to 3 ScaleBase controllers
• ScaleBase performed automated read/write splitting and load balancing
40
Next Gen SaaS ERP Company
Who:
• Inventory management ecommerce company
• Hosted on Rackspace (ScaleBase partner)
Problem:
• Largest available hardware could not support workload
Alternatives Considered:
• Initially went with a “black box” solution, encountering many issues
Solution:
• Scaled out a single MySQL instance to 8 clustered shards
• On-demand growth – current workload over 20,000 TPS
– Plan to double footprint in next quarter
– Support all production customers during Black Friday & Cyber Monday
41
Scale out to unlimited users
Continuous availability
Dynamic workload optimization
Fast and simple deployment
Easily scale out a single MySQL instance
Optimized for the Cloud
Reduces time-to-market
No changes needed to app or database
Database usage analytics
Intelligent load balancing
Centralized data management
ScaleBase
Distributed Database Management System
42
Products and Editions
• Community: limited by deployment
• Startup: free for qualified candidates
• Enterprise: massive scale out
• Lite: quick scale out
• Analysis Genie: database performance analytics
Also available on: [logos not preserved]
43
How Can I Learn More?
Use visual analysis to plan your scale out strategy
Download the Analysis Genie:
https://www.scalebase.com/software
Read the 451 report about ScaleBase (& the DB market)
Download Jason’s Report (authored last week)
https://www.scalebase.com/resources/whitepapers
Questions?
Contact Info:
Paul Campaniello
paul.campaniello@scalebase.com
Vladi Vexler
vladi.vexler@scalebase.com
Resources:
www.scalebase.com
www.scalebase.com/resources
www.scalebase.com/blog
info@scalebase.com
(617) 630.2800

Data Modeling and Scale Out: ScaleBase + 451 Group webinar, 30.4.2015
 

Data Modeling and Scale Out - ScaleBase + 451-Group webinar 30.4.2015

  • 1. Data Modeling and Scale Out: Jason Stamper, 451 Research; Vladi Vexler and Paul Campaniello, ScaleBase
  • 2. Agenda: Data Modeling and Scale Out
      1. 451 Research
         • Key challenges in the data landscape
         • Evolution of distributed database environments
      2. ScaleBase
         • Pros and cons of abstracting complex database topology
         • Top strategies of distributed data modeling
         • Advanced data modeling and “what-if” simulations with Analysis Genie
         • Scaling real apps: from need to deployment
         • Demo
      3. Q & A (please type questions directly into the GoToWebinar side panel)
  • 3. Today’s Presenters
      • Jason Stamper, Analyst, Data Management and Analytics, 451 Research
         – Over 20 years of experience in IT
         – Formerly Editor of Computer Business Review and Technology Editor at The New Statesman
      • Vladi Vexler, Vice President, Tech. & Product Marketing, ScaleBase
         – Over 15 years of experience in software development and product management
         – Author of patents in the fields of database innovation, dynamic data caching and machine-learning analytics
      • Paul Campaniello, Vice President, Worldwide Marketing, ScaleBase
         – Over 25 years of software marketing and sales experience
         – Held senior marketing and sales positions at Mendix, Lumigent, ESI, ComBrio, Savantis and Precise Software
  • 4. About 451 Research
      • Founded in 2000
      • 210+ employees, including over 100 analysts
      • 1,000+ clients: technology and service providers, corporate advisory, finance, professional services, and IT decision makers
      • 10,000+ senior IT professionals in our research community
      • Over 52 million data points each quarter
      • Headquartered in New York, with offices in Boston, San Francisco, Washington, London…
      • Research & Data; Advisory Services; Events
  • 5. The Challenge: Businesses and their users are facing what one might call a perfect storm – decision-makers need insight faster than ever, and yet IT is struggling to avoid becoming a bottleneck.
  • 6. The Facts Speak for Themselves… A recent survey by the trade magazine Computer Business Review found that 98% of 200 UK CIOs admit a “significant gap” between what the business expects and what IT can deliver.
  • 7. So What Does the Business Want? Speed; information, not data; flexibility; ease of use; mobility; new ways of working; self-service; scale; collaboration.
  • 8. What Causes IT to Become a Bottleneck? Governance, control, security, budget, legacy, staff.
  • 9. What Have We Learned So Far?
      • So far, the emergence of so-called ‘hot’ data platform and analytics technologies has not solved the IT information bottleneck.
      • Hadoop isn’t going to save the world (and neither is NoSQL).
      • The need to analyze large data sets in real time or near real time is only set to grow in the era of the Internet of Things.
      • IT is still critical, but it needs to enable the business to help itself. The question is how to achieve the right blend of usability, value for money and scalability.
  • 10. A Word or Two on Hadoop Adoption [Chart: average total storage capacity (TB) in 2012 and 2013, and total storage footprint by workload (DW and DBMS, unstructured file, virtualized server/OS, backup, archive, big data/Hadoop, other), illustrating the low level of Hadoop adoption today.]
  • 11. 451 Research’s View of the ‘Total Data Approach’
  • 12. What Is Driving the Change? [Diagram: three mutually reinforcing areas. Developers: agile, REST, JSON, schemaless, schema-on-read, flexible. Applications: web, social, mobile, always-on, interactive, local. Architecture: cloud, scalable, elastic, virtual, distributed, flexible.]
      • New applications require distributed architecture; distributed architecture enables new applications.
      • Distributed architecture encourages new development approaches; new development approaches demand new architecture.
      • New app requirements demand new development approaches; new development approaches enable new lightweight apps.
  • 13. The Database Challenge
      – The traditional relational database has been stretched beyond its normal capacity limits by the needs of high-volume, highly distributed or highly complex applications.
      – There are workarounds, such as DIY sharding, but manual, homegrown efforts can stretch database administrators beyond their capacity to manage the resulting complexity.
      – Drivers of an increased willingness to look for emerging alternatives: scalability, performance, relaxed consistency, agility, intricacy, necessity.
  • 14. Scalability, and Other Challenges
      • As usage of MySQL and MariaDB has grown, so has the usage of applications that depend on them: games, social, customer-facing and web apps, and business apps such as ad networks.
      • This has highlighted a number of challenges:
         – Scalability of master-slave architecture
         – Performance and predictability at scale
         – Lower latency, greater throughput, richer apps
         – Rising user expectations
         – Manageability of increasing database/app sprawl
      • External factors driving greater complexity:
         – Distributed computing architectures
         – Proliferation of cloud and elasticity requirements
         – Geo-distributed application requirements
         – Viral success means growth can come very quickly
  • 15. Conclusions
      • The success of MySQL and MariaDB has led to complications in terms of scalability.
      • Distributed computing, the proliferation of cloud, and geo-distributed applications are adding to the complexity.
      • Manual sharding techniques transfer the strain from the database to the database administrator.
      • Neither MySQL nor MySQL administrators have ever been under so much strain.
      • Database scalability software enables users to move beyond the limitations and complexity of DIY sharding; precisely how data is managed with a distributed database, in the cloud or on premises, is key.
  • 17. About ScaleBase: Distributed Database Management System, Architected for the Cloud. Simple. Reliable. Powerful.
  • 18. Designs of Distributed MySQL Environments
      • Quick Scale Out (medium scale needs): multiple database replicas performing load balancing with read/write splitting
      • Massive Scale Out (high scale needs): a complete distributed database environment, with policy-based data sharding/distribution
  • 19. Quick Scale-Out: Read/Write Splitting and Continuous Availability [Diagram: the application is redirected (IP/port) to a MySQL master, which serves reads and writes (R/W), and to MySQL replicas, which serve reads (R).]
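As a rough illustration of read/write splitting (a generic sketch, not ScaleBase’s implementation, which the slides do not describe), the hypothetical `ReadWriteSplitter` below sends plain `SELECT`s to replicas in round-robin order and everything else to the master. It assumes one router per client session and keeps reads inside an explicit transaction, or `SELECT ... FOR UPDATE`, on the master; replication lag is otherwise ignored.

```python
from __future__ import annotations

from itertools import cycle


class ReadWriteSplitter:
    """Hypothetical per-session router: reads to replicas, everything else to the master."""

    def __init__(self, master: str, replicas: list[str]) -> None:
        self.master = master
        # Round-robin over replicas to spread read load; None means "no replicas, use master".
        self._replicas = cycle(replicas) if replicas else None
        self.in_transaction = False

    def route(self, sql: str) -> str:
        words = sql.split()
        verb = words[0].upper() if words else ""
        if verb in ("BEGIN", "START"):  # BEGIN or START TRANSACTION
            self.in_transaction = True
            return self.master
        if verb in ("COMMIT", "ROLLBACK"):
            self.in_transaction = False
            return self.master
        # Reads inside a transaction, and locking reads, stay on the master
        # so they see the session's own writes and take the right locks.
        if (
            verb == "SELECT"
            and not self.in_transaction
            and "FOR UPDATE" not in sql.upper()
            and self._replicas is not None
        ):
            return next(self._replicas)
        return self.master
```

Keeping the routing decision per session is the simplest way to avoid a transaction reading stale data from a replica; a production proxy would also need to handle connection pooling and replica health checks.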
  • 20. Massive Scale-Out [Diagram: shards 0, 1, 2, etc., each consisting of a master and its replicas.]
  • 21. The Right Solution for You Depends on Your Goals
      • Scale (mostly) reads
      • Scale (mostly) writes
      • Performance of reads
         – Affected by joins and full scans of big tables
      • Performance of writes
         – Affected by I/O reads/writes, CPU and table indexes (a growing overhead)
      • Locks
      • CPU/IO/RAM issues
      • Load peaks
      • Data growth
      • Geo-distribution and special data distribution needs
  • 22. Pros and Cons of Abstracting Complex Database Topology
  • 23. Pros of Abstracting Complex Database Topology
      • Development agility: accelerates your innovation speed
         – Simplifies application code
         – Simplifies maintenance and reduces its costs
      • Operational efficiency: zero downtime for applications
         – Reduces operation costs
         – Better monitoring, analytics, HA, scale, elasticity, etc.
  • 24. Cons of Abstracting Complex Database Topology
      • An additional technology component may increase complexity
      • An additional layer to monitor and manage
      • Additional machines to monitor and manage (possibly increased opex)
      • Less control at the application-code level (the layer is transparent to the application)
  • 26. Characteristics & Modeling in a Distributed Database System
  • 27. Characteristics of Distributed Table Types
      • MASTER – on the master shard (0) only. Examples: site settings, admin data tables
      • GLOBAL – a full copy on all shards. Examples: lookups, frequently joined tables, slow-growing tables
      • DISTRIBUTED-ROOT – distributed based on a key column. Example: User.Id
      • DISTRIBUTED-CASCADED (child) – distributed based on the parent row. Example: User_Photos and User_Photos_Likes depend on Users
      [Diagram: across shards 0–3, a MASTER table is a full table on shard 0 only, a GLOBAL table is a full table on every shard, and a distributed table holds ¼ of the table on each shard.]
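A minimal sketch of how the four table types could map a row to shards. It assumes four shards (as in the diagram), a hypothetical schema (`site_settings`, `countries`, `users`, `user_photos`) and simple modulo placement on an integer key; real products may use hashing or range/list maps instead, and none of these names come from ScaleBase.

```python
from __future__ import annotations

from enum import Enum

NUM_SHARDS = 4  # assumption: shards 0-3, as in the slide's diagram


class TableType(Enum):
    MASTER = "master"
    GLOBAL = "global"
    DISTRIBUTED_ROOT = "distributed-root"
    DISTRIBUTED_CASCADED = "distributed-cascaded"


# Hypothetical policy: table name -> (table type, distribution column or None)
POLICY = {
    "site_settings": (TableType.MASTER, None),
    "countries": (TableType.GLOBAL, None),
    "users": (TableType.DISTRIBUTED_ROOT, "id"),
    # Child rows carry the parent's key, so they follow the parent user.
    "user_photos": (TableType.DISTRIBUTED_CASCADED, "user_id"),
}


def shard_for_key(key: int) -> int:
    """Simple modulo placement of a distribution-key value."""
    return key % NUM_SHARDS


def shards_for_row(table: str, row: dict) -> list[int]:
    """Return the shard(s) that must store `row` of `table`."""
    table_type, column = POLICY[table]
    if table_type is TableType.MASTER:
        return [0]  # master shard only
    if table_type is TableType.GLOBAL:
        return list(range(NUM_SHARDS))  # full copy on every shard
    # DISTRIBUTED-ROOT uses its own key; DISTRIBUTED-CASCADED uses the parent's key,
    # so a user and that user's photos land on the same shard.
    return [shard_for_key(row[column])]
```

Co-locating child rows with their root is what lets a query such as “all photos of user 7” stay on a single shard.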
  • 28. Characteristics of Distributed Queries
      • ONE-DB – 1 shard, 1 node. Most optimal.
         1) Any call when the data is known to be in one shard (distributed/master tables)
         2) Call to a global table (load balanced)
      • ALL-DB – all shards, 1 node.
         1) Aggregated reads (like map-reduce)
         2) DML (writes) on global tables
         3) DDL (create, drop, alter schema)
      • FULL-DB – all shards, all nodes. Session calls (USE, SET)
      • CROSS-DB – #n shards, 1 node. Least optimal, but critical: cross-shard conflict resolution.
      Note: Not all sharding platforms support all distributed query types.
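The query types can be approximated with a small classifier. This is a hypothetical sketch, not ScaleBase’s logic: it takes an already-parsed single-table statement (verb, table type, and the distribution-key values the `WHERE` clause pins down, if any), assumes modulo placement over four shards, and treats a statement whose keys land on more than one shard as CROSS-DB, which is our reading of “#n shards”.

```python
from __future__ import annotations

NUM_SHARDS = 4  # assumption, matching the four-shard example
SESSION = {"USE", "SET"}
DDL = {"CREATE", "DROP", "ALTER"}
WRITES = {"INSERT", "UPDATE", "DELETE"}


def classify(verb: str, table_type: str, key_values: set[int] | None = None) -> str:
    """Classify a single-table statement as ONE-DB, ALL-DB, FULL-DB or CROSS-DB.

    table_type: "MASTER", "GLOBAL" or "DISTRIBUTED" (root or cascaded).
    key_values: distribution-key values the statement is restricted to,
                or None when the shard key is unknown.
    """
    verb = verb.upper()
    if verb in SESSION:
        return "FULL-DB"  # session state must reach every node of every shard
    if verb in DDL:
        return "ALL-DB"  # schema changes are applied on all shards
    if table_type == "MASTER":
        return "ONE-DB"  # the table lives on shard 0 only
    if table_type == "GLOBAL":
        # Any one copy can serve a read; a write must update every copy.
        return "ALL-DB" if verb in WRITES else "ONE-DB"
    if not key_values:
        return "ALL-DB"  # no usable key: fan out to all shards and merge
    shards = {key % NUM_SHARDS for key in key_values}
    return "ONE-DB" if len(shards) == 1 else "CROSS-DB"
```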
  • 29. Why Is Data Modeling Important?
      • DATA and LOAD – efficient distribution of:
         – DATA: all/main tables and data
         – READS
         – WRITES
      • QUERIES
         – Handle ALL-DB queries (map-reduce concept)
         – Minimize (but support!) CROSS-DB queries for higher performance and scale
      • OPTIMIZE DEVELOPMENT with SQL ANALYTICS
         – Insight into real database usage
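The map-reduce idea behind ALL-DB aggregated reads can be shown with `AVG`: each shard returns a partial sum and count (as a per-shard `SUM(x), COUNT(x)` would), and the coordinator combines them. Averaging the per-shard averages would be wrong when shards hold different row counts; in the example below it would give about 31.67 instead of 35.0. The function names and data are illustrative only.

```python
from __future__ import annotations


def map_partial(rows: list[float]) -> tuple[float, int]:
    """'Map' step run on one shard: the partial SUM and COUNT of its rows."""
    return (sum(rows), len(rows))


def reduce_avg(partials: list[tuple[float, int]]) -> float | None:
    """'Reduce' step at the coordinator: combine per-shard sums and counts."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None  # AVG over no rows is NULL


shards = [[10.0, 20.0], [30.0], [40.0, 50.0, 60.0]]  # hypothetical shard contents
partials = [map_partial(rows) for rows in shards]
print(reduce_avg(partials))  # 35.0
```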
  • 30. Data Relationships Can Be Extremely Complex: Scale-out is usually applied to growing, mature apps. How do you define an optimal data distribution policy?
  • 31. Analysis Genie: MySQL Visual Analysis & Optimal Distribution Policy Configuration
  • 32. ScaleBase Analysis Genie
      • A tool for MySQL visual analysis and for building an optimal data distribution policy; designed for DBAs, architects and development managers
      • Two-step process:
         – Analysis Assistant: an agent that captures app/DB information, including SQL traffic and database metrics, then obfuscates, summarizes and packages the app-DB data
         – Analysis Genie: a SaaS application that receives the Analysis Assistant package, presents the visual analysis and details the policy configuration
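The slides do not say how the Analysis Assistant obfuscates and summarizes SQL traffic. One common technique, shown here as a hypothetical sketch, is query fingerprinting: replace string and numeric literals with `?` so that sensitive values are dropped and queries with the same shape can be counted together.

```python
from __future__ import annotations

import re
from collections import Counter

# Single-quoted SQL strings, allowing '' and backslash escapes inside.
_STRING = re.compile(r"'(?:[^'\\]|\\.|'')*'")
# Standalone numeric literals (not digits inside identifiers such as table1).
_NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")
_SPACES = re.compile(r"\s+")


def fingerprint(sql: str) -> str:
    """Replace literals with '?' and normalize whitespace."""
    sql = _STRING.sub("?", sql)  # strings first, so digits inside them vanish too
    sql = _NUMBER.sub("?", sql)
    return _SPACES.sub(" ", sql).strip()


def summarize(queries: list[str]) -> Counter:
    """Count how often each query shape occurs."""
    return Counter(fingerprint(q) for q in queries)
```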
  • 33. ScaleBase Analysis Genie
      • Advanced analytics
         – Schemas, data & queries
         – Semantic structure analysis
         – Usage, load and scale analytics
      • Data modeling and scale-out planning
         – Customized for the most complex applications
         – Automatic identification of the optimal data distribution policy
         – Complete policy control
      • Quality assurance
         – Review before production
      • Simulation of results
         – “What-if” analysis
  • 34. Relationship Identification. Mapping includes:
      • Schema structures
      • Table and column name matching
      • Query parsing and identification of joined tables and columns
      • Statistics on every object’s size and access
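Part of relationship identification (query parsing and identification of joined tables and columns) can be sketched with regular expressions: resolve table aliases, then count equality predicates between qualified columns. This is a deliberately naive, hypothetical example, not how Analysis Genie works; it ignores `USING`, subqueries and unqualified columns. Frequently joined pairs such as `users.id`/`user_photos.user_id` are natural candidates for co-location through a cascaded distribution.

```python
from __future__ import annotations

import re
from collections import Counter

# Words that can follow a table name but are not aliases.
_NOT_ALIAS = r"(?:ON|WHERE|JOIN|INNER|LEFT|RIGHT|OUTER|CROSS|GROUP|ORDER|LIMIT|USING)\b"
# Table references such as "FROM users u", "JOIN user_photos AS p", "JOIN likes".
_TABLE_REF = re.compile(
    r"\b(?:FROM|JOIN)\s+(\w+)(?:\s+(?:AS\s+)?(?!" + _NOT_ALIAS + r")(\w+))?",
    re.IGNORECASE,
)
# Equality between two qualified columns, e.g. "u.id = p.user_id".
_EQ_JOIN = re.compile(r"\b(\w+)\.(\w+)\s*=\s*(\w+)\.(\w+)\b")


def join_pairs(sql: str) -> list[tuple[str, str]]:
    """Return (table.column, table.column) pairs joined by equality, aliases resolved."""
    aliases: dict[str, str] = {}
    for table, alias in _TABLE_REF.findall(sql):
        aliases[table] = table
        if alias:
            aliases[alias] = table
    pairs = []
    for a, col_a, b, col_b in _EQ_JOIN.findall(sql):
        left = f"{aliases.get(a, a)}.{col_a}"
        right = f"{aliases.get(b, b)}.{col_b}"
        first, second = sorted((left, right))  # order-independent key
        pairs.append((first, second))
    return pairs


def relationship_stats(queries: list[str]) -> Counter:
    """Count how often each column pair is joined across a query log."""
    return Counter(pair for sql in queries for pair in join_pairs(sql))
```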
  • 35. Analyzing Relationships: From Chaos to Order. Understanding and mapping complex relationships
  • 37. MySQL Visual Analysis Demo (ScaleBase Enterprise, Analysis Genie)
      • Visual analysis
      • Distribution policy identification and configuration
      • Scale-out load via data sharding (massive scale out)
  • 39. Reading Plus
      • Who: online education company
      • Problem: the busy season (back-to-school) was approaching and they needed a solution that could be implemented quickly while guaranteeing uptime; with increasing growth, they needed to implement a scale-out solution quickly
      • Alternatives considered: a clustering technology, which proved infeasible due to schema complexity and a lengthy re-architecture requirement
      • Solution:
         – Used visual analysis to determine the best scale-out plan
         – ScaleBase Lite for instant scale-out and continuous availability
         – 35 Tomcat application servers were connected to 3 ScaleBase controllers
         – ScaleBase performed automated read/write splitting and load balancing
  • 40. Next Gen SaaS ERP Company
      • Who: inventory management e-commerce company, hosted on Rackspace (ScaleBase partner)
      • Problem: the largest available hardware could not support the workload
      • Alternatives considered: initially went with a “black box” solution, encountering many issues
      • Solution:
         – Scaled out a single MySQL instance to 8 clustered shards
         – On-demand growth: current workload over 20,000 TPS
         – Plan to double the footprint in the next quarter
         – Support all production customers during Black Friday and Cyber Monday
  • 41. ScaleBase Distributed Database Management System
      • Scale out to unlimited users
      • Continuous availability
      • Dynamic workload optimization
      • Fast and simple deployment
      • Easily scale out a single MySQL instance
      • Optimized for the cloud
      • Reduces time-to-market
      • No changes needed to app or database
      • Database usage analytics
      • Intelligent load balancing
      • Centralized data management
  • 42. Products and Editions
      • Community: limited by deployment
      • Startup: free for qualified candidates
      • Enterprise: massive scale out
      • Lite: quick scale out
      • Analysis Genie: database performance analytics
  • 43. How Can I Learn More?
      • Use visual analysis to plan your scale-out strategy. Download the Analysis Genie: https://www.scalebase.com/software
      • Read the 451 report about ScaleBase (and the DB market). Download Jason’s report (authored last week): https://www.scalebase.com/resources/whitepapers
  • 44. Questions?
      • Contact info: Paul Campaniello, paul.campaniello@scalebase.com; Vladi Vexler, vladi.vexler@scalebase.com
      • Resources: www.scalebase.com, www.scalebase.com/resources, www.scalebase.com/blog, info@scalebase.com, (617) 630.2800

Editor's Notes

  1. Here is a summary of the different approaches. A more detailed description can be found on our website, under Resources -> Competitive Comparison. Explain the circles: for example, we are the only one that provides advanced analytics, which is the foundation for defining an optimal distribution policy. The ScaleBase solution is the simplest to deploy, enabling the shortest time to market and the lowest maintenance.
  2. One of the first steps is to visually analyze a complete summary of the state of your MySQL tables: physical and logical sizes, writes, reads and joins.
  3. Determine the optimal distribution policy for your specific application and database. Analyze your existing schema and queries:
     • What is the current structure of your data?
     • How is your data accessed by the applications?
     • What is the size and rate of writes to individual tables?
  4. Considerations: risk, cost savings (ROI) and time to market.
     • Building a solution in-house takes years.
     • Open source is limited: it is not comprehensive and lacks technical support and services.
     • Custom-built solutions are inefficient and hard to maintain.