Data Modeling and Scale Out - ScaleBase + 451-Group webinar 30.4.2015
1. Data Modeling and Scale Out
Jason Stamper, 451 Research
Vladi Vexler and Paul Campaniello, ScaleBase
2. 2
Agenda
Data Modeling and Scale Out
1. 451 Research
• Key challenges in the data landscape
• Evolution of distributed database environments
2. ScaleBase
• Pros and cons of abstracting complex database topology
• Top strategies of distributed data modeling
• Advanced data modeling and “what-if” simulations with Analysis Genie
• Scaling real apps – From need to deployment
• Demo
3. Q & A (please type questions directly into the GoToWebinar side panel)
3. 3
Today’s Presenters
Jason Stamper
Analyst, Data Management
and Analytics
- 451 Research
• Over 20 years of
experience in IT
• Formerly Editor
of Computer Business
Review & Technology
Editor at The New
Statesman
Vladi Vexler
Vice President, Tech.
& Product Marketing
- ScaleBase
• Over 15 years of experience
in software development
and product management
• Author of patents in the field
of database innovation,
dynamic data caching and
machine learning analytics
Paul Campaniello
Vice President,
Worldwide Marketing
- ScaleBase
• Over 25 years of software
marketing & sales
experience
• Held senior marketing
and sales positions at
Mendix, Lumigent, ESI,
ComBrio, Savantis and
Precise Software
4. 4
About 451 Research
Founded in 2000
210+ employees, including over 100 analysts
1,000+ clients: Technology & Service
providers, corporate
advisory, finance, professional services, and IT
decision makers
10,000+ senior IT professionals in our research
community
Over 52 million data points each quarter
Headquartered in New York with offices in
Boston, San Francisco, Washington, London…
Research & Data
Advisory Services
Events
5. 5
The Challenge
Businesses and their users are facing what one might call a
perfect storm – decision-makers need insight faster than ever,
and yet IT is struggling to avoid becoming a bottleneck.
6. 6
The Facts Speak for Themselves…
Recent survey by trade magazine Computer Business
Review: 98% (of 200 UK CIOs) admit “significant gap”
between what business expects and what IT can deliver.
7. 7
So What Does the Business Want?
Speed
Information, not
data
Flexibility
Ease-of-use
Mobility
New ways of
working
Self-service
Scale
Collaboration
8. 8
What Causes IT to Become a Bottleneck?
Governance
Control
Security
Budget
Legacy
Staff
9. 9
What Have We Learned So Far?
• So far, the emergence of so-called ‘hot’ data platform and
analytics technologies has not solved the IT information
bottleneck.
• Hadoop isn’t going to save the world (and neither is
NoSQL).
• The need to analyze large data sets, in real or near-real
time, is only set to grow in the era of the Internet of
Things.
• IT is still critical, but it needs to enable the business to
help itself. The question is how to achieve the right blend
of usability, value-for-money and scalability.
10. 10
A Word or Two on Hadoop Adoption
[Bar chart: average total storage capacity (TB) by workload, 2012 vs. 2013, on a scale of 0 to 8,000. Workloads shown: DW and DBMS, Unstructured file, Virtualized server/OS, Backup, Archive, Other, Big data/Hadoop.]
Average total storage capacity (TBs), and total storage footprint
by workload illustrate the low level of adoption today
12. 12
What is Driving the Change?
Developers
Agile
REST
JSON
Schemaless
Schema-on-read
Flexible
Applications
Web
Social
Mobile
Always-on
Interactive
Local
Architecture
Cloud
Scalable
Elastic
Virtual
Distributed
Flexible
New applications require
distributed architecture
Distributed architecture
encourages new
development
approaches
New development approaches
demand new architecture
Distributed architecture
enables new applications
New app
requirements
demand new
development
approaches
New dev
approaches
enable new
lightweight
apps
13. 13
The Database Challenge
– The traditional relational database has been stretched beyond its
normal capacity limits by the needs of high-volume, highly
distributed or highly complex applications.
– There are workarounds – such as DIY sharding – but manual,
homegrown efforts can result in database administrators being
stretched beyond their available capacity in terms of managing
complexity.
– Scalability
– Performance
– Relaxed consistency
– Agility
– Intricacy
– Necessity
Increased willingness to look for emerging alternatives
14. 14
Scalability, and Other Challenges
• As usage of MySQL and MariaDB has grown, so has the usage
of applications that depend on MySQL and MariaDB:
– Games; Social; Customer Facing; Web; Business apps like Ad Networks;
• This has highlighted a number of challenges
– Scalability of master-slave architecture
– Performance and predictability at scale
– Lower latency; greater throughput; richer apps
– User expectations rising
– Manageability of increasing database/app sprawl
• External factors driving greater complexity:
– Distributed computing architectures
– Proliferation of cloud and elasticity requirements
– Geo-distributed application requirements
– Viral success means growth can come very quickly
15. 15
Conclusions
• The success of MySQL and MariaDB has raised scalability
concerns
• Distributed computing, proliferation of cloud, and geo-
distributed applications are adding to the complexity
• Manual sharding techniques transfer the strain from the
database to the database administrator
• MySQL, and MySQL administrators, have never been
under so much strain
• Database scalability software enables users to move beyond
the limitations and complexity of DIY sharding; precisely how
data is managed with a distributed database in the cloud or on
premises is key.
18. 18
Designs of Distributed MySQL Environments
Quick Scale Out
Medium scale needs
Multiple database
replicas performing load
balancing with
read/write splitting
Massive Scale Out
High scale needs
Complete distributed
database environment,
with policy-based data
sharding/distribution
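The "Quick Scale Out" design above (replicas plus read/write splitting) can be sketched roughly as follows. This is a minimal illustration, not ScaleBase's implementation: the class name `ReadWriteSplitter`, the node names, and the rule that reads inside a transaction stay on the primary are all assumptions made for the example.

```python
import itertools


class ReadWriteSplitter:
    """Send writes to the primary and spread reads over replicas (round-robin)."""

    # Statement prefixes treated as reads (a simplifying assumption).
    READ_PREFIXES = ("SELECT", "SHOW")

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary when no replicas are configured.
        self._replicas = itertools.cycle(replicas or [primary])

    def route(self, sql, in_transaction=False):
        statement = sql.lstrip().upper()
        is_read = statement.startswith(self.READ_PREFIXES)
        # Reads inside a transaction, and locking reads, stay on the primary.
        if is_read and not in_transaction and "FOR UPDATE" not in statement:
            return next(self._replicas)
        return self.primary


splitter = ReadWriteSplitter("primary", ["replica-1", "replica-2"])
print(splitter.route("SELECT * FROM users"))         # goes to a replica
print(splitter.route("UPDATE users SET name = 'x'"))  # goes to the primary
```

This only scales reads; every write still lands on one node, which is why the "Massive Scale Out" design distributes the data itself.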
21. 21
The Right Solution for You Depends on Your Goals
• Scale (mostly) reads
• Scale (mostly) writes
• Performance of reads
– Affected by joins and scans of big tables
• Performance of writes
– Affected by I/O reads and writes, CPU and table indexes
(a growing overhead)
• Locks
• CPU/IO/RAM issues
• Load peaks
• Data growth
• Geo-distribution, special data distribution needs
22. Pros and Cons of
Abstracting Complex Database Topology
23. 23
Pros of Abstracting Complex Database Topology
• Development Agility - Accelerates
your innovation speed
• Simplifies application code
• Reduces and simplifies
maintenance
• Operations Efficiency – Zero
downtime for applications
• Reduces operation costs
• Better monitoring, analytics, HA,
scale, elasticity, etc.
24. 24
Cons of Abstracting Complex Database Topology
• Additional technology component may increase complexity
• Additional layer to monitor and manage
• Additional machines to monitor and manage (possible increased opex)
• Less control at the application code level (the layer is transparent to the application)
27. 27
Characteristics of Distributed Table Types
• MASTER – On master shard (0) only
Site settings, Admin data tables
• GLOBAL – Full copy on all shards
Lookups, Frequently joined tables, Slow growing tables
• DISTRIBUTED-ROOT – Distribution based on a key column
User.Id
• DISTRIBUTED-CASCADED (child) – Based on parent row
User_Photos, User_Photos_Likes – depend on Users
Placement across shards 0, 1, 2 and 3:
MASTER: full table on shard 0 only
GLOBAL: full table on each of shards 0, 1, 2 and 3
DISTRIBUTED: ¼ of the table on each of shards 0, 1, 2 and 3
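One way to picture the four table types is a small placement function that decides which shards receive a written row. This is a sketch under stated assumptions: the `POLICY` dictionary, the table and column names other than those on the slide, the CRC32 hash, and the idea that every cascaded child row carries the root key value are all hypothetical choices for illustration, not the product's actual configuration format.

```python
import zlib

SHARDS = [0, 1, 2, 3]

# Hypothetical distribution policy: table name -> type and distribution key.
POLICY = {
    "site_settings":     {"type": "MASTER"},
    "countries":         {"type": "GLOBAL"},
    "users":             {"type": "DISTRIBUTED_ROOT", "key": "id"},
    # Assumption: each child row carries the owning user's id.
    "user_photos":       {"type": "DISTRIBUTED_CASCADED", "key": "user_id"},
    "user_photos_likes": {"type": "DISTRIBUTED_CASCADED", "key": "photo_owner_id"},
}


def shard_for_key(value):
    """Map a distribution key value to one shard with a stable hash."""
    return SHARDS[zlib.crc32(str(value).encode()) % len(SHARDS)]


def shards_for_write(table, row):
    """Return the shards that must receive a row written to `table`."""
    rule = POLICY[table]
    if rule["type"] == "MASTER":
        return [SHARDS[0]]       # master shard (0) only
    if rule["type"] == "GLOBAL":
        return list(SHARDS)      # full copy on every shard
    # Root and cascaded tables hash the same root key value, so a user's
    # photos and likes land on the same shard as the user row.
    return [shard_for_key(row[rule["key"]])]


print(shards_for_write("site_settings", {"name": "theme"}))
print(shards_for_write("countries", {"code": "US"}))
print(shards_for_write("users", {"id": 42}) ==
      shards_for_write("user_photos", {"user_id": 42, "photo_id": 7}))
```

Keeping child rows on the parent's shard is what lets joins between `users` and its dependent tables run on a single shard.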
28. 28
Characteristics of Distributed Queries
• ONE-DB – 1 shard, 1 node. Most optimal.
1) Any call when data known to be in one shard (Distributed/Master)
2) Call to Global table (load balance)
• ALL-DB – All shards, 1 node.
1) AGGREGATED READs (like map-reduce)
2) DML (writes) on Global tables
3) DDL (create, drop, alter schema)
• FULL-DB – All shards, all nodes.
Session calls (USE, SET)
• CROSS-DB – #n shards, 1 node. Least optimal, but critical
Cross-shard conflict resolution.
Note: Not all sharding platforms support all distributed query types.
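The query types above can be read as a decision procedure over the statement kind, the table type, and the shards that the distribution key resolves to. The sketch below is one interpretation of the slide, not a documented ScaleBase algorithm; in particular, treating any multi-shard statement as CROSS-DB and a statement with no resolvable key as ALL-DB are assumptions, as are the function and parameter names.

```python
from enum import Enum


class QueryScope(Enum):
    ONE_DB = "ONE-DB"
    ALL_DB = "ALL-DB"
    FULL_DB = "FULL-DB"
    CROSS_DB = "CROSS-DB"


def classify(kind, table_type=None, target_shards=None):
    """Classify a statement.

    kind: 'SESSION', 'DDL', 'READ' or 'WRITE'.
    table_type: 'MASTER', 'GLOBAL' or 'DISTRIBUTED'.
    target_shards: set of shards the distribution key values resolve to,
                   or None when the statement carries no usable key.
    """
    if kind == "SESSION":          # USE, SET: every shard, every node
        return QueryScope.FULL_DB
    if kind == "DDL":              # schema changes go to every shard
        return QueryScope.ALL_DB
    if table_type == "MASTER":     # data lives on the master shard only
        return QueryScope.ONE_DB
    if table_type == "GLOBAL":     # read any copy, write every copy
        return QueryScope.ONE_DB if kind == "READ" else QueryScope.ALL_DB
    if not target_shards:          # no key: ask every shard and merge
        return QueryScope.ALL_DB
    if len(target_shards) == 1:
        return QueryScope.ONE_DB
    return QueryScope.CROSS_DB


print(classify("READ", "DISTRIBUTED", {2}).value)
print(classify("WRITE", "GLOBAL").value)
```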
29. 29
Why Is Data Modeling Important?
• DATA and LOAD – Efficient distribution of:
– DATA - all / main tables and data
– READS
– WRITES
• QUERIES
– Handle ALL-DB Queries (Map-reduce concept)
– Minimize (but support!) CROSS-DB Queries – higher performance and scale
• OPTIMIZE DEVELOPMENT with SQL ANALYTICS
– Insight into the real database usage
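The "map-reduce concept" behind ALL-DB queries can be illustrated with an aggregate that is computed partially on each shard and then merged. This is a minimal sketch that uses in-memory lists in place of real shards; the `orders` data, the function names, and the thread pool are assumptions for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_aggregate(shard_rows):
    """'Map' step: what each shard would compute locally, i.e. the
    equivalent of SELECT COUNT(*), SUM(amount) FROM orders."""
    return len(shard_rows), sum(row["amount"] for row in shard_rows)


def all_db_average(shards):
    """'Reduce' step: merge per-shard partials into a global AVG(amount)."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(partial_aggregate, shards))
    total_count = sum(count for count, _ in partials)
    total_sum = sum(subtotal for _, subtotal in partials)
    return total_sum / total_count if total_count else None


# Three stand-in shards holding different numbers of rows.
shards = [
    [{"amount": 10.0}, {"amount": 20.0}],
    [{"amount": 30.0}],
    [],
]
print(all_db_average(shards))
```

The merge works on counts and sums rather than on per-shard averages, because averaging the averages would weight a shard with one row the same as a shard with many.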
30. 30
Data Relationships Can be Extremely Complex
Usually, scale out is applied to growing-mature apps.
How do you define an optimal data distribution policy?
32. 32
ScaleBase Analysis Genie
• A tool enabling MySQL visual analysis and building an optimal data
distribution policy; Designed for DBAs, Architects & Dev. Managers
• Two-step process:
– Analysis Assistant
– An agent captures app/DB information, including SQL traffic and
database metrics
– Obfuscates, summarizes and packages the App-DB data
– Analysis Genie
– a SaaS application, receives the AA package and presents the
visual analysis and details the policy configuration
Analysis Assistant Analysis Genie
33. 33
ScaleBase Analysis Genie
• Advanced analytics
– Schemas, data & queries
– Semantic structure analysis
– Usage, Load and Scale analytics
• Data Modeling and
Scale-out planning
– Customized for the most complex
applications
– Auto identification of optimal
data distribution policy
– Complete policy control
• Quality assurance
– Review before production
• Simulation of results
– “What-if” analysis
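A "what-if" simulation of a distribution policy can be as simple as replaying sample rows against a candidate key and comparing how evenly they spread. The sketch below is a generic illustration and makes no claim about how Analysis Genie computes its results; the hash function, the skew metric, and the sample columns are assumptions.

```python
import zlib
from collections import Counter


def simulate(rows, key, num_shards):
    """Return the number of rows each shard would hold if `key` were
    used as the distribution key."""
    counts = Counter(
        zlib.crc32(str(row[key]).encode()) % num_shards for row in rows
    )
    return [counts.get(shard, 0) for shard in range(num_shards)]


def skew(counts):
    """Largest shard divided by the mean shard size (1.0 is perfectly even)."""
    mean = sum(counts) / len(counts)
    return max(counts) / mean if mean else 0.0


# Hypothetical sample: many distinct users, a single country.
rows = [{"user_id": i, "country": "US"} for i in range(1000)]

for candidate in ("user_id", "country"):
    counts = simulate(rows, candidate, num_shards=4)
    print(candidate, counts, round(skew(counts), 2))
```

A low-cardinality key such as `country` in this sample sends every row to one shard, which shows up as a skew equal to the shard count.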
34. 34
Relationship Identification
Mapping includes:
• Schema structures
• Table and column name
matching
• Queries parsing and
identification of joined
tables and columns
• Statistics on every object
size and access
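The query-parsing step in the list above (identifying joined tables and columns) can be approximated with a few regular expressions over captured SQL. This is a deliberately simplified sketch, not the tool's parser: it only handles `alias.column = alias.column` equi-joins, and the regexes, keyword list, and sample queries are assumptions.

```python
import re
from collections import Counter

# Words that may follow a table name but are not aliases.
_NOT_ALIAS = r"(?!(?:ON|JOIN|WHERE|INNER|LEFT|RIGHT|CROSS|GROUP|ORDER|LIMIT|USING)\b)"
TABLE_REF = re.compile(
    r"\b(?:FROM|JOIN)\s+(\w+)(?:\s+(?:AS\s+)?" + _NOT_ALIAS + r"(\w+))?",
    re.IGNORECASE,
)
EQUI_JOIN = re.compile(r"(\w+)\.(\w+)\s*=\s*(\w+)\.(\w+)")


def join_pairs(sql):
    """Return (table, column) pairs that are compared for equality in `sql`."""
    aliases = {}
    for table, alias in TABLE_REF.findall(sql):
        aliases[table] = table
        if alias:
            aliases[alias] = table
    pairs = []
    for left, left_col, right, right_col in EQUI_JOIN.findall(sql):
        if left in aliases and right in aliases:
            a = (aliases[left], left_col)
            b = (aliases[right], right_col)
            pairs.append(tuple(sorted((a, b))))  # order-independent key
    return pairs


workload = [
    "SELECT * FROM users u JOIN user_photos p ON p.user_id = u.id WHERE u.id = 5",
    "SELECT * FROM users JOIN user_photos ON user_photos.user_id = users.id",
]
relationships = Counter(pair for sql in workload for pair in join_pairs(sql))
for (left, right), hits in relationships.items():
    print(left, "<->", right, "seen", hits, "times")
```

Counting how often each pair appears gives a rough ranking of which relationships a distribution policy should keep on the same shard.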
37. 37
MySQL Visual Analysis Demo
• Visual analysis
• Distribution policy identification and configuration
• Scale out load via data sharding (massive scale out)
ScaleBase Enterprise
Analysis
Genie
39. 39
Reading Plus
Who:
• Online education company
Problem:
• Busy season (back-to-school) was approaching and they needed a solution
that could be quickly implemented, while guaranteeing uptime
• With increasing growth, they needed to implement a scale out solution quickly
Alternatives Considered:
• A clustering technology, which proved to be infeasible due to schema
complexity and a lengthy re-architecture requirement
Solution:
• Used visual analysis to determine best scale out plan
• ScaleBase Lite for instant scale out and continuous availability
• 35 Tomcat application servers were connected to 3 ScaleBase controllers
• ScaleBase performed automated read/write splitting and load balancing
40. 40
Next Gen SaaS ERP Company
Who:
• Inventory management
ecommerce company
• Hosted on Rackspace
(ScaleBase Partner)
Problem:
• Largest available hardware could not support workload
Alternatives Considered:
• Initially went with a “black box” solution, encountering many issues
Solution:
• Scaled out a single MySQL instance to 8 clustered shards
• On-demand growth – current workload over 20,000 TPS
– Plan to double footprint in next quarter
– Support all production customers during Black Friday & Cyber Monday
41. 41
Scale out to unlimited users
Continuous availability
Dynamic workload optimization
Fast and simple deployment
Easily scale out a single
MySQL instance
Optimized for the Cloud
Reduces time-to-market
No changes needed to app or database
Database usage analytics
Intelligent load balancing
Centralized data management
ScaleBase
Distributed Database Management System
42. 42
Products and Editions
Community
Limited by
Deployment
Startup
Free for Qualified
Candidates
Enterprise
Massive
Scale Out
Also available on:
Lite
Quick
Scale Out
Analysis Genie Database Performance Analytics
43. 43
How Can I Learn More?
Use visual analysis to plan your
scale out strategy
Download the
Analysis Genie:
https://www.scalebase.com/software
Read the 451 report about
ScaleBase (& the DB market)
Download Jason’s Report
(authored last week)
https://www.scalebase.com/resources/
whitepapers
Here is a summary of different approaches. A more detailed description can be found on our website, under Resources -> Competitive Comparison
Explain the circles.
We are the only one, for example, that provides Advanced Analytics, which is the foundation for defining an optimal distribution policy.
The ScaleBase solution is the simplest to deploy, enabling the shortest go-to-market and lowest maintenance
One of the first steps is to visually analyze a complete summary of the state of your MySQL tables:
- Physical and logical sizes, writes, reads, joins
Determine optimal distribution policy for your specific application and database
Analyze your existing schema and queries
What is the current structure of your data?
How is your data accessed by the applications?
What is the size and rate of writes to individual tables?
Risk
Cost savings (ROI)
Time to market
Building solution takes years
Open source is limited
Not comprehensive
Lack of technical support and services
Custom built
Inefficient and hard to maintain