1. Introduction: Why Scale?
2. Vertical & Horizontal Partitioning
3. Partitioned Tables
4. Distributed Partitioned Views
5. Database Sharding
6. Stretch Databases (optional)
Ralph: Who am I?
• An Enterprise Architect
• at iGamingCloud, Gaming Innovation Group
• focus on Data Platforms
• A Microsoft Certified Trainer
• deliver MTA, MCSA, MCSE locally
• covering Windows, SQL Server, C#
• I’m here to describe the need for database scalability, describe a
number of possible cross platform solutions, and demonstrate
technologies available in MS SQL Server 2016 and Azure.
1. Introduction
Why do we need to scale databases?
Overview of possible options
Scaling Databases: Why?
• Most application environments are developed as a monolith, a single application running a single
database on a single server.
• In time, the whole application environment starts slowing down:
• increased data volumes
• increased workloads
• The simplest option is to introduce an app/web farm to balance the application across multiple
servers whilst using the same old single database.
• But this might not be enough… we need to scale the database!
Scaling Databases: Optimisations
• Unless the whole application environment is redesigned and redeveloped, one needs to look into
optimising the database layer.
• Large database problems include:
• Queries become slower, possibly giving time-outs under load
• Backups are slower to take, to ship, and to restore
• Index maintenance has an even greater impact
• Common optimisations include:
• Vertical Scaling: scale up current servers to max disk/memory/cpu, or simply migrate to a bigger server
• Read Scaling: scale out by adding a synchronous or asynchronous replica server that takes the application's read-only queries
• Database restructuring: improved table designs, introduction of aggregation tables
• Offload data: move old transactional data to archive servers, deletion of log data
• But this might not be enough… we need to partition the database!
Scaling Databases: Data Partitioning
• Even though we can scale vertically by adding more resources,
a single database would need to be scaled within itself:
• Vertical & Horizontal Partitioning
• Partitioned Tables
• When a single database is too big, horizontal scaling is done
using distributed databases:
• Distributed Partitioned Views
• Database Sharding
[Diagram labels: Scale Up (partitioning within a single database) and Scale Out (distributed databases)]
Scaling Databases: Domain Partitioning
• A different approach is to partition your data by domain.
• This is achieved by splitting the data by domain and moving each domain into its own database.
• This could be fairly easy if tables are already grouped into their own schema by domain.
• However it could be problematic if application queries and reports span multiple schemas
• reports would now need to mesh multiple databases together
• or read from a consolidated data warehouse
• Even though this breaks the database down into smaller databases, each smaller database has the
potential to become a problem on its own.
• Refactoring a monolith application into various microservices adopts this principle with each
microservice having its own data store.
• Microservices usually adopt polyglot persistence: the appropriate data store is chosen according to the
required features and partition usage, e.g. a mix of SQL & NoSQL data stores.
2. Partitioning
Benefits
Strategies: Horizontal & Vertical Partitioning
Updatable Views
DEMO
Partitioning Benefits
• Scalability: Scale-up will eventually reach a physical hardware limit.
• Performance: Data access takes place on smaller partitions, in parallel for multiple partitions.
• Availability: Reduce single points of failure; multiple disk drives, multiple databases, multiple servers.
• Security: Separate sensitive and non-sensitive data into different partitions.
• Flexibility: Varied operational management strategies by partition; monitoring, backups, restores,
indexing, etc.
https://docs.microsoft.com/en-us/azure/architecture/best-practices/data-partitioning
Strategy: Vertical Partitioning
Original table:
ProductID  Name                   Price  DateCreated  Stock  LastOrdered
AR-5381    Adjustable Race           50  11-Jan-2016      8  17-Nov-2016
AA-8327    Bearing Ball             100  11-Feb-2016     46  21-Nov-2017
BE-2349    BB Ball Bearing          105  11-Mar-2016     52  16-Sep-2017
CE-2908    Headset Ball Bearings     90  11-Jan-2017     13  12-Feb-2017
CL-2036    Blade                     70  11-Feb-2017     28  01-Dec-2017
DA-5965    LL Crankarm              150  11-Mar-2017     30  08-Dec-2017
Vertical partition 1:
ProductID  Name                   Price  DateCreated
AR-5381    Adjustable Race           50  11-Jan-2016
AA-8327    Bearing Ball             100  11-Feb-2016
BE-2349    BB Ball Bearing          105  11-Mar-2016
CE-2908    Headset Ball Bearings     90  11-Jan-2017
CL-2036    Blade                     70  11-Feb-2017
DA-5965    LL Crankarm              150  11-Mar-2017
Vertical partition 2:
ProductID  Stock  LastOrdered
AR-5381        8  17-Nov-2016
AA-8327       46  21-Nov-2017
BE-2349       52  16-Sep-2017
CE-2908       13  12-Feb-2017
CL-2036       28  01-Dec-2017
DA-5965       30  08-Dec-2017
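The vertical split above could be sketched in T-SQL roughly as follows. This is a minimal sketch, not the deck's own code: the table names Production.ProductDetails and Production.ProductStock, the view name, and all column types are assumptions, and the Production schema is assumed to exist. GO is the batch separator used by SSMS and sqlcmd.

```sql
-- Descriptive columns that rarely change (hypothetical table name and types).
CREATE TABLE Production.ProductDetails
(
    ProductID   CHAR(7)       NOT NULL PRIMARY KEY,
    Name        NVARCHAR(100) NOT NULL,
    Price       DECIMAL(10,2) NOT NULL,
    DateCreated DATE          NOT NULL
);

-- Frequently updated columns, keyed by the same ProductID.
CREATE TABLE Production.ProductStock
(
    ProductID   CHAR(7) NOT NULL PRIMARY KEY
        REFERENCES Production.ProductDetails (ProductID),
    Stock       INT     NOT NULL,
    LastOrdered DATE    NULL
);
GO

-- Re-assemble the original row shape when both halves are needed.
CREATE VIEW Production.ProductsCombined
AS
SELECT d.ProductID, d.Name, d.Price, d.DateCreated, s.Stock, s.LastOrdered
FROM Production.ProductDetails AS d
JOIN Production.ProductStock   AS s ON s.ProductID = d.ProductID;
GO
```

Queries that only need stock levels then touch the narrow table alone, which is the main point of the split.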
Strategy: Horizontal Partitioning
Production.Products
ProductID  Name                   Price  Stock  DateCreated  LastOrdered
AR-5381    Adjustable Race           50      8  11-Jan-2016  17-Nov-2016
AA-8327    Bearing Ball             100     46  11-Feb-2016  21-Nov-2017
BE-2349    BB Ball Bearing          105     52  11-Mar-2016  16-Sep-2017
CE-2908    Headset Ball Bearings     90     13  11-Jan-2017  12-Feb-2017
CL-2036    Blade                     70     28  11-Feb-2017  01-Dec-2017
DA-5965    LL Crankarm              150     30  11-Mar-2017  08-Dec-2017
Production.Products_2017
ProductID  Name                   Price  Stock  DateCreated  LastOrdered
CE-2908    Headset Ball Bearings     90     13  11-Jan-2017  12-Feb-2017
CL-2036    Blade                     70     28  11-Feb-2017  01-Dec-2017
DA-5965    LL Crankarm              150     30  11-Mar-2017  08-Dec-2017
Production.Products_2016
ProductID  Name                   Price  Stock  DateCreated  LastOrdered
AR-5381    Adjustable Race           50      8  11-Jan-2016  17-Nov-2016
AA-8327    Bearing Ball             100     46  11-Feb-2016  21-Nov-2017
BE-2349    BB Ball Bearing          105     52  11-Mar-2016  16-Sep-2017
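A possible T-SQL sketch of the yearly split shown above. The column types, constraint names and composite primary key are assumptions; Production.Products is assumed to exist with the columns listed in the table. The CHECK constraints record which year each table is allowed to hold.

```sql
-- One table per year of DateCreated (types and constraint names are assumptions).
CREATE TABLE Production.Products_2016
(
    ProductID   CHAR(7)       NOT NULL,
    Name        NVARCHAR(100) NOT NULL,
    Price       DECIMAL(10,2) NOT NULL,
    Stock       INT           NOT NULL,
    DateCreated DATE          NOT NULL
        CONSTRAINT CK_Products_2016_DateCreated
        CHECK (DateCreated >= '20160101' AND DateCreated < '20170101'),
    LastOrdered DATE          NULL,
    CONSTRAINT PK_Products_2016 PRIMARY KEY (ProductID, DateCreated)
);

CREATE TABLE Production.Products_2017
(
    ProductID   CHAR(7)       NOT NULL,
    Name        NVARCHAR(100) NOT NULL,
    Price       DECIMAL(10,2) NOT NULL,
    Stock       INT           NOT NULL,
    DateCreated DATE          NOT NULL
        CONSTRAINT CK_Products_2017_DateCreated
        CHECK (DateCreated >= '20170101' AND DateCreated < '20180101'),
    LastOrdered DATE          NULL,
    CONSTRAINT PK_Products_2017 PRIMARY KEY (ProductID, DateCreated)
);

-- Copy each year's rows out of the original table.
INSERT INTO Production.Products_2016 (ProductID, Name, Price, Stock, DateCreated, LastOrdered)
SELECT ProductID, Name, Price, Stock, DateCreated, LastOrdered
FROM Production.Products
WHERE DateCreated >= '20160101' AND DateCreated < '20170101';

INSERT INTO Production.Products_2017 (ProductID, Name, Price, Stock, DateCreated, LastOrdered)
SELECT ProductID, Name, Price, Stock, DateCreated, LastOrdered
FROM Production.Products
WHERE DateCreated >= '20170101' AND DateCreated < '20180101';
```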
Horizontal Partitioning: Why?
• The idea behind horizontal partitioning is to split a large table into multiple smaller tables.
• Query-wise
• One smaller table is faster to query than a larger table
• However querying multiple smaller tables is problematic
• Administration-wise, multiple tables can be placed into different file groups, which
• Can be placed on different physical disks, so I/O can run in parallel
• Can be backed up individually, giving smaller backup windows
• Can be set as read-only to protect older data from modification: back up once and forget
Horizontal Partitioning: Dynamic Queries
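-- Build the statement against the single yearly table that covers @FromDate.
-- dbo.GetPartition is presumably a custom helper returning the table suffix;
-- @FromDate and @ToDate are assumed to be declared by the caller.
DECLARE @SQL AS NVARCHAR(MAX) = CONCAT('
SELECT ProductId, Name, Price, Stock, DateCreated, LastOrdered
FROM Production.Products_', dbo.GetPartition('Production.Products', @FromDate), ' WITH(NOLOCK)
WHERE DateCreated >= @FromDate AND DateCreated <= @ToDate
')
-- Pass the dates as parameters rather than concatenating them into the string.
EXECUTE sp_ExecuteSql @Stmt = @SQL
, @Params = N'@FromDate AS DATETIME, @ToDate AS DATETIME'
, @FromDate = @FromDate
, @ToDate = @ToDate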
Horizontal Partitioning: UNIONed Queries
-- Union every yearly table, filtering each one on the same date range.
;WITH products AS
(
SELECT ProductId, Name, Price, Stock, DateCreated, LastOrdered
FROM Production.Products_2016 WITH(NOLOCK)
WHERE DateCreated >= @FromDate AND DateCreated <= @ToDate
UNION ALL
SELECT ProductId, Name, Price, Stock, DateCreated, LastOrdered
FROM Production.Products_2017 WITH(NOLOCK)
WHERE DateCreated >= @FromDate AND DateCreated <= @ToDate
UNION ALL
...
)
SELECT ProductId, Name, Price, Stock, DateCreated, LastOrdered
FROM products
Views
• Dynamic Queries are a pain! No syntax checking, string concatenation, etc…
• Constantly creating CTEs to union tables is heavy for everyone.
• Usually create VIEWs to provide a unified view
• however could be cumbersome and repetitive e.g. every month
• thus we dynamically create them using custom code and jobs
• VIEWS help transparently replace an existing table with multiple smaller ones
• no code changes required
• however not all views are updatable
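A unifying view of this kind might look like the sketch below. It assumes the yearly tables Production.Products_2016 and Production.Products_2017 exist and that the original Production.Products table has been dropped or renamed, so the view can take over its name.

```sql
-- The view takes the name of the table it replaces, so callers need no changes.
CREATE VIEW Production.Products
AS
SELECT ProductId, Name, Price, Stock, DateCreated, LastOrdered
FROM Production.Products_2016
UNION ALL
SELECT ProductId, Name, Price, Stock, DateCreated, LastOrdered
FROM Production.Products_2017;
GO

-- Existing queries keep working against the same name.
SELECT ProductId, Name, Price
FROM Production.Products
WHERE DateCreated >= '20170101' AND DateCreated < '20170201';
```

If each yearly table carries a trusted CHECK constraint on DateCreated, the optimizer can often skip the tables whose range cannot match the filter.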
Updatable Views
• You can modify the data of an underlying base table through a view, as long as the following
conditions are true:
• Any modifications, including UPDATE, INSERT, and DELETE statements, must reference columns from only one base
table.
• The columns being modified in the view must directly reference the underlying data in the table columns.
• The columns being modified are not affected by GROUP BY, HAVING, or DISTINCT clauses.
• TOP is not used anywhere in the select statement of the view together with the WITH CHECK OPTION clause.
• INSTEAD OF triggers can be created on a view to make it updatable. The INSTEAD OF trigger is
executed instead of the data modification statement on which the trigger is defined.
https://docs.microsoft.com/en-us/sql/t-sql/statements/create-view-transact-sql
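As an illustration of the INSTEAD OF approach, the sketch below routes inserts made through a UNION ALL view to the matching yearly table. It assumes a view named Production.Products over the member tables Production.Products_2016 and Production.Products_2017; the trigger name and the error number are made up for this example.

```sql
CREATE TRIGGER Production.TrgProductsInsert
ON Production.Products
INSTEAD OF INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- Reject rows that no member table covers instead of dropping them silently.
    IF EXISTS (SELECT 1 FROM inserted
               WHERE DateCreated < '20160101' OR DateCreated >= '20180101')
    BEGIN
        THROW 50001, 'DateCreated is outside the ranges covered by the member tables.', 1;
    END;

    -- Route each row to the table that owns its year.
    INSERT INTO Production.Products_2016 (ProductId, Name, Price, Stock, DateCreated, LastOrdered)
    SELECT ProductId, Name, Price, Stock, DateCreated, LastOrdered
    FROM inserted
    WHERE DateCreated >= '20160101' AND DateCreated < '20170101';

    INSERT INTO Production.Products_2017 (ProductId, Name, Price, Stock, DateCreated, LastOrdered)
    SELECT ProductId, Name, Price, Stock, DateCreated, LastOrdered
    FROM inserted
    WHERE DateCreated >= '20170101' AND DateCreated < '20180101';
END;
GO
```

The trigger has to be extended whenever a new yearly table is added, which is one reason such code is usually generated by a job.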
3. Partitioned Tables
Defining Partition Functions & Partition Schemes
Tooling: Custom Partition framework
Myths and performance issues
DEMO
Partitioned Tables: Definition…
• Microsoft introduced Partitioned Tables in SQL Server 2005
• It supports the use of multiple file groups
• It provides a single table to query from irrespective of partitions
• The above example partitions a table into:
• A partition per month within the current year
• A partition per year for the last two years
• A partition for all the previous years
[Diagram: partitions Pre-2015, 2015, 2016, Jan 2017 and Feb 2017, with EMPTY placeholder partitions]
Partitioned Tables: Definition…
• A Partition Function
• A Data Type – typically DATE related
• A Range – LEFT or RIGHT
CREATE PARTITION FUNCTION PF_Name (DATETIME2)
AS RANGE RIGHT FOR VALUES ('20170101','20170201','20170301');
• A Partition Scheme – that associates file groups to the partition function
CREATE PARTITION SCHEME PS_Name
AS PARTITION PF_Name
TO (FG000000, FG201701, FG201702, FG201703);
Partitioned Tables: Definition…
• With a RIGHT Range, the previous partitioned table example requires 4 partitions:
• A partition on the left, containing everything from beginning of time till before Jan 2017 – should be empty
• A partition from Jan 2017 till before Feb 2017
• A partition from Feb 2017 till before Mar 2017
• A partition from Mar 2017 till the end of time – should be empty
[Diagram: EMPTY | Jan 2017 | Feb 2017 | Mar 2017 (EMPTY)]
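The RANGE RIGHT behaviour can be checked with the $PARTITION function. The sketch assumes PF_Name has been created with the three boundary values shown earlier ('20170101', '20170201', '20170301'); the sample dates are arbitrary.

```sql
-- Ask the partition function which partition each sample date maps to.
SELECT v.SampleDate,
       $PARTITION.PF_Name(v.SampleDate) AS PartitionNumber
FROM (VALUES (CAST('20161231' AS DATETIME2)),   -- before the first boundary: partition 1
             (CAST('20170101' AS DATETIME2)),   -- a boundary value belongs to the right: partition 2
             (CAST('20170215' AS DATETIME2)),   -- within Feb 2017: partition 3
             (CAST('20170301' AS DATETIME2)))   -- last boundary onwards: partition 4
     AS v (SampleDate);
```

With RANGE LEFT the boundary values would instead fall into the partition on their left, so '20170101' would map to partition 1.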
Partitioned Tables: Splitting…
• A partitioned table can be extended by splitting an existing partition
• We first add a new file group to the partition scheme
ALTER PARTITION SCHEME PS_Name
NEXT USED [FG201704]
• We then split the partition function to the right
ALTER PARTITION FUNCTION PF_Name()
SPLIT RANGE ('20170401')
[Diagram, before the split: EMPTY | Jan 2017 | Feb 2017 | Mar 2017 (EMPTY)]
[Diagram, after the split: EMPTY | Jan 2017 | Feb 2017 | Mar 2017 | Apr 2017 (EMPTY)]
Summary: Required steps
• On setup:
1. Create file group for non-partitioned indexes (if required)
2. Create file group for left hand side (to remain empty)
3. Create Partition Function (partitioning key datatype, range direction)
4. Create Partition Scheme (with empty file group)
• Regularly (e.g. monthly)
1. Create file group
2. Split partition
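The regular (e.g. monthly) steps could look like the following sketch. The database name SalesDb, the logical file name, the file path and the sizes are hypothetical; PS_Name and PF_Name are the scheme and function defined earlier.

```sql
-- 1. Create the file group and give it a data file (names, path and sizes are assumptions).
ALTER DATABASE SalesDb ADD FILEGROUP FG201704;

ALTER DATABASE SalesDb
ADD FILE
(
    NAME = N'SalesDb_201704',
    FILENAME = N'D:\Data\SalesDb_201704.ndf',
    SIZE = 512MB,
    FILEGROWTH = 256MB
)
TO FILEGROUP FG201704;

-- 2. Tell the scheme which file group the next partition should use, then split.
ALTER PARTITION SCHEME PS_Name NEXT USED FG201704;

ALTER PARTITION FUNCTION PF_Name() SPLIT RANGE ('20170401');
```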
Tooling: Custom partitioning framework
• We created a number of stored procedures to handle these steps:
• Maintenance.UspCreateFileGroup – used to create files and file groups
• Maintenance.UspCreatePartition – used once to create the partition function and partition scheme
• Maintenance.UspCreatePartitionView – used to create a monthly view per partition by date range
• Maintenance.UspSplitPartition – used monthly to create a new file group, split partition, create view
• Maintenance.UspSplitPartitionAllTables – used monthly to split all partition tables via agent job
Partitioned Tables: Merging…
• A partitioned table can have multiple partitions merged into one
ALTER PARTITION FUNCTION PF_Name()
MERGE RANGE('20170201');
• Note: Merging partitions with data movement across file groups will be slow
[Diagram, before the merge: EMPTY | Jan 2017 | Feb 2017 | Mar 2017 | Apr 2017 (EMPTY)]
[Diagram, after the merge: EMPTY | Jan & Feb 2017 | Mar 2017 | Apr 2017 (EMPTY)]
Partitioned Tables: Switching…
• Partition switching reduces locks whilst:
• Loading data into a warehouse
• Deleting old data during archival
• Move data between tiered storage
• Partitions need to be in the same file group
• Re-create the staging indexes to move physical data
ALTER TABLE schema.StgTable
SWITCH PARTITION $PARTITION.PF_Name('20170201')
TO schema.PrdTable PARTITION $PARTITION.PF_Name('20170201')
[Diagram: partition function over Jan 2017 | Feb 2017 | Mar 2017 | Apr 2017 (EMPTY), showing a production table and a staging table, with one partition marked Empty]
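Switching can also be used in the other direction, to remove old data. The sketch below assumes a partitioned table Sales.Orders built on PF_Name, and an empty, non-partitioned table Archive.Orders_201701 with identical columns and indexes on the same file group as the January 2017 partition; both table names are hypothetical.

```sql
-- Move the January 2017 partition out of the production table.
ALTER TABLE Sales.Orders
SWITCH PARTITION $PARTITION.PF_Name('20170101')
TO Archive.Orders_201701;

-- The rows now belong to the archive table and can be backed up or discarded there.
TRUNCATE TABLE Archive.Orders_201701;
```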
https://support.microsoft.com/en-us/help/2965553/decreased-performance-for-sql-server-when-you-run-a-top--max-or-min-ag
Decreased performance: TOP, MAX or MIN
• Test results show that TOP is slower on partitioned tables by 10%
• ROWCOUNT can be used instead
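The two forms being compared might look like this. Sales.Orders and its columns are hypothetical; the sketch only shows the syntax and makes no claim about the measured difference.

```sql
-- TOP form.
SELECT TOP (10) OrderId, DateCreated
FROM Sales.Orders
ORDER BY DateCreated DESC;

-- ROWCOUNT form: the limit applies to the session until it is reset.
SET ROWCOUNT 10;

SELECT OrderId, DateCreated
FROM Sales.Orders
ORDER BY DateCreated DESC;

SET ROWCOUNT 0;  -- 0 removes the limit again
```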
Increased performance: SELECT using non-clustered PK
• When using ROWCOUNT, throughput on partitioned tables is faster
• by 22% throughput
• and has a 3% improvement on response time when using the non-clustered primary key
Increased performance: SELECT using Partitioning Key
• When using ROWCOUNT, throughput on partitioned tables is faster
• by 6% throughput
• and has a 7% improvement on response time when using the clustered partitioning date key
Increased performance: Inserts
• Combined INSERT & SELECT tests found partitioned tables to be faster:
• SELECT – 9% Throughput benefit / 11% improvement in response times
• INSERT – 4% Throughput benefit / 9% improvement in response times
https://blogs.msdn.microsoft.com/sqlmeditation/2013/04/02/dealing-with-unique-columns-when-using-table-partitioning
Unique columns
• Traditionally developers create an IDENTITY(1,1) PRIMARY KEY to provide uniqueness
• On its own this cannot be an aligned unique key on a partitioned table, because a partitioned unique index must include the partitioning column
• Should be replaced with a UNIQUEIDENTIFIER generated at application level (also in preparation for distributed tables…)
• A PRIMARY KEY is by default CLUSTERED and stored with the data
• In partitioned tables, the Partitioning Key has to be CLUSTERED to split the data
• Thus if the PRIMARY KEY does not contain the Partitioning Key this cannot be CLUSTERED
• An un-partitioned NONCLUSTERED PRIMARY KEY can be used to enforce uniqueness
• However this prohibits SWITCHING of partitions due to unaligned indexes
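One way to keep every index aligned is to include the partitioning key in the primary key, as in the sketch below. The table Sales.Orders and its columns are hypothetical, the Sales schema is assumed to exist, and PS_Name is assumed to be built on a DATETIME2 partition function as defined earlier.

```sql
CREATE TABLE Sales.Orders
(
    OrderId     UNIQUEIDENTIFIER NOT NULL,   -- generated by the application
    DateCreated DATETIME2        NOT NULL,   -- partitioning key
    Amount      DECIMAL(10,2)    NOT NULL,
    -- Aligned primary key: it must contain the partitioning key,
    -- so uniqueness is enforced on the pair, not on OrderId alone.
    CONSTRAINT PK_Orders PRIMARY KEY NONCLUSTERED (OrderId, DateCreated)
        ON PS_Name (DateCreated)
) ON PS_Name (DateCreated);

-- Cluster the data on the partitioning key, on the same scheme.
CREATE CLUSTERED INDEX CIX_Orders_DateCreated
ON Sales.Orders (DateCreated)
ON PS_Name (DateCreated);
```

The trade-off is that the database no longer guarantees OrderId is unique by itself; that guarantee moves to the application-generated identifier.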
https://www.mssqltips.com/sqlservertip/1914/sql-server-database-partitioning-myths-and-truths
Myth: Metadata only operations
• Switching partitions in & out
• Requires schema lock on both source and destination tables
• Usually the command is run with a timeout and retried later
• Splitting & merging partitions
• Altering the partition function is an offline operation
• Splitting a partition which contains data requires data movement
• If the range split introduces a different file group, data needs to physically move between files
• This is why we keep an empty partition on the left and right, and we always split the empty partition
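Before splitting, the row counts per partition can be checked to confirm that the partition being split is still empty. Sales.Orders is a hypothetical partitioned table; the rows column of sys.partitions is an approximate count.

```sql
-- Rows per partition for the heap or clustered index of the table.
SELECT p.partition_number, p.rows
FROM sys.partitions AS p
WHERE p.object_id = OBJECT_ID(N'Sales.Orders')
  AND p.index_id IN (0, 1)          -- 0 = heap, 1 = clustered index
ORDER BY p.partition_number;
```

If the first and last partitions report zero rows, splitting them should not need to move data.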
4. Distributed Partitioned Views
Definition
Requirements… loads!
DEMO
Distributed Partitioned Views: Definition
• Basically a view which unions data from multiple databases hosted on different servers.
• Also referred to as Federated Databases.
• Used when applications are unaware of such partitioning.
• Requires Linked Servers.
• Performance improves with lazy schema validation option.
• Read-only views work everywhere.
• Updatable views require Enterprise Edition.
• INSTEAD OF triggers can be used to make views updatable on Standard Edition.
https://docs.microsoft.com/en-us/sql/sql-server/editions-and-components-of-sql-server-2016
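A minimal sketch of such a view, assuming a linked server named Server2 has already been configured, and that the database SalesDb and the member tables Customers_A and Customers_B exist on the respective servers. All of these names are hypothetical.

```sql
-- Defer schema checks on the remote table until it is actually queried.
EXEC sp_serveroption
     @server   = N'Server2',
     @optname  = 'lazy schema validation',
     @optvalue = 'true';
GO

CREATE VIEW dbo.Customers
AS
SELECT CustomerId, Name, Region
FROM SalesDb.dbo.Customers_A            -- local member table
UNION ALL
SELECT CustomerId, Name, Region
FROM Server2.SalesDb.dbo.Customers_B;   -- remote member table via the linked server
GO
```

The same view would normally be created on every participating server, with the local and remote references swapped accordingly.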
Distributed Partitioned Views: Requirements
• Table Rules
• Member tables cannot be referenced more than one time in the view.
• Member tables cannot have indexes created on any computed columns.
• Member tables must have all PRIMARY KEY constraints on the same number of columns.
• Member tables must have the same ANSI padding setting.
• Column Rules
• All columns in each member table must be included in the same ordinal position in the select list.
• Columns cannot be referenced more than one time in the select list.
• The columns in the select list of each SELECT statement must be of the same type.
• The key ranges of the CHECK constraints in each table cannot overlap with the ranges of any other table.
• Partitioning Column Rules
• The partitioning column cannot be an identity, default, timestamp, or computed column.
• The partitioning column must be in the same ordinal location in the select list of each SELECT statement in the view.
• The partitioning column cannot allow for nulls.
• The partitioning column must be a part of the primary key of the table.
• There must be only one constraint on the partitioning column.
• There are no restrictions on the updatability of the partitioning column.
https://technet.microsoft.com/en-us/library/ms188299(v=sql.105).aspx
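Member tables that follow these rules might be declared as below. The table names, columns and key ranges are assumptions for illustration; each table would be created on its own server.

```sql
-- On the first server: owns CustomerId 1 to 49999.
CREATE TABLE dbo.Customers_A
(
    CustomerId INT           NOT NULL    -- partitioning column: not nullable, no identity
        CONSTRAINT CK_Customers_A_Range CHECK (CustomerId BETWEEN 1 AND 49999),
    Name       NVARCHAR(100) NOT NULL,
    Region     NVARCHAR(50)  NOT NULL,
    CONSTRAINT PK_Customers_A PRIMARY KEY (CustomerId)   -- partitioning column is in the key
);

-- On the second server: owns CustomerId 50000 to 99999, with no overlap.
CREATE TABLE dbo.Customers_B
(
    CustomerId INT           NOT NULL
        CONSTRAINT CK_Customers_B_Range CHECK (CustomerId BETWEEN 50000 AND 99999),
    Name       NVARCHAR(100) NOT NULL,
    Region     NVARCHAR(50)  NOT NULL,
    CONSTRAINT PK_Customers_B PRIMARY KEY (CustomerId)
);
```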
Distributed Partitioned Views: Updatable
• INSERT Statements
• All columns must be included in the INSERT statement even if the column can be NULL in the base table or has a DEFAULT constraint defined in
the base table.
• The DEFAULT keyword cannot be specified in the VALUES clause of the INSERT statement.
• INSERT statements must supply a value that satisfies the logic of the CHECK constraint defined on the partitioning column for one of the
member tables.
• INSERT statements are not allowed if a member table contains a column with an identity property.
• INSERT statements are not allowed if a member table contains a timestamp column.
• INSERT statements are not allowed if there is a self-join with the same view or any one of the member tables.
• UPDATE Statements
• UPDATE statements cannot specify the DEFAULT keyword as a value in the SET clause even if the column has a DEFAULT value defined in the
corresponding member table
• The value of a column with an identity property cannot be changed: however, the other columns can be updated.
• The value of a PRIMARY KEY cannot be changed if the column contains text, image, or ntext data.
• Updates are not allowed if a base table contains a timestamp column.
• Updates are not allowed if there is a self-join with the same view or any one of the member tables.
• DELETE Statements
• DELETE statements are not allowed when there is a self-join with the same view or any one of the member tables.
https://technet.microsoft.com/en-us/library/ms187067(v=sql.105).aspx
5. Database Sharding
Definition
Sharding Strategies
Database Sharding: Definition
• A form of horizontal partitioning in which partitions are distributed on commodity servers.
• An individual partition is referred to as a shard.
• The application is shard-aware and can route connection requests autonomously without the
need of distributed partitioned views.
• Sharding is used to truly circumvent issues of having a single monolith database or a single entry-
point in terms of Storage space, Computing resources, Network bandwidth, and Geography.
https://docs.microsoft.com/en-us/azure/architecture/patterns/sharding
Database Sharding: Problems
• Queries that JOIN shards together are problematic and would need to be meshed together via the
application.
• Multiple shards can be queried in parallel and merged together either in memory or client-side.
• Referential integrity might be non-existent.
• Shards are usually used with domain-based partitioning and thus referenced tables could be in different databases.
• Un-partitioned reference tables would also be placed outside the shards.
• However, static reference tables could be treated as global tables, thus copied and replicated into all shards.
• Rebalancing sharded data is problematic. This might be required when
• a shard key changes and thus data needs to move between shards
• a new shard is added and data needs to be redistributed
https://docs.microsoft.com/en-us/azure/architecture/patterns/sharding
Database Sharding: Strategies
• The Lookup strategy
• A map is used to route a request for data to the shard that contains such data using the shard key.
• Multi-tenant applications can store all the data for a tenant together in a shard using the tenant ID as
shard key.
• Multiple tenants can share the same shard, but the data for a single tenant cannot spread across
multiple shards.
• The Range strategy
• Sequential shard keys are ordered and grouped together.
• Useful for applications that frequently retrieve sets of items using range queries.
• The Hash strategy
• This is used to reduce the chance of hotspots (shards that receive a disproportionate amount of load).
• The chosen hashing function should distribute data evenly across the shards, possibly by introducing
some random element into the computation.
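The three strategies can be illustrated with a few self-contained T-SQL statements. In practice the routing usually lives in the application; the shard names, tenant IDs, key ranges and shard count below are made up, and CHECKSUM stands in for a real hashing function purely for illustration.

```sql
-- Lookup strategy: a map from shard key (tenant ID) to shard.
DECLARE @ShardMap TABLE
(
    TenantId  INT     NOT NULL PRIMARY KEY,
    ShardName SYSNAME NOT NULL
);
INSERT INTO @ShardMap (TenantId, ShardName)
VALUES (55, N'Shard_A'), (56, N'Shard_A'), (227, N'Shard_B');

SELECT ShardName FROM @ShardMap WHERE TenantId = 227;   -- Shard_B

-- Range strategy: sequential keys are grouped into contiguous ranges.
DECLARE @ShardRanges TABLE
(
    LowKey    INT     NOT NULL PRIMARY KEY,
    HighKey   INT     NOT NULL,
    ShardName SYSNAME NOT NULL
);
INSERT INTO @ShardRanges (LowKey, HighKey, ShardName)
VALUES (1, 100000, N'Shard_A'), (100001, 200000, N'Shard_B');

SELECT ShardName FROM @ShardRanges WHERE 150000 BETWEEN LowKey AND HighKey;   -- Shard_B

-- Hash strategy: spread keys across a fixed number of shards.
DECLARE @TenantId INT = 227, @ShardCount INT = 4;

-- Cast to BIGINT before ABS so the smallest INT value cannot overflow.
SELECT ABS(CAST(CHECKSUM(@TenantId) AS BIGINT)) % @ShardCount AS ShardNumber;
```

Note that the hash variant ties the placement of every key to @ShardCount, which is why adding a shard forces the rebalancing mentioned earlier.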
6. Stretch Database
Definition
Demo
Stretch Database: Definition
• Stretch Database is a feature of SQL Server 2016.
• This is used to move cold data from on-premises instances
directly into the cloud with only a few clicks.
• Eliminates the need to manually create archiving procedures
that move data out of production db and into archive db.
• Requires an Azure subscription.
• Download “Data Migration Assistant” to identify candidate tables to stretch.
https://docs.microsoft.com/en-us/sql/sql-server/stretch-database/stretch-database
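Besides the wizard, Stretch can be enabled with T-SQL along the lines of the sketch below. The database SalesDb, the table Sales.OrdersArchive, the Azure server name and the credential name are hypothetical; a database master key and the database scoped credential are assumed to have been created beforehand.

```sql
-- Allow Stretch Database on the instance.
EXEC sp_configure 'remote data archive', 1;
RECONFIGURE;
GO

-- Enable Stretch for the database (server and credential names are assumptions).
ALTER DATABASE SalesDb
SET REMOTE_DATA_ARCHIVE = ON
(
    SERVER = N'example-stretch.database.windows.net',
    CREDENTIAL = [StretchCredential]
);
GO

-- Start migrating the rows of a cold table to Azure.
ALTER TABLE Sales.OrdersArchive
SET (REMOTE_DATA_ARCHIVE = ON (MIGRATION_STATE = OUTBOUND));
```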
Stretch Database: Limitations
• Limitations for Stretch-enabled tables
• Uniqueness is not enforced for UNIQUE constraints and PRIMARY KEY constraints in the Azure table that contains the
migrated data.
• You can't UPDATE or DELETE rows that have been migrated, or rows that are eligible for migration.
• You can't INSERT rows into a Stretch-enabled table on a linked server.
• You can't create an index for a view that includes Stretch-enabled tables.
• Filters on SQL Server indexes are not propagated to the remote table.
• Limitations that currently prevent you from enabling Stretch for a table
• Tables that have more than 1,023 columns or more than 998 indexes
• FileTables or tables that contain FILESTREAM data
• Tables that are replicated, or that are actively using Change Tracking or Change Data Capture
• Memory-optimized tables
• Data types: text, ntext, image, timestamp, sql_variant, XML, and CLR data types including geometry, geography, hierarchyid
• Computed columns
• Default constraints and check constraints
• Foreign key constraints that reference the table.
• Full text indexes, XML indexes, Spatial indexes, Indexed views
https://docs.microsoft.com/en-us/sql/sql-server/stretch-database/limitations-for-stretch-database
• Today’s event was sponsored by:
Microsoft Malta : location and refreshments
Gaming Innovation Group : Parking vouchers
• The Tech-Spark community requires your help. Sponsor an event by providing a meeting place,
refreshments, and why not, deliver a session! Feel free to contact us should you want to help.
Contact Us
Ralph Attard
raland@raland.net
Tech Spark
http://www.tech-spark.com
https://www.facebook.com/techsparkmalta

Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 

Tech-Spark: Scaling Databases

  • 1. 1. Introduction: Why Scale? 2. Vertical & Horizontal Partitioning 3. Partitioned Tables 4. Distributed Partitioned Views 5. Database Sharding 6. Stretch Databases (optional)
  • 2. Ralph: Who am I? • An Enterprise Architect • at iGamingCloud, Gaming Innovation Group • focus on Data Platforms • A Microsoft Certified Trainer • deliver MTA, MCSA, MCSE locally • covering Windows, SQL Server, C# • I’m here to describe the need for database scalability, describe a number of possible cross platform solutions, and demonstrate technologies available in MS SQL Server 2016 and Azure.
  • 3. 1. Introduction Why do we need to scale databases? Overview of possible options
  • 4. Scaling Databases: Why? • Most application environments are developed as a monolith, a single application running a single database on a single server. • In time, the whole application environment starts slowing down: • increased data volumes • increased workloads • The simplest option is to introduce an app/web farm to balance the application across multiple servers whilst using the same old single database. • But this might not be enough… we need to scale the database!
  • 5. Scaling Databases: Optimisations • Unless the whole application environment is redesigned and redeveloped, one needs to look into optimising the database layer. • Large database problems include: • Queries become slower, possibly giving time-outs under load • Backups are slower to take, to ship, and to restore • Index maintenance has an even greater impact • Common optimisations include: • Vertical Scaling: scale up current servers to max disk/memory/cpu, or simply migrate to a bigger server • Read Scaling: scale out to introduce an (a)sync server to split read-only queries from the application • Database restructuring: improved table designs, introduction of aggregation tables • Offload data: move old transactional data to archive servers, deletion of log data • But this might not be enough… we need to partition the database!
  • 6. Scaling Databases: Data Partitioning • Even when we scale up by adding more resources, a single database still needs to be partitioned within itself: • Vertical & Horizontal Partitioning • Partitioned Tables • When a single database is too big, we scale out using distributed databases: • Distributed Partitioned Views • Database Sharding (Diagram: Scale Up vs. Scale Out.)
  • 7. Scaling Databases: Domain Partitioning • A different approach is to partition your data by domain. • This is achieved by splitting data by domain and moving them into their own database. • This could be fairly easy if tables are already grouped into their own schema by domain. • However it could be problematic if application queries and reports span multiple schemas • reports would now need to mesh multiple databases together • or read from a consolidated data warehouse • Even though this breaks the database down into smaller databases, each smaller database has the potential to become a problem on its own. • Refactoring a monolith application into various microservices adopts this principle with each microservice having its own data store. • Microservices are usually polyglot persistent. The appropriate data store is chosen according to the required features and partition usage: e.g. using a mix of SQL & NoSQL datastores.
  • 8. 2. Partitioning Benefits Strategies: Horizontal & Vertical Partitioning Updatable Views DEMO
  • 9. Partitioning Benefits • Scalability: Scale-up will eventually reach a physical hardware limit. • Performance: Data access takes place on smaller partitions, in parallel for multiple partitions. • Availability: Reduces single points of failure; multiple disk drives, multiple databases, multiple servers. • Security: Separate sensitive and non-sensitive data into different partitions. • Flexibility: Varied operational management strategies by partition; monitoring, backups, restores, indexing, etc. https://docs.microsoft.com/en-us/azure/architecture/best-practices/data-partitioning
  • 10. Strategy: Vertical Partitioning
      Original table:
      ProductID  Name                   Price  DateCreated  Stock  LastOrdered
      AR-5381    Adjustable Race        50     11-Jan-2016  8      17-Nov-2016
      AA-8327    Bearing Ball           100    11-Feb-2016  46     21-Nov-2017
      BE-2349    BB Ball Bearing        105    11-Mar-2016  52     16-Sep-2017
      CE-2908    Headset Ball Bearings  90     11-Jan-2017  13     12-Feb-2017
      CL-2036    Blade                  70     11-Feb-2017  28     01-Dec-2017
      DA-5965    LL Crankarm            150    11-Mar-2017  30     08-Dec-2017
      Split by columns into:
      ProductID  Name                   Price  DateCreated
      AR-5381    Adjustable Race        50     11-Jan-2016
      AA-8327    Bearing Ball           100    11-Feb-2016
      BE-2349    BB Ball Bearing        105    11-Mar-2016
      CE-2908    Headset Ball Bearings  90     11-Jan-2017
      CL-2036    Blade                  70     11-Feb-2017
      DA-5965    LL Crankarm            150    11-Mar-2017
      and:
      ProductID  Stock  LastOrdered
      AR-5381    8      17-Nov-2016
      AA-8327    46     21-Nov-2017
      BE-2349    52     16-Sep-2017
      CE-2908    13     12-Feb-2017
      CL-2036    28     01-Dec-2017
      DA-5965    30     08-Dec-2017
  • 11. Strategy: Horizontal Partitioning
      Production.Products:
      ProductID  Name                   Price  Stock  DateCreated  LastOrdered
      AR-5381    Adjustable Race        50     8      11-Jan-2016  17-Nov-2016
      AA-8327    Bearing Ball           100    46     11-Feb-2016  21-Nov-2017
      BE-2349    BB Ball Bearing        105    52     11-Mar-2016  16-Sep-2017
      CE-2908    Headset Ball Bearings  90     13     11-Jan-2017  12-Feb-2017
      CL-2036    Blade                  70     28     11-Feb-2017  01-Dec-2017
      DA-5965    LL Crankarm            150    30     11-Mar-2017  08-Dec-2017
      Production.Products_2017:
      ProductID  Name                   Price  Stock  DateCreated  LastOrdered
      CE-2908    Headset Ball Bearings  90     13     11-Jan-2017  12-Feb-2017
      CL-2036    Blade                  70     28     11-Feb-2017  01-Dec-2017
      DA-5965    LL Crankarm            150    30     11-Mar-2017  08-Dec-2017
      Production.Products_2016:
      ProductID  Name                   Price  Stock  DateCreated  LastOrdered
      AR-5381    Adjustable Race        50     8      11-Jan-2016  17-Nov-2016
      AA-8327    Bearing Ball           100    46     11-Feb-2016  21-Nov-2017
      BE-2349    BB Ball Bearing        105    52     11-Mar-2016  16-Sep-2017
  • 12. Horizontal Partitioning: Why? • The idea behind horizontal partitioning is to split a large table into multiple smaller tables. • Query-wise • One smaller table is faster to query than a larger table • However querying multiple smaller tables is problematic • Administration-wise, multiple tables can be placed into different file groups, which • Can be placed on different physical disks > parallelism can be faster • Can be backed up individually > smaller backup windows • Can be set as read-only > protect older data from modifications, backup once and forget
  • 13. Horizontal Partitioning: Dynamic Queries
      -- dbo.GetPartition is a custom helper (not shown in the slides) that returns the table suffix for a date
      DECLARE @SQL AS NVARCHAR(MAX) = CONCAT('
        SELECT ProductId, Name, Price, Stock, DateCreated, LastOrdered
        FROM Production.Products_', dbo.GetPartition('Production.Products', @FromDate), ' WITH(NOLOCK)
        WHERE DateCreated >= @FromDate AND DateCreated <= @ToDate
      ')
      EXECUTE sp_ExecuteSql
          @Stmt = @SQL
        , @Params = N'@FromDate AS DATETIME, @ToDate AS DATETIME'
        , @FromDate = @FromDate
        , @ToDate = @ToDate
  • 14. Horizontal Partitioning: UNIONed Queries
      ;WITH products AS (
        SELECT ProductId, Name, Price, Stock, DateCreated, LastOrdered
        FROM Production.Products_2016 WITH(NOLOCK)
        WHERE DateCreated >= @FromDate AND DateCreated <= @ToDate
        UNION ALL
        SELECT ProductId, Name, Price, Stock, DateCreated, LastOrdered
        FROM Production.Products_2017 WITH(NOLOCK)
        WHERE DateCreated >= @FromDate AND DateCreated <= @ToDate
        UNION ALL
        ...
      )
      SELECT ProductId, Name, Price, Stock, DateCreated, LastOrdered
      FROM products
  • 15. Views • Dynamic Queries are a pain! No syntax checking, string concatenation, etc… • Constantly creating CTEs to union tables is heavy for everyone. • Usually create VIEWs to provide a unified view • however could be cumbersome and repetitive e.g. every month • thus we dynamically create them using custom code and jobs • VIEWS help transparently replace an existing table with multiple smaller ones • no code changes required • however not all views are updatable
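A minimal sketch of such a unifying view, assuming the two yearly tables from the horizontal partitioning example exist with identical columns. The view name Production.ProductsAll is hypothetical; in practice the view could take over the name of the table it replaces.

```sql
-- Hypothetical view name; the member tables come from the slides.
-- CREATE VIEW must be the first statement in its batch.
CREATE VIEW Production.ProductsAll
AS
SELECT ProductId, Name, Price, Stock, DateCreated, LastOrdered
FROM Production.Products_2016
UNION ALL
SELECT ProductId, Name, Price, Stock, DateCreated, LastOrdered
FROM Production.Products_2017;
GO  -- batch separator understood by SSMS / sqlcmd, not a T-SQL statement

-- Callers now query a single object instead of building dynamic SQL or CTEs.
SELECT ProductId, Name, Price
FROM Production.ProductsAll
WHERE DateCreated >= '20170101' AND DateCreated < '20170201';
```

If each member table also carries a CHECK constraint on DateCreated covering its year, the optimizer can skip member tables that cannot contain matching rows.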
  • 16. Updateable Views • You can modify the data of an underlying base table through a view, as long as the following conditions are true: • Any modifications, including UPDATE, INSERT, and DELETE statements, must reference columns from only one base table. • The columns being modified in the view must directly reference the underlying data in the table columns. • The columns being modified are not affected by GROUP BY, HAVING, or DISTINCT clauses. • TOP is not used anywhere in the select statement of the view together with the WITH CHECK OPTION clause. • INSTEAD OF triggers can be created on a view to make it updatable. The INSTEAD OF trigger is executed instead of the data modification statement on which the trigger is defined. https://docs.microsoft.com/en-us/sql/t-sql/statements/create-view-transact-sql
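As an illustration of the INSTEAD OF approach, the sketch below routes inserts made against a UNION ALL view to the matching yearly table. It assumes a view named Production.ProductsAll (a hypothetical name) over Production.Products_2016 and Production.Products_2017; the trigger name is also made up.

```sql
-- Hypothetical trigger on a hypothetical view; the member tables come from the slides.
CREATE TRIGGER Production.TrgProductsAll_Insert
ON Production.ProductsAll
INSTEAD OF INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- Rows dated in 2016 go to the 2016 table.
    INSERT INTO Production.Products_2016
        (ProductId, Name, Price, Stock, DateCreated, LastOrdered)
    SELECT ProductId, Name, Price, Stock, DateCreated, LastOrdered
    FROM inserted
    WHERE DateCreated >= '20160101' AND DateCreated < '20170101';

    -- Rows dated in 2017 go to the 2017 table.
    INSERT INTO Production.Products_2017
        (ProductId, Name, Price, Stock, DateCreated, LastOrdered)
    SELECT ProductId, Name, Price, Stock, DateCreated, LastOrdered
    FROM inserted
    WHERE DateCreated >= '20170101' AND DateCreated < '20180101';

    -- Caveat: rows outside both ranges are silently dropped in this sketch;
    -- a real trigger would probably raise an error for them.
END;
```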
  • 18. 3. Partitioned Tables Defining Partition Functions & Partition Schemes Tooling: Custom Partition framework Myths and performance issues DEMO
  • 19. Partitioned Tables: Definition… • Microsoft introduced Partitioned Tables in SQL Server 2005 • It supports the use of multiple file groups • It provides a single table to query from irrespective of partitions • The example in the slide diagram partitions a table into: • A partition per month within the current year • A partition per year for the last two years • A partition for all the previous years (Diagram: partitions Pre-2015, 2015, 2016, Jan 2017, Feb 2017, with empty partitions marked.)
  • 20. Partitioned Tables: Definition… • A Partition Function • A Data Type – typically DATE related • A Range – LEFT or RIGHT
      CREATE PARTITION FUNCTION PF_Name (DATETIME2)
      AS RANGE RIGHT FOR VALUES ('20170101','20170201','20170301');
    • A Partition Scheme – that associates file groups to the partition function
      CREATE PARTITION SCHEME PS_Name
      AS PARTITION PF_Name TO (FG000000, FG201701, FG201702, FG201703);
  • 21. Partitioned Tables: Definition… • With a RIGHT Range, the previous partitioned table example requires 4 partitions: • A partition on the left, containing everything from beginning of time till before Jan 2017 – should be empty • A partition from Jan 2017 till before Feb 2017 • A partition from Feb 2017 till before Mar 2017 • A partition from Mar 2017 till the end of time – should be empty (Diagram: empty left partition, Jan 2017, Feb 2017, Mar 2017 (empty).)
  • 22. Partitioned Tables: Splitting… • A partitioned table can be extended by splitting an existing partition • We first add a new file group to the partition scheme
      ALTER PARTITION SCHEME PS_Name NEXT USED [FG201704]
    • We then split the partition function to the right
      ALTER PARTITION FUNCTION PF_Name() SPLIT RANGE ('20170401')
    (Diagram, before: empty left partition, Jan 2017, Feb 2017, Mar 2017 (empty). After: empty left partition, Jan 2017, Feb 2017, Mar 2017, Apr 2017 (empty).)
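To see how RANGE RIGHT assigns values to these four partitions, the $PARTITION function can be queried directly. This is a small sketch; PF_Demo is a hypothetical name chosen so it does not clash with an existing PF_Name.

```sql
-- Hypothetical function name; same boundaries as the slide example.
CREATE PARTITION FUNCTION PF_Demo (DATETIME2)
AS RANGE RIGHT FOR VALUES ('20170101', '20170201', '20170301');
GO  -- batch separator (SSMS / sqlcmd)

SELECT
    $PARTITION.PF_Demo('20161231') AS BeforeJan,  -- 1: everything before Jan 2017
    $PARTITION.PF_Demo('20170101') AS JanStart,   -- 2: with RANGE RIGHT the boundary value belongs to the partition on its right
    $PARTITION.PF_Demo('20170215') AS MidFeb,     -- 3: Feb 2017
    $PARTITION.PF_Demo('20170301') AS MarStart;   -- 4: Mar 2017 onwards
```

With RANGE LEFT each boundary value would instead fall into the partition on its left, so '20170101' would land in partition 1.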
  • 23. Summary: Required steps • On setup: 1. Create file group for non-partitioned indexes (if required) 2. Create file group for left hand side (to remain empty) 3. Create Partition Function (partitioning key datatype, range direction) 4. Create Partition Scheme (with empty file group) • Regularly (e.g. monthly) 1. Create file group 2. Split partition
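The regular (monthly) steps could look like the sketch below. The database name SalesDb, the file path, and the table dbo.PrdTable are placeholders; PS_Name and PF_Name are the scheme and function from the earlier slides.

```sql
-- 1. Create the new file group and give it a data file (placeholder name and path).
ALTER DATABASE SalesDb ADD FILEGROUP FG201704;
ALTER DATABASE SalesDb
ADD FILE (NAME = N'SalesDb_201704', FILENAME = N'D:\Data\SalesDb_201704.ndf')
TO FILEGROUP FG201704;

-- 2. Mark the file group as the next one to use, then split the
--    (empty) right-most partition at the new boundary.
ALTER PARTITION SCHEME PS_Name NEXT USED FG201704;
ALTER PARTITION FUNCTION PF_Name() SPLIT RANGE ('20170401');

-- Optional check: list partitions and row counts for a partitioned table.
SELECT partition_number, rows
FROM sys.partitions
WHERE object_id = OBJECT_ID(N'dbo.PrdTable')
  AND index_id IN (0, 1)   -- heap or clustered index only
ORDER BY partition_number;
```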
  • 24. Tooling: Custom partitioning framework • We created a number of stored procedures to handle these steps: • Maintenance.UspCreateFileGroup – used to create files and file groups • Maintenance.UspCreatePartition – used once to create the partition function and partition scheme • Maintenance.UspCreatePartitionView – used to create a monthly view per partition by date range • Maintenance.UspSplitPartition – used monthly to create a new file group, split partition, create view • Maintenance.UspSplitPartitionAllTables – used monthly to split all partition tables via agent job
  • 26. Partitioned Tables: Merging… • A partitioned table can have multiple partitions merged into one
      ALTER PARTITION FUNCTION PF_Name() MERGE RANGE('20170201');
    • Note: Merging partitions with data movement across file groups will be slow (Diagram, before: empty left partition, Jan 2017, Feb 2017, Mar 2017, Apr 2017 (empty). After: empty left partition, Jan & Feb 2017, Mar 2017, Apr 2017 (empty).)
  • 27. Partitioned Tables: Switching… • Partition switching reduces locks whilst: • Loading data into a warehouse • Deleting old data during archival • Moving data between tiered storage • Partitions need to be in the same file group • Re-create the staging indexes to move physical data
      ALTER TABLE schema.StgTable SWITCH PARTITION $PARTITION.PF_Name('20170201')
      TO schema.PrdTable PARTITION $PARTITION.PF_Name('20170201')
    (Diagram: a staging table and a production table sharing one partition function, with partitions Jan 2017, Feb 2017, Mar 2017, Apr 2017 (empty).)
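The archival case can be sketched end to end as a switch-out into an empty table. All object names below are hypothetical, and every partition is mapped to [PRIMARY] only to keep the sketch runnable without extra file groups.

```sql
-- Hypothetical names throughout.
CREATE PARTITION FUNCTION PF_SwitchDemo (DATE)
AS RANGE RIGHT FOR VALUES ('20170101', '20170201');
CREATE PARTITION SCHEME PS_SwitchDemo
AS PARTITION PF_SwitchDemo ALL TO ([PRIMARY]);
GO

-- Partitioned source table; the clustered key includes the partitioning column.
CREATE TABLE dbo.Orders (
    OrderId   INT  NOT NULL,
    OrderDate DATE NOT NULL,
    CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (OrderDate, OrderId)
) ON PS_SwitchDemo (OrderDate);

-- Empty target with the same columns and index, on the same file group.
CREATE TABLE dbo.Orders_Archive (
    OrderId   INT  NOT NULL,
    OrderDate DATE NOT NULL,
    CONSTRAINT PK_Orders_Archive PRIMARY KEY CLUSTERED (OrderDate, OrderId)
) ON [PRIMARY];
GO

INSERT INTO dbo.Orders (OrderId, OrderDate)
VALUES (1, '20161215'), (2, '20170110'), (3, '20170205');

-- Switch the January 2017 partition out of the live table.
ALTER TABLE dbo.Orders
SWITCH PARTITION $PARTITION.PF_SwitchDemo('20170110') TO dbo.Orders_Archive;

-- dbo.Orders now holds orders 1 and 3; dbo.Orders_Archive holds order 2.
-- The archived rows can then be copied elsewhere, or removed with TRUNCATE TABLE.
```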
  • 30. Decreased performance: TOP, MAX or MIN • Test results show that TOP is slower on partitioned tables by 10% • ROWCOUNT can be used instead
  • 31. Increased performance: SELECT using non-clustered PK • When using ROWCOUNT, throughput on partitioned tables is faster • by 22% throughput • and has a 3% improvement on response time when using the non-clustered primary key
  • 32. Increased performance: SELECT using Partitioning Key • When using ROWCOUNT, throughput on partitioned tables is faster • by 6% throughput • and has a 7% improvement on response time when using the clustered partitioning date key
  • 33. Increased performance: Inserts • Combined INSERT & SELECT tests found partitioned tables to be faster: • SELECT – 9% Throughput benefit / 11% improvement in response times • INSERT – 4% Throughput benefit / 9% improvement in response times
  • 35. Unique columns • Traditionally developers create an IDENTITY(1,1) PRIMARY KEY to provide uniqueness • On its own this cannot be used as the clustered key of a partitioned table, because it does not contain the Partitioning Key • Should be replaced with a UNIQUEIDENTIFIER generated at application level (also in preparation for distributed tables…) • A PRIMARY KEY is by default CLUSTERED and stored with the data • In partitioned tables, the Partitioning Key has to be CLUSTERED to split the data • Thus if the PRIMARY KEY does not contain the Partitioning Key it cannot be CLUSTERED • An un-partitioned NONCLUSTERED PRIMARY KEY can be used to enforce uniqueness • However this prohibits SWITCHING of partitions due to unaligned indexes
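One way to satisfy these constraints is a composite clustered primary key that leads with the partitioning key. The sketch below uses hypothetical names (PF_KeyDemo, PS_KeyDemo, dbo.Bets), and NEWID() merely stands in for a GUID that the slides suggest generating in the application.

```sql
-- Hypothetical names; ALL TO ([PRIMARY]) avoids needing extra file groups.
CREATE PARTITION FUNCTION PF_KeyDemo (DATETIME2)
AS RANGE RIGHT FOR VALUES ('20170101', '20170201');
CREATE PARTITION SCHEME PS_KeyDemo
AS PARTITION PF_KeyDemo ALL TO ([PRIMARY]);
GO

CREATE TABLE dbo.Bets (
    BetId       UNIQUEIDENTIFIER NOT NULL,
    DateCreated DATETIME2        NOT NULL,
    Amount      DECIMAL(18, 2)   NOT NULL,
    -- The key contains the partitioning column, so the index is aligned
    -- with the table and partition switching remains possible.
    CONSTRAINT PK_Bets PRIMARY KEY CLUSTERED (DateCreated, BetId)
) ON PS_KeyDemo (DateCreated);
GO

-- NEWID() is a stand-in for an application-generated GUID.
INSERT INTO dbo.Bets (BetId, DateCreated, Amount)
VALUES (NEWID(), '20170115', 10.00);
```

Note the trade-off: this key enforces uniqueness of the (DateCreated, BetId) pair only, so uniqueness of BetId on its own relies on how the GUIDs are generated.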
  • 37. Myth: Metadata only operations • Switching partitions in & out • Requires schema lock on both source and destination tables • Usually the command is set with a timeout; and try again later • Splitting & merging partitions • Altering the partition function is an offline operation • Splitting a partition which contains data requires data movement • If the range split introduces a different file group, data needs to physically move between files • This is why we keep an empty partition on the left and right, and we always split the empty partition
  • 38. 4. Distributed Partitioned Views Definition Requirements… loads! DEMO
  • 39. Distributed Partitioned Views: Definition • Basically a view which unions data from multiple databases hosted on different servers. • Also referred to as Federated Databases. • Used when applications are unaware of such partitioning. • Requires Linked Servers. • Performance improves with lazy schema validation option. • Read-only views work everywhere. • Updatable views require Enterprise Edition. • INSTEAD OF triggers can be used to make views updatable on Standard Edition.
  • 41. Distributed Partitioned Views: Requirements • Tables Rules • Member tables cannot be referenced more than one time in the view. • Member tables cannot have indexes created on any computed columns. • Member tables must have all PRIMARY KEY constraints on the same number of columns. • Member tables must have the same ANSI padding setting. • Column Rules • All columns in each member table must be included in the same ordinal position in the select list. • Columns cannot be referenced more than one time in the select list. • The columns in the select list of each SELECT statement must be of the same type. • The key ranges of the CHECK constraints in each table cannot overlap with the ranges of any other table. • Partitioning Column Rules • The partitioning column cannot be an identity, default, timestamp, or computed column. • The partitioning column must be in the same ordinal location in the select list of each SELECT statement in the view. • The partitioning column cannot allow for nulls. • The partitioning column must be a part of the primary key of the table. • There must be only one constraint on the partitioning column. • There are no restrictions on the updatability of the partitioning column. https://technet.microsoft.com/en-us/library/ms188299(v=sql.105).aspx
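A minimal sketch of a two-server setup that follows these rules. The linked server names (Server1, Server2), the database SalesDb, and the table layout are all assumptions made for illustration.

```sql
-- On Server1: member table for 2016. The CHECK constraint on the partitioning
-- column defines a key range that must not overlap with any other member table.
CREATE TABLE dbo.Products_2016 (
    ProductId   VARCHAR(10)   NOT NULL,
    DateCreated DATE          NOT NULL
        CONSTRAINT CK_Products_2016
        CHECK (DateCreated >= '20160101' AND DateCreated < '20170101'),
    Name        NVARCHAR(100) NOT NULL,
    -- The partitioning column is part of the primary key.
    CONSTRAINT PK_Products_2016 PRIMARY KEY (ProductId, DateCreated)
);
GO
-- On Server2: an identical dbo.Products_2017 table, with the CHECK range
-- '20170101' <= DateCreated < '20180101'.

-- On each server: the same view, with columns in the same ordinal position.
CREATE VIEW dbo.Products
AS
SELECT ProductId, DateCreated, Name FROM Server1.SalesDb.dbo.Products_2016
UNION ALL
SELECT ProductId, DateCreated, Name FROM Server2.SalesDb.dbo.Products_2017;
GO

-- Defer schema checks of remote tables until execution time.
EXEC sp_serveroption
    @server   = N'Server2',
    @optname  = N'lazy schema validation',
    @optvalue = N'true';
```

On each server the branch that points at the local member table could reference it by its local name instead of the four-part name.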
  • 42. Distributed Partitioned Views: Updatable • INSERT Statements • All columns must be included in the INSERT statement even if the column can be NULL in the base table or has a DEFAULT constraint defined in the base table. • The DEFAULT keyword cannot be specified in the VALUES clause of the INSERT statement. • INSERT statements must supply a value that satisfies the logic of the CHECK constraint defined on the partitioning column for one of the member tables. • INSERT statements are not allowed if a member table contains a column with an identity property. • INSERT statements are not allowed if a member table contains a timestamp column. • INSERT statements are not allowed if there is a self-join with the same view or any one of the member tables. • UPDATE Statements • UPDATE statements cannot specify the DEFAULT keyword as a value in the SET clause even if the column has a DEFAULT value defined in the corresponding member table • The value of a column with an identity property cannot be changed: however, the other columns can be updated. • The value of a PRIMARY KEY cannot be changed if the column contains text, image, or ntext data. • Updates are not allowed if a base table contains a timestamp column. • Updates are not allowed if there is a self-join with the same view or any one of the member tables. • DELETE Statements • DELETE statements are not allowed when there is a self-join with the same view or any one of the member tables. https://technet.microsoft.com/en-us/library/ms187067(v=sql.105).aspx
  • 45. Database Sharding: Definition • A form of horizontal partitioning in which partitions are distributed on commodity servers. • An individual partition is referred to as a shard. • The application is shard-aware and can route connection requests autonomously without the need of distributed partitioned views. • Sharding is used to truly circumvent issues of having a single monolith database or a single entry point in terms of Storage space, Computing resources, Network bandwidth, and Geography. https://docs.microsoft.com/en-us/azure/architecture/patterns/sharding
  • 46. Database Sharding: Problems • Queries that JOIN shards together are problematic and would need to be meshed together via the application. • Multiple shards can be queried in parallel and merged together either in memory or client-side. • Referential integrity might be non-existent. • Shards are usually used with domain-based partitioning and thus referenced tables could be in different databases. • Un-partitioned reference tables would also be placed outside the shards. • However, static reference tables could be treated as global tables, thus copied and replicated into all shards. • Rebalancing sharded data is problematic. This might be required when • a shard key changes and thus data needs to move between shards • a new shard is added and data needs to be redistributed https://docs.microsoft.com/en-us/azure/architecture/patterns/sharding
  • 47. Database Sharding: Strategies • The Lookup strategy • A map is used to route a request for data to the shard that contains such data using the shard key. • Multi-tenant applications can store all the data for a tenant together in a shard using the tenant ID as shard key. • Multiple tenants can share the same shard, but the data for a single tenant cannot spread across multiple shards. • The Range strategy • Sequential shard keys are ordered and grouped together. • Useful for applications that frequently retrieve sets of items using range queries. • The Hash strategy • This is used to reduce the chance of hotspots (shards that receive a disproportionate amount of load). • The chosen hashing function should distribute data evenly across the shards, possibly by introducing some random element into the computation.
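The three strategies can be compared side by side with a small routing sketch. In a shard-aware application this logic would normally live in the application or in a shard map service; T-SQL is used here only for consistency with the rest of the deck, and all names, key ranges, and the shard count are made up.

```sql
DECLARE @TenantId INT = 42;

-- Lookup strategy: an explicit map from shard key to shard.
DECLARE @LookupMap TABLE (TenantId INT PRIMARY KEY, ShardName NVARCHAR(128) NOT NULL);
INSERT INTO @LookupMap (TenantId, ShardName)
VALUES (7, N'Shard01'), (42, N'Shard02'), (99, N'Shard01');

SELECT ShardName AS LookupShard
FROM @LookupMap
WHERE TenantId = @TenantId;          -- Shard02

-- Range strategy: contiguous key ranges are grouped on the same shard.
DECLARE @RangeMap TABLE (LowKey INT NOT NULL, HighKey INT NOT NULL, ShardName NVARCHAR(128) NOT NULL);
INSERT INTO @RangeMap (LowKey, HighKey, ShardName)
VALUES (0, 1000, N'Shard01'), (1000, 2000, N'Shard02');

SELECT ShardName AS RangeShard
FROM @RangeMap
WHERE @TenantId >= LowKey AND @TenantId < HighKey;   -- Shard01

-- Hash strategy: a hash of the key picks one of N shards.
-- Taking the modulo before ABS avoids overflow on the smallest INT value.
DECLARE @ShardCount INT = 4;
SELECT ABS(CHECKSUM(@TenantId) % @ShardCount) AS HashShardIndex;
```

CHECKSUM is used only because it is built in; it is not a strong hash, and a production system would likely choose a hash function with better distribution.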
  • 49. Stretch Database: Definition • Stretch Database is a feature of SQL Server 2016. • This is used to move cold data from on-premises instances directly into the cloud with only a few clicks. • Eliminates the need to manually create archiving procedures that move data out of the production database and into an archive database. • Requires an Azure subscription. • Download “Data Migration Assistant” to identify candidate tables to stretch. https://docs.microsoft.com/en-us/sql/sql-server/stretch-database/stretch-database
  • 50. Stretch Database: Limitations • Limitations for Stretch-enabled tables • Uniqueness is not enforced for UNIQUE constraints and PRIMARY KEY constraints in the Azure table that contains the migrated data. • You can't UPDATE or DELETE rows that have been migrated, or rows that are eligible for migration. • You can't INSERT rows into a Stretch-enabled table on a linked server. • You can't create an index for a view that includes Stretch-enabled tables. • Filters on SQL Server indexes are not propagated to the remote table. • Limitations that currently prevent you from enabling Stretch for a table • Tables that have more than 1,023 columns or more than 998 indexes • FileTables or tables that contain FILESTREAM data • Tables that are replicated, or that are actively using Change Tracking or Change Data Capture • Memory-optimized tables • Data types: text, ntext, image, timestamp, sql_variant, XML, and CLR data types including geometry, geography, hierarchyid • Computed columns • Default constraints and check constraints • Foreign key constraints that reference the table. • Full text indexes, XML indexes, Spatial indexes, Indexed views https://docs.microsoft.com/en-us/sql/sql-server/stretch-database/limitations-for-stretch-database
  • 52. • Today’s event was sponsored by: Microsoft Malta : location and refreshments Gaming Innovation Group : Parking vouchers • The Tech-Spark community requires your help. Sponsor an event by providing a meeting place, refreshments, and why not, deliver a session! Feel free to contact us should you want to help.
  • 53. Contact Us Ralph Attard raland@raland.net Tech Spark http://www.tech-spark.com https://www.facebook.com/techsparkmalta