Key features and business value of DB2 10.
This presentation was given by Les King, Director, Distributed Data Server Product Management, IBM, at IBM Data Server Day in Stockholm on 22 May.
IBM Client Confidential – Do Not Distribute
IBM Information Management
DB2 Overview - Agenda
Reducing Storage Costs
A hybrid database with pureXML
Industry unique clustering for storage optimization, performance & scalability
Autonomics, Workload Management and Advanced Tooling – let DB2 do the work
High Availability and Extreme Scalability for all workloads
Protecting your existing application investment
The form-factor of your choice
Security, Data Governance & Regulatory Compliance
Going forward
Data Server Innovation to Deliver True Business Value
DB2 – a continuous delivery of content
– DB2 10.1: just reached general availability in 2Q 2012
– DB2 9.8: introduced DB2 pureScale
– DB2 9.7: about 2/3 of our customers are here
– DB2 9.5: about 1/3 of our customers are here
– DB2 9.1: coming up to end of support
Compression
Reducing Overall Storage Costs
Breakthrough Savings with Adaptive Compression
Lower Storage Costs; Lower Administration Costs
The compression timeline:
– DB2 9.1: table compression
– DB2 9.5: online enablement of compression
– DB2 9.7: temp space and index compression
– DB2 10 ("Galileo"): adaptive compression
• Adaptively apply both table-level compression and page-level compression
• Table re-orgs not required to maintain high compression
• Compress archive logs
Row Compression
Reduces the cost of data storage
Fred, Dept 500, 10000, Plano, TX, 24355…
John, Dept 500, 20000, Plano, TX, 24355, Site 3

becomes, after compression:

Fred, (01), 10000, (02), …
John, (01), 20000, (02), …

Dictionary:
01 = Dept 500
02 = Plano, TX, 24355
…

The dictionary contains repeated information from the rows. In this example, a 179.9 GB table shrinks to 42.5 GB: 76% smaller.
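The dictionary mechanics can be sketched in a few lines of Python. This is a conceptual illustration only, not DB2's actual algorithm (DB2 builds its dictionary from frequently repeating byte patterns, not whole field values); the function names and two-row sample are invented for the sketch.

```python
def build_dictionary(rows, min_repeats=2):
    """Map each field value that repeats across rows to a short code."""
    counts = {}
    for row in rows:
        for value in row:
            counts[value] = counts.get(value, 0) + 1
    repeated = [v for v, n in counts.items() if n >= min_repeats]
    return {value: "(%02d)" % i for i, value in enumerate(repeated, start=1)}

def compress(rows, dictionary):
    """Replace repeated values with their dictionary codes."""
    return [[dictionary.get(v, v) for v in row] for row in rows]

rows = [
    ["Fred", "Dept 500", "10000", "Plano, TX, 24355"],
    ["John", "Dept 500", "20000", "Plano, TX, 24355"],
]
dictionary = build_dictionary(rows)
compressed = compress(rows, dictionary)
# "Dept 500" and "Plano, TX, 24355" repeat, so both rows now carry codes:
# ['Fred', '(01)', '10000', '(02)']
```

Only the repeated values pay for a dictionary entry; unique values (names, salaries) are stored as-is, which is why tables with many repeating columns compress best.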
… and if that's not enough:
– Performance gains of up to 40%: all data stays compressed in memory
– Reduced outages for utilities: backup and reorg now run 2x-4x faster
– Savings multiply across copies: dev/test, HA, DR, and backup repositories typically hold 6x+ the volume of the production data
Improving the Best Compression in the Industry
– Multiple algorithms for automatic index compression (unique in the industry)
– Automatic compression for temporary tables, e.g. ORDER BY spill temps (unique in the industry)
– Intelligent compression of large objects and XML
Adaptive Compression – What is it?
The technology provides compression rates approaching 7x
Provides significant cost benefits
– Storage reduction
• Acquisition cost, floor space, power and cooling
– I/O reduction
• Reduced response time and improved throughput
– Reduced backup and recovery times
Very simple to configure and use
Next generation compression is adaptive
– Improve compression rates by up to 2X (approaching 15X)
– Maintain compression rates with data skew
Almost 40% Improvement over DB2 9.7 Compression
When Using Offline Reorg and Compression
(Chart: storage size in GB: 54.1 uncompressed; 10.5 with DB2 9.7 compression, a 5.2x reduction; 6.4 with Galileo adaptive compression, an 8.5x reduction.)
Up to 60% Improvement over DB2 9.7 Compression
When Using Online Automatic Dictionary Creation (ADC)
(Chart: storage size in GB: 54.1 uncompressed; 21.2 with DB2 9.7 compression, a 2.5x reduction; 8.4 with Galileo adaptive compression, a 6.4x reduction.)
Adaptive Compression Shrinks your Data Storage Needs
Higher performance
– Faster queries for I/O-bound environments
– Faster backups
Lower costs
– Postpone upcoming storage purchases
– Lower ongoing storage needs
– Easier administration with reduced need for table re-orgs
“Page-level dynamic compression is one of the new DB2 features that will
reduce planned outages by 40% and storage savings up to 50%”
—Jessica Tatiana Flores Montiel, DAFROS Multiservicios
“Our migration from Oracle Database to DB2 resulted in a 40% storage savings.
Upgrading to DB2 9.7 and index compression brought our average savings to 57%.
Now adaptive compression brings our average savings to 77%, dramatic savings!”
—Andrew Juarez, Lead SAP Basis / DBA, Coca Cola Bottling Company.
More Proof Points ….
pureXML
A Hybrid Database
DB2 XML - A New Generation Hybrid Data Server
Before: business data in XML form managed in a relational database means high-cost development and poor performance.
After: business data in XML form managed with DB2 pureXML™ gives streamlined development and high performance.
DB2 XML - Benefits
(Diagram: without pureXML, client code must shred XML into relational tables on store and compose it back on retrieval, with XPath mapping code, result-set composition, and a proprietary catalog of the shredded content; with DB2 pureXML, clients store and retrieve XML directly.)
Simplified and streamlined solution
No mapping code to write and maintain
No complex schema to manage and maintain
No proprietary catalog
No XPath parsing and result set composition
Improved performance and flexibility
Lower development and maintenance costs and faster to market
DB2 XML – A First Class Citizen
Data definition:
create table dept(deptID int, deptdoc xml);
Insert:
insert into dept(deptID, deptdoc) values (?, ?)
Retrieve:
select deptID, deptdoc from dept
Query:
select deptID, xmlquery('$d/dept/name' passing deptdoc as "d") from dept
where deptID <> 27;
SQL/XML: Use SQL to produce XML
SELECT
  XMLELEMENT(NAME "Department",
    XMLATTRIBUTES(e.dept AS "name"),
    XMLAGG(XMLELEMENT(NAME "emp", e.firstname))
  ) AS "dept_list"
FROM employee e
WHERE …
GROUP BY e.dept;

Available publishing functions: XMLELEMENT, XMLATTRIBUTES, XMLFOREST, XMLCONCAT, XMLAGG, XML2CLOB, XMLNAMESPACES, XMLCAST.

Start with (EMPLOYEE):
firstname   lastname   dept
SEAN        LEE        A00
MICHAEL     JOHNSON    B01
VINCENZO    BARELLI    A00
CHRISTINE   SMITH      A00

Produce (dept_list):
<Department name="A00">
  <emp>CHRISTINE</emp>
  <emp>VINCENZO</emp>
  <emp>SEAN</emp>
</Department>
<Department name="B01">
  <emp>MICHAEL</emp>
</Department>
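For readers who think in code, the same grouping-and-publishing step can be mimicked with Python's standard xml.etree library. This only illustrates what XMLELEMENT, XMLATTRIBUTES, and XMLAGG compute, not how DB2 executes it; note that XMLAGG's row order within a group is not guaranteed unless you specify one, so the sketch simply keeps input order.

```python
import xml.etree.ElementTree as ET
from itertools import groupby

employees = [
    ("SEAN", "LEE", "A00"),
    ("MICHAEL", "JOHNSON", "B01"),
    ("VINCENZO", "BARELLI", "A00"),
    ("CHRISTINE", "SMITH", "A00"),
]

def dept_list(rows):
    """GROUP BY dept, then build <Department name=...><emp>...</emp></Department>."""
    out = []
    for dept, grp in groupby(sorted(rows, key=lambda r: r[2]), key=lambda r: r[2]):
        elem = ET.Element("Department", name=dept)      # XMLELEMENT + XMLATTRIBUTES
        for first, _last, _dept in grp:                 # XMLAGG over the group
            ET.SubElement(elem, "emp").text = first     # XMLELEMENT(NAME "emp", ...)
        out.append(ET.tostring(elem, encoding="unicode"))
    return out

result = dept_list(employees)
```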
The FLWOR Expression
FOR: iterates through a sequence, binding a variable to each item
LET: binds a variable to a sequence
WHERE: eliminates items of the iteration
ORDER: reorders items of the iteration
RETURN: constructs query results

FOR $movie in xmlcolumn('movies.doc')
LET $actors := $movie//actor
WHERE $movie/duration > 90
ORDER BY $movie/@year
RETURN <movie>
         {$movie/title, $actors}
       </movie>

Sample result:
<movie>
  <title>Chicago</title>
  <actor>Renee Zellweger</actor>
  <actor>Richard Gere</actor>
  <actor>Catherine Zeta-Jones</actor>
</movie>
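The FOR/LET/WHERE/ORDER/RETURN pipeline maps naturally onto ordinary collection processing. A rough Python analogue, illustrative only (XQuery in DB2 operates on stored XML documents, not Python dicts; the sample movie data here is invented):

```python
movies = [
    {"title": "Chicago", "year": 2002, "duration": 113,
     "actors": ["Renee Zellweger", "Richard Gere", "Catherine Zeta-Jones"]},
    {"title": "Short Film", "year": 1999, "duration": 45, "actors": []},
]

def flwor(docs):
    results = []
    for movie in sorted(docs, key=lambda m: m["year"]):  # FOR ... ORDER BY $movie/@year
        actors = movie["actors"]                         # LET $actors := $movie//actor
        if movie["duration"] > 90:                       # WHERE $movie/duration > 90
            results.append({"title": movie["title"],     # RETURN <movie>{...}</movie>
                            "actors": actors})
    return results

result = flwor(movies)
# Only the movie longer than 90 minutes survives the WHERE clause.
```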
Storage Optimization and
Multi Dimensional Clustering
Unique in the Industry
Problem: Optimizing for Multiple Access Keys
Database systems try to store the records of a table in a particular order (e.g. in part number order) to enable fast, ordered retrieval.
– Called "clustering"
– Speeds up queries, but only for a single key (aka "dimension")
– Queries involving other dimensions suffer
– Clustering eventually degrades

SELECT * FROM Sales WHERE Region = SW
– Usually does not require a page I/O when reading the next record, because it is usually on the same page as the previous record (pages look like: … NW SW SW SW …)
– The page I/Os that are required are sequential (efficient)

SELECT * FROM Sales WHERE Year = 2009
– Usually does require a page I/O when reading the next record, because it is usually on a different page than the previous record (pages look like: … 2009, 2010, 2010, 2010, …)
– Each of these page I/Os is random (inefficient)
Solution: Multi-Dimensional Clustering (MDC)
Divides the table into "extents" and ensures that each record in an extent contains the same value in all interesting dimensions.
– Extent = consecutive group of pages, big enough for efficient I/O (typically 32 pages; 4 in the example below)
– Queries in all dimensions benefit
– This clustering is always maintained by DB2; it never degrades

SELECT * FROM Sales WHERE Region = SW
– 2 big block I/Os retrieve the pages containing region SW (extents such as NW,2010 | SW,2010 | SW,2011)
– All sequential I/O

SELECT * FROM Sales WHERE Year = 2010
– 2 big block I/Os retrieve the pages containing year 2010
– All sequential I/O

Have your cake and eat it too!
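A toy model of MDC retrieval, assuming hypothetical extents keyed by (region, year), shows why a query on either dimension touches a few whole extents instead of scattered pages. The helper names and record counts are invented; DB2's real block indexes are considerably more elaborate.

```python
from collections import defaultdict

def load(records):
    """Each record goes into the extent for its exact (region, year) cell;
    an extent holds only records with identical dimension values."""
    extents = defaultdict(list)
    for rec in records:
        extents[(rec["region"], rec["year"])].append(rec)
    return extents

def query(extents, **dims):
    """Scan only the extents whose cell matches the requested dimension(s)."""
    hits, blocks_read = [], 0
    for (region, year), block in extents.items():
        if dims.get("region", region) == region and dims.get("year", year) == year:
            blocks_read += 1       # one big sequential block I/O
            hits.extend(block)
    return hits, blocks_read

records = [{"region": r, "year": y, "id": i}
           for i, (r, y) in enumerate([("SW", 2010), ("SW", 2011),
                                       ("NW", 2010), ("NW", 2011)] * 3)]
extents = load(records)
rows, blocks = query(extents, region="SW")  # touches only (SW,2010) and (SW,2011)
```

The year query is just as cheap: `query(extents, year=2010)` also reads exactly two blocks, which is the "cake and eat it too" point.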
MDC: Simple and Flexible Syntax
Example 1:
CREATE TABLE Sales
  (YEAR DATE, REGION CHAR(12), PRODUCT CHAR(30), …)
  ORGANIZE BY (YEAR, REGION, PRODUCT);
Example 2:
CREATE TABLE Sales
  (SALES_DATE DATE, REGION CHAR(12), PRODUCT CHAR(30), …,
   MONTH GENERATED ALWAYS AS (INTEGER(SALES_DATE)/100))
  ORGANIZE BY (MONTH, REGION, PRODUCT);
For the query:
select * from sales where sales_date > '2010-03-03' and sales_date < '2011-01-01'
the compiler generates the additional predicates:
month >= 201003 and month <= 201101
Range Partitioning
Allows a single logical table to be broken up into multiple separate physical
storage objects
Each corresponds to a ‘partition’ of the table
Partition boundaries correspond to specified value ranges in a specified partition key
Examples of Benefits
Allows for partition elimination during SQL processing
Allows for optimized roll-in / roll-out processing (e.g. minimized logging)
Allows for “divide and conquer” table management
Without partitioning, Table_1 is a single object in Tablespace 1. With partitioning, Table_1.p1, Table_1.p2, and Table_1.p3 are separate objects in Tablespaces A, B, and C.
Distributing, Partitioning, Clustering

CREATE TABLE ORDERS
  (ORDER_ID INT, SHIP_DATE DATE, REGION CHAR(12), CATEGORY CHAR(30),
   MONTH GENERATED ALWAYS AS (INTEGER(SHIP_DATE)/100))
  IN TBSP1, TBSP2, TBSP3, TBSP4
  DISTRIBUTE BY (ORDER_ID)
  PARTITION BY (SHIP_DATE)
    (STARTING FROM ('01-01-2010') ENDING ('09-30-2010') EVERY (3 MONTHS))
  ORGANIZE BY (MONTH, REGION, CATEGORY);

The world's richest slice-and-dice capability.

(Diagram: data rows are distributed via hash on ORDER_ID across the tablespaces; within each tablespace they are partitioned by range on SHIP_DATE into parts 1-3, and each part is organized by MONTH, REGION, and CATEGORY.)
No Partitioning
(Diagram: all data in a single unpartitioned object.)
Distribute by Hash
Divide-and-conquer parallelism across database partitions P1-P4.
Hash + Partition by Range: Partition Elimination
Massive parallelism with massive I/O reduction: within each of P1-P4, only the range partitions for the years in the predicate (2010 and 2011 in the diagram) are scanned.
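The combined effect can be sketched as follows. This is a conceptual model with invented helper names, not DB2 internals: hashing picks which database partition stores a row, and the range predicate then eliminates whole table partitions on every database partition before any page is read.

```python
NUM_DB_PARTITIONS = 4

def insert(store, order_id, year):
    """DISTRIBUTE BY HASH on ORDER_ID picks the database partition;
    PARTITION BY RANGE on year picks the table partition inside it."""
    dbpart = store.setdefault(order_id % NUM_DB_PARTITIONS, {})
    dbpart.setdefault(year, []).append(order_id)

def query_years(store, wanted_years):
    """Every database partition scans in parallel, but partition
    elimination skips range partitions outside the predicate."""
    hits, partition_scans = [], 0
    for dbpart in store.values():
        for year in wanted_years:
            if year in dbpart:
                partition_scans += 1
                hits.extend(dbpart[year])
    return hits, partition_scans

store = {}
for oid in range(100):
    insert(store, oid, 2009 + oid % 3)   # years 2009-2011

hits, scans = query_years(store, [2010, 2011])
# 4 database partitions x 2 surviving range partitions = 8 scans,
# versus 4 x 3 = 12 with no elimination; 66 of the 100 rows qualify.
```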
Hash + Range + MDC
High density, high value, low-I/O reads: within the surviving range partitions on P1-P4, MDC extents narrow the scan further.
Multi-Temperature Data Management
Increase Ability to Meet SLAs; Postpone Hardware Upgrades
Storage pools for different tiers of storage
– For range partitions, policy-based automated movement of data
Tiers: HOT data on SSD RAID, WARM on SAS RAID, COLD on SATA RAID, and ARCHIVE in Optim, with data moving down the tiers as it grows and cools.
Higher performance
– Improved ability to meet SLAs while retaining greater amount of data for analysis
Lower costs
– Embrace new lower-cost storage technology
– Further reduces the cost for meeting SLAs
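Policy-based movement amounts to a simple rule evaluated per range partition. A sketch of only the decision logic, with invented tier names and an invented age policy (DB2 10 expresses the tiers themselves through storage groups; none of this code is DB2's):

```python
# Invented policy: tier chosen by partition age in days.
POLICY = [(90, "HOT_SSD"), (365, "WARM_SAS"), (1095, "COLD_SATA")]

def target_tier(age_days):
    for max_age, tier in POLICY:
        if age_days <= max_age:
            return tier
    return "ARCHIVE"

def plan_moves(partitions):
    """Return (partition, from_tier, to_tier) for every partition whose
    current tier no longer matches the policy."""
    return [(name, tier, target_tier(age))
            for name, tier, age in partitions
            if target_tier(age) != tier]

partitions = [
    ("sales_2012_q2", "HOT_SSD", 30),      # still hot, stays put
    ("sales_2011_q4", "HOT_SSD", 200),     # should move to WARM_SAS
    ("sales_2009_q1", "COLD_SATA", 1200),  # should move to ARCHIVE
]
moves = plan_moves(partitions)
```

The appeal of driving this from range partitions is that recent data and old data already live in different physical objects, so a move touches one partition, not the whole table.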
“The multi-temperature database management feature of DB2 V10.1 is great because the hardware world is not just
RAM and hard disks. There are many types of storage options with different I/O speeds and prices. This feature allows
administrators to make optimal use of these different devices, balancing expensive SSDs with cheaper SATA disks and
everything in between. Using SSDs for indexes and logs and a SATA array for the data, we noticed fantastic
improvements in I/O speeds, especially for synchronous reads. Additionally, the background movement of data
between the storages groups is very fast.” —Thomas Kalb, CEO ITGAIN GmbH
Autonomics, Workload Management
and Advanced Tooling
Intelligence and Simplicity
Adaptive Self Tuning Memory Management
DB2 9 introduced a revolutionary memory tuning system called the Self Tuning Memory Manager (STMM)
– Works on main database memory parameters
• Sort, locklist, package cache, buffer pools, and total database memory
– Hands-off online memory tuning
• Requires no DBA intervention
– Senses the underlying workload and tunes the memory based on need
– Adapts quickly to workload shifts that require memory redistribution
– Adapts tuning frequency based on workload
(Diagram: the STMM control loop. A statistics collector on the DB2 server feeds benefit-per-page measurements to greedy and accurate model builders; a MIMO control algorithm combines them, subject to memory constraints, and interval and step tuners apply the resulting memory changes for DB2 clients.)
STMM in Action: Two Databases on the Same Box
(Chart: memory in 4K pages over roughly 70,000 seconds. When a second database is started, STMM shifts memory toward it; when the second database is stopped, the memory flows back to the first.)
STMM in Action: Dropping an Important Index
(Chart: TPC-H Query 21, average times for the 10 streams across 34 executions. With the index, the average is 959 seconds; after the indexes are dropped it jumps to 6,205 seconds; STMM then retunes memory and brings the average down 63%, to 2,285 seconds.)
STMM: Comparing Different Configurations
(Chart: transactions per minute by configuration: 16,713 with the default, untuned configuration; 63,302 with benchmark tuning; 63,796 with STMM tuning. STMM beats the default configuration by nearly 4x and even edges out the hand-tuned benchmark system.)
“Time Spent” Metrics (example)
Total Time
Default Time Metrics
Bufferpool Read Wait
Bufferpool Write Wait
Direct I/O Read Wait
Direct I/O Write Wait
Lock Wait
Agent Wait
WLM Queue Wait
FCM Send Wait
FCM Receive Wait
Network Send Wait
Network Receive Wait
Log Write Wait
Log Buffer Insert Wait
Total time divides into these wait times plus processing (non-wait) time.
“Component Time” Metrics (example)
Health Monitor
(Screenshot: a heat-chart dashboard of all monitored systems and databases, with per-context columns for status, CPU usage, disk, memory, locking, SQL performance, connections, transactions, logging, and other metrics, plus critical and warning alert counts:)
Production: 3 critical / 5 warning
Web: 1 / 0
eCommerce: 1 / 0
Support: 0 / 0
Retail: 0 / 2
New York: 0 / 1
Los Angeles: 0 / 1
Accounts: 1 / 0
Marketing: 1 / 3
Test: 1 / 6
Development: 0 / 11
Meet Your Service Level Agreements
Optimize Performance with Workload Management
– Create controls ahead of time
– Override them on the fly
– Adjust to changing priorities throughout the day
Lower your costs by automating resource
allocation and utilization
– Control for both applications and users
– Establish controls based on business priority
Workload management
– Part of database engine
– Request management
– Time based resource management
Summary of Key WLM Features
DB2 Service Class
– Serves as the primary point of resource control for executing work
– Acts as point of integration with AIX WLM for work being done within database
DB2 Workload
– Serves as the primary point of control for submitters of work
– Acts as primary router of work to a specific DB2 Service Class
DB2 Threshold
– Provides limits to control behaviours of database activities based on predictive and
reactive elements
– Provides limits to control rate of concurrency for database activities
DB2 Work Action Set
– Provides ability to discriminate between different types of database activities for service
subclass mapping or for DB2 Threshold assignment
DB2 WLM Monitor and Control capabilities
– New table functions, event monitors, and stored procedures to provide monitoring and
control mechanisms for DB2 WLM
Workload Management
Enhanced Thresholds
– Rows Read
• Does not include index access
• Checked on user configurable time interval
• Database, work action set, service class, workload
– Processing Time (CPU)
• Database, work action set, service class, workload
• Calculated on a user specified check interval
– Aggregate System Temp
• Controls overall aggregate temp space within a service class
Tiered service class model
– Use in service class thresholds to remap work to a new class
– Based on processing time or number of rows read
Defining thresholds on the workload domain
– Estimated SQL cost, SQL rows returned, activity total time, SQL temp space
– Rows read
– Processing time
Bufferpool sensitivity to I/O priority
– Introduce h/m/l bufferpool priority for service class
– Pages will be swapped out based on the priority they were fetched under
Accelerate Value for New Features
Increase Ability to Meet SLAs; Lower Administration Costs
Updated Database Admin solutions:
– IBM Data Studio
– InfoSphere Data Architect
Updated Performance Mgmt solutions:
– InfoSphere Optim Performance Manager
– InfoSphere Optim Query Workload Tuner
– InfoSphere Optim Configuration Manager
Higher performance
– Immediate support for new performance features
– Enhanced Visual Explain, Access Plan Explorer and Index Advice
– Extended Insight identifies source of performance issues
Lower costs
– Immediate support for new time saving features (incl. Temporal,
Multi-Temperature Data Management & Row and Column Access Control)
– IBM solutions are integrated and consistent
New and Enhanced Tooling
IBM Data Studio 3.1.1
– Merges functionality from Data Studio, Optim Development Studio and Optim Database
Administrator
– Includes all functionality available in Control Center and MORE!!
– Supports DB2 Galileo features
Optim Query Workload Tuner: Tunes multiple queries in parallel
IBM Data Studio Console replaces Health Center
Optim Performance Manager: Monitors workloads and events
Workload Manager (WLM) replaces some of Query Patroller and Governor
functionality
Adding Additional Value to Advanced Enterprise Edition
High Availability and Extreme Scalability
Optimized for the workload
HADR now Supports Multiple Standby Servers
Increase Ability to Meet SLAs; Disaster Recovery
HADR now supports more than one standby server. If the primary server fails, the principal standby takes over; if the principal standby then fails, operations can switch to an auxiliary standby. An auxiliary standby can provide complete offsite availability while maintaining the speed of a local standby.
DB2 pureScale –
OLTP Workloads
Unlimited Capacity
– Buy only what you need, add capacity as your
needs grow
Application Transparency
– Avoid the risk and cost of
application changes
Continuous Availability
– Deliver uninterrupted access to your data with
consistent performance
Learning from the undisputed Gold Standard... System z
DB2 pureScale Architecture
– Automatic workload balancing
– Cluster of DB2 members running on Linux or Power servers
– Leverages the global lock and memory manager technology from z/OS
– Integrated Tivoli System Automation
– InfiniBand network and DB2 Cluster Services
– Shared data
DB2 pureScale: Near-Linear Scaling to 112 Members
OLTP Workloads
(Chart: scalability versus number of members in the cluster: over 95% at 2, 4, 8, 16, and 32 members; 91% at 64 members; 87% at 88 members; 81% at 112 members.)
DB2 pureScale Enhancements
Increase Ability to Meet SLAs; Easily Add or Remove Capacity
Further Improving IBM’s Shared-Disk Cluster Capability
– NEW! Workload management for DB2 pureScale
– NEW! Multiple database support
• Easy multi-tenancy
– NEW! Range partitioning support
– NEW! Additional backup/restore options
– NEW! Support for 10-gigabit Ethernet
– NEW! Support for multiple Infiniband adapters and switches
Configurable geographically-dispersed clusters
“Vormetric’s integration with DB2 pureScale GPFS provides IBM customers with a fantastic combination
of Vormetric Data Security with pureScale availability, capacity and scalability. Improved performance
and availability with data security offers our mutual customers a phenomenal solution.”
-- Todd Thiemann, Senior Director, Product Marketing Vormetric, Inc.
Real-Time Data Warehousing
Faster Business Decisions; More Accurate Business Decisions
Continuous feed of data
Parallel processing
Supports multiple connections
Higher performance
– Faster availability of data
– Minimal impact on query performance
– No downtime (even for large volumes of data)
Lower costs
– Costs less than solutions outside database
– Reduced infrastructure costs
“You can now continuously feed data into your data warehouse at a high rate
even whilst you are running queries against the tables in your data warehouse.
DB2 10 represents a greatly strengthened offering for the data warehouse market.”
—Ivo Grodtke, LIS.TEC GmbH
Active-Active Warehousing
– Full database or subset
– Bi-directional or uni-directional
– Integrated packaging
– Multiple standbys
– Standbys can be read-only or read/write, with a time delay on replicated data
– Seamless application failover
(Diagram: DB2 Warehouse A, the primary, and DB2 Warehouse B, a standby, are each clusters of IBM pSeries servers over a Storage Area Network holding partitions 1-16; Q replication ships database log buffers between them, and a further standby, DB2 Warehouse C, is also shown. Clients connect over the primary DB2 client connection path and follow the DB2 client reroute connection path to the standby on failure.)
Application Portability
Protection of existing investment
Move Your Applications to DB2 … as-is
Proven results:
– Use the Oracle skills you have with DB2, and achieve high productivity
– Applications moved to DB2 run with full native execution, delivering high performance; 98%+ of application code runs as-is
Ease of use:
– Simple drag and drop of schemas to DB2
– Integrated, cross-platform tools
– IBM can rapidly assess your application
PL/SQL in DB2
– Built-in PL/SQL compiler alongside the SQL PL compiler
– Source-level debugging and profiling via the Data Studio editor
(Diagram: the PL/SQL and SQL PL compilers feed a unified SQL runtime engine inside the DB2 server, with debugger and profiler attached, operating against the database.)
Debugging PL/SQL in DB2
Oracle Compatibility Layer in DB2
– Aggressive delivery into DB2 9.7 ship vehicles
– Significant reduction in cost and risk of application migration from Oracle to DB2
• Many applications simply work as-is, with equal or better performance
• Now over 98% compatibility with DB2 10
– Prioritized Focus on ….
– Extending Reach
> Eliminating show stoppers
> More client APIs to increase number of applications which can be enabled to DB2 in an
enterprise.
– Increase “out of the box” experience
> Popular features which reduce effort significantly
> 98% → 99% compatibility => Reduction of effort by 50%
– Utilize with partners and community
– Support for Oracle Forms via multiple partners
– Support for more Oracle packages via developerWorks
Average PL/SQL Compatibility Moves Above 98%
Easily Move from the More Expensive Oracle Database; Leverage Oracle Skills with DB2
Client results:
Reliance Life Insurance: "The total cost of ownership with DB2 running on IBM systems is almost half the cost of Oracle Database on Sun systems."
Banco de Crédito del Peru: "We switched from Oracle Database to IBM DB2 and cut our costs in half, while improving performance and reliability of business applications."
JSC Rietumu Banka: moved from Oracle Database to IBM DB2 using the compatibility features; 3-30x faster query performance; 200% improvement in data availability.

Compatibility deliveries by release:
9.7.1: SUBSTRB; UDF parameters: INOUT; FORALL / BULK COLLECT; improved BOOLEAN; conditional compilation; basic DPF support; OCI support
9.7.2: UDF parameters: DEFAULT; obfuscation; NCHAR, NVARCHAR, NCLOB
9.7.3: NUMBER performance; runtime "purity level" enforcement; RATIO_TO_REPORT function; RAISE_APPLICATION_ERROR; small LOB compare
9.7.4: multi-action triggers and update-before triggers; autonomous transaction improvements; LIKE improvements and LISTAGG; ROW and ARRAY of ROW JDBC support
9.7.5: Pro*C support; nested complex objects
10: local procedure definitions; local type definitions; PL/SQL performance
Most items increase compatibility; DPF and OCI support broaden coverage, conditional compilation and obfuscation are enhancements, and the NUMBER and PL/SQL items are performance work.
* Based on internal tests and reported client experience from 28 Sep 2011 to 07 Mar 2012.
Workload Optimized Systems
One Size Does NOT fit all …..
IBM - Proven Track Record of Optimizing Systems
1950s-1960s: TPF, the airline reservation system; IMS and S/360 for transaction and database systems
1970s-1980s: System/38 and AS/400 for integrated application and data serving; DB2 and S/370 for online transaction processing
1990s: IMS, CICS, and DB2 Parallel Sysplex for high-scale application and data serving
2000s: IBM Smart Analytics System for high-scale business intelligence and relational/XML data warehousing; DB2 pureScale on PowerHA for high-scale database management; WebSphere Edge Server for high-scale web application serving; DataPower, the XML and web services appliance
Workload Optimized Right Out of the Box
• 3x improvement to database throughput on a brokerage application
• System configuration: POWER7 (Model 9117-MMB), DB2 9.7 FP1, 4.6 TB SSD + 38.4 TB 15K HDD
Data Server Innovation to Deliver True Business Value
61. IBM Client Confidential – Do Not Distribute
IBM Information Management
IBM Smart Analytic System – For Operational Analytics
Leveraging DB2 10 - Coming soon
Faster Time to Value, Faster Business Results
Meeting clients where their information is…

5600 – Based on System x
…Designed for business analytics workloads
…Optional Solid State Disk – reducing data latency

5710 – Based on System x
…Cost-effective solutions for analytics and BI, reporting
…Compact, integrated single analytics solutions
…Available for mid-market

7700 – Based on POWER7 servers
…Scaling to hundreds of terabytes of data
…Extract insights from untapped information

7710 – Based on POWER7 server
…A single-server warehousing and analytics solution
…Built on POWER7 based servers and designed for production data warehouses – sizes under 10 TB
…For development and non-production use

9600 – Based on System z
…Advanced query / workload management
…Database designed and optimized for system
…Disk controller optimized to reduce data latency
Data Server Innovation to Deliver True Business Value
62. IBM Client Confidential – Do Not Distribute
IBM Information Management
Form-factor Coverage
Flexible Deployment – InfoSphere Warehouse
•Deployed on platform of choice
•Sophisticated architectures and workloads
•Mixed updates and queries
•Configurable/tunable platform
•No-copy analytics
Choice of platform: Linux, UNIX, Windows – DB2 on z or Power or Linux
For customers wanting software-only solutions: configurable/customisable

Expert Optimised Configurable System – IBM Smart Analytics System
•Highly optimised & adaptable platform
•Sophisticated architectures and workloads
•Mixed updates and queries
•Configurable/tunable platform
•No-copy analytics
Workload optimized
For customers with Operational Analytic requirements: expert optimized systems

Warehouse Appliance – IBM Netezza
•Fully integrated high-performance analytical appliance
•Low cost of management
•Rapid deployment
•Industry vertical applications
Simplify – Accelerate Value – Reduce Cost
For customers who want a warehouse appliance for deep analytics: low touch
Data Server Innovation to Deliver True Business Value
63. IBM Client Confidential – Do Not Distribute
IBM Information Management
Security, Data Governance &
Regulatory Compliance
A complete set of capabilities
Data Server Innovation to Deliver True Business Value
64. IBM Client Confidential – Do Not Distribute
IBM Information Management
Security, Regulatory Compliance and Data Governance
Components of a Data Compliance Offering
Leveraging DB2, Optim and Guardium
Corporate Data Servers

Audit
• Audit updates made to data within the database
• Minimal impact to performance required
• Applicable US regulations: Sarbanes-Oxley, HIPAA, PCI…

Encryption
• Encryption for data at rest, backup data or data within the database
• Encryption on disk, backups and on the wire

Archive strategy
• Keep data for a specified period of time
• Includes the audit record repository as well as sensitive data

Creation of test databases
• “Changing” sensitive data while maintaining “production-like” data and referential integrity
• For development and test

Vulnerability assessment
• Proactively detect areas of vulnerability
• Important to ensuring compliance of all other pillars

Restrict and manage access in DB2
• Individual users
• Map database to business security models
• SECADM
• Label Based Access Control
• Roles
• Fine-Grained Access Control and Masking
• Identity Assertion and Trusted Context
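The fine-grained access control and masking pillar can be illustrated with DB2 10's row and column access control (RCAC). A minimal sketch, assuming a hypothetical EMPLOYEE table with SSN, DEPT and EMPID columns and an HR role (the table, columns and role are illustrative, not from the slides):

```sql
-- Column mask: members of the HR role see the real SSN,
-- everyone else sees only the last four digits.
CREATE MASK SSN_MASK ON EMPLOYEE
  FOR COLUMN SSN RETURN
    CASE WHEN VERIFY_ROLE_FOR_USER(SESSION_USER, 'HR') = 1
         THEN SSN
         ELSE 'XXX-XX-' || SUBSTR(SSN, 8, 4)
    END
  ENABLE;

-- Row permission: each user sees only rows for their own department.
CREATE PERMISSION DEPT_ROW_ACCESS ON EMPLOYEE
  FOR ROWS WHERE DEPT = (SELECT DEPT FROM EMPLOYEE
                         WHERE EMPID = SESSION_USER)
  ENFORCED FOR ALL ACCESS
  ENABLE;

-- Nothing takes effect until access control is activated on the table.
ALTER TABLE EMPLOYEE ACTIVATE COLUMN ACCESS CONTROL;
ALTER TABLE EMPLOYEE ACTIVATE ROW ACCESS CONTROL;
```

Unlike view-based security, the rules are enforced inside the table itself, so they apply to every application and tool that touches the data.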
Data Server Innovation to Deliver True Business Value
65. IBM Client Confidential – Do Not Distribute
IBM Information Management
Conclusion
Continued
Investment
Consistency of focus
Data Server Innovation to Deliver True Business Value
66. IBM Client Confidential – Do Not Distribute
IBM Information Management
Accelerating an Information-Led Transformation…
IBM has invested $12B in R&D and Acquisitions
Cognos, SPSS, ILOG, InfoSphere, DB2, Informix, Netezza, FileNet, Optim
Data Server Innovation to Deliver True Business Value
67. IBM Client Confidential – Do Not Distribute
IBM Information Management
IBM DB2 Technology Roadmap
Investment, Innovation and Industry Leading Capabilities
2011 – DB2
•Oracle compatibility enhancements
• 2x Compression
• XML
pureScale DB2
•Geographic Cluster
• Multiple CF HCA Support
• Multiple Database Consolidation
Consolidation Platform
• Virtualized applications
• Web workload deployment
• Deployment into DB2 pureScale

2012 – DB2
•Temporal Query
• Fine Grain Access Control
• Oracle compatibility enhancements
• HADR Multiple standby
• Java object cache integration
pureScale DB2
•Range Partitioned Tables
• Tablespace recovery
• Performance Improvements
•Database as a Service for Tier 2 and below applications
Consolidation Platform
•DbaaS for Tier 1

2013 – DB2
•Simple integrated Disaster Recovery (logical HADR)
• SSD and PCM memory hierarchy
• Reorg free database
• Oracle compatibility
• Transparent archival
• Hibernate optimizations
pureScale DB2
• Online rolling upgrades
• Incremental backup
• Explicit Hierarchical Locking support
• RDMA over 10Gb-E for Power
Consolidation Platform
• Common platform with SMAS
• Single model for Tier 1, Tier 2, Tier 3 and
• Bare metal DbaaS deployment of pureScale
• Multitenancy support

2014 and Beyond – Overall
• Unbreakable DB2
• Seamless application recovery
• Continuous application availability
• Cross-database de-duplication
• Virtual database and tenant support
• Built-in encryption
• no-sql, triple store, RDF etc
and more…
Data Server Innovation to Deliver True Business Value
Adaptive Compression is included only in the Storage Optimization feature, and as such is only available for:
- Enterprise Server Edition and
- Advanced Enterprise Server Edition
Significant enhancements to DB2's industry leading compression technologies come in the new Adaptive Compression, which can further reduce your storage needs. The enhancements deliver efficient compression of high amounts of new and changing data. The improved compression ratio further reduces storage needs and allows for more data in memory, therefore increasing performance. The new approach used in Adaptive Compression also reduces the need for table reorganization. As a consequence, overall maintenance of compressed data is reduced, providing additional cost savings.
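Enabling the feature is a one-line table option. A minimal sketch, with a hypothetical SALES table; the savings-estimate query assumes the DB2 10.1 ADMIN_GET_TAB_COMPRESS_INFO administrative function:

```sql
-- New tables: classic row compression plus page-level compression.
CREATE TABLE sales (
  id     INTEGER NOT NULL,
  region VARCHAR(32),
  amount DECIMAL(12,2)
) COMPRESS YES ADAPTIVE;

-- Existing tables can be switched over; rows are compressed as pages
-- are rewritten, so no REORG is required to start benefiting
-- (though an offline REORG compresses everything at once).
ALTER TABLE sales COMPRESS YES ADAPTIVE;

-- Estimate the savings before committing to the change.
SELECT tabname, pctpagessaved_current, pctpagessaved_adaptive
FROM TABLE(SYSPROC.ADMIN_GET_TAB_COMPRESS_INFO('DB2USER', 'SALES'));
```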
This chart shows features that are new in DB2 9.7. DB2 is extending its leadership with additional breakthroughs in data compression. Currently, IBM clients enjoy data compression rates of up to 83%. This translates into a storage savings of up to 50%. Actual rates vary, depending on the type of data. With the following new features, IBM expects to move this storage savings higher:
- Compression of indexes
- Compression of log files
- Compression of temporary tables
- Compression of XML data (currently only inline XML data is compressed)
- Compression of Large Objects (LOBs)
SAP Warehousing workload. Overview for the 5 largest tables, 54 GB total size. Data is sorted. Shows the offline reorg case: an additional 39% savings by Adaptive Compression after reorg, with the compression ratio increasing from 5.2x to 8.5x.
SAP Warehousing workload. Overview for the 5 largest tables, 54 GB total size. Data is sorted. Shows the online ADC case (Automatic Dictionary Creation): an additional 60% savings by Adaptive Compression, with the compression ratio increasing from 2.5x to 6.4x.
DB2 Galileo compression is far superior to Oracle's for a number of reasons:
- In order to get the maximum compression possible, a customer will need to buy Exadata and use columnar compression.
- The standard compression in Oracle only uses page-level compression, so it is not as efficient as using table-level and page-level compression together.
- Data must be pre-sorted to get the best possible compression ratio – this is often not possible in a production database!
- Exadata requires the use of “classic” compression if the workloads that are running need to do updates, inserts, and deletes.
Key point: The amount of business information in XML form is already as great or greater than other forms and growing faster - failure to leverage efficiently as structured data means high cost and/or missed opportunity In 2006 IBM introduced a new generation data server with the availability of DB2 9 (formerly known as “Viper”). The explosive growth of XML based data standards in all industries means competitive advantage for those businesses that use it most effectively and efficiently. Client, policy and claims processing in Insurance; supply chain management in Retail; financial transactions and asset management in Banking; patient care in Healthcare; citizen service in Government; implementing Service Oriented Architectures (SOA) in Computing Software and Services - and many other processes across all industries - increasingly rely on information captured and exchanged in XML form. Our clients are increasingly managing XML format text documents in a content management system for proper governance and efficient use in the business process workflow. But few are realizing the full value of all the business data they possess that are in XML format. Early users of the pureXML feature of DB2 9 are taking advantage of the fact that data in XML format is well structured and can be queried via standard languages such as XPath and XQuery. By doing so they are bringing that data to bear in both transactional and analytic processes - with higher performance and lower development costs than previously possible with a relational database. The difference is that DB2 9 supports both relational (tabular) and XML (hierarchical) structures in the same database so that both can be easily, efficiently and securely managed, analyzed and delivered. Unlike other relational data servers - and previous versions of DB2 - pureXML eliminates the overhead of fitting the “square peg” XML tree structure into the “round hole” row and column relational structure. 
Until DB2 9, managing XML data records with a relational data server meant decomposing the data into columns - a process known as shredding. Or by storing the entire data record in a single cell as a character large object - known as a CLOB. The CLOB approach does not incur overhead as the data records go in. But when you query these records you pay the overhead of parsing each one at runtime, which can be a significant performance impact to the application. With shredding, overhead is paid up front to turn the data into a relational record that can be queried efficiently. But overhead is also paid later if the record needs to be recreated for delivery in XML format. This process also affects the fidelity of the record itself - leading to an approach that uses both shredding and CLOB methods for applications that require both performance and fidelity. This results in even more overhead to ensure the records remain in sync. The impact of pureXML is seen by a large Banking client with a requirement to update over 500,000 XML data records per day. Attempts to use a competitor's relational data server failed. Using DB2 9 with pureXML, the application was able to update more than half a million data records in less than an hour. And a large Insurance client has seen the impact of pureXML on development time and cost, with a 65% reduction in lines of code and more than a 75% reduction in the time required to develop services accessing XML data.
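The native alternative described above can be sketched with an XML column. A minimal illustration - the CUSTOMER table, its columns, and the document shape are hypothetical, not from the slides:

```sql
-- XML is stored natively in its hierarchical form: no shredding
-- into relational columns, no runtime parsing of a CLOB.
CREATE TABLE customer (
  id   INTEGER NOT NULL PRIMARY KEY,
  info XML
);

INSERT INTO customer VALUES
  (1, '<customer><name>Anna</name><city>Stockholm</city></customer>');

-- Query the XML directly with XPath via SQL/XML functions.
SELECT id,
       XMLQUERY('$i/customer/name' PASSING info AS "i")
FROM customer
WHERE XMLEXISTS('$i/customer[city = "Stockholm"]' PASSING info AS "i");
```

Because the hierarchy is preserved, the same column serves both transactional access and XPath/XQuery analytics without a shred-and-reassemble round trip.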
XMLELEMENT returns an XML value that is an XQuery element node; here the element name is <Department …> … </Department>. XMLATTRIBUTES constructs XML attributes from its arguments; here e.dept (table employee with alias e, column dept) is used AS the attribute named “name”. XMLAGG returns an XML sequence containing an item for each non-null value in a set of XML values; here the inner XMLELEMENT returns a <name> … </name> element with values from e.firstname. The result is returned AS dept_list, with GROUP BY on the department name.
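Put together, the note above describes a SQL/XML publishing query roughly like the following sketch (the employee table and its dept/firstname columns follow the note; exact names are assumptions):

```sql
-- Build one <Department name="..."> element per department, with a
-- <name> child per employee; XMLAGG collapses the per-row elements
-- into a single sequence for each group.
SELECT XMLELEMENT(
         NAME "Department",
         XMLATTRIBUTES(e.dept AS "name"),
         XMLAGG(XMLELEMENT(NAME "name", e.firstname))
       ) AS dept_list
FROM employee e
GROUP BY e.dept;
```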
The FOR clause iterates over db2-fn:xmlcolumn('MOVIES.DOC'), which returns an XML sequence of the values in the XML column DOC of the MOVIES table. LET $actors := $movie//actor assigns the actor values to the $actors variable. WHERE $movie/duration > 90 looks for movies with a duration greater than 90 minutes. ORDER BY $movie/@year orders the movies by year. RETURN <movie> {$movie/title, $actors} </movie> returns the sequence of <movie> elements.
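Assembled, the FLWOR expression the note walks through looks roughly like this (the MOVIES table, DOC column, and element names follow the note):

```sql
-- DB2 XQuery FLWOR: iterate over the DOC XML column of MOVIES,
-- filter on duration, sort by the year attribute, and rebuild
-- a <movie> element from selected children.
XQUERY
for $movie in db2-fn:xmlcolumn('MOVIES.DOC')/movie
let $actors := $movie//actor
where $movie/duration > 90
order by $movie/@year
return <movie> {$movie/title, $actors} </movie>
```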
Overview: Table partitioning will be made available by a new PARTITION BY clause on CREATE TABLE. For example:

CREATE TABLE foo(a INT) IN tbsp1, tbsp2, tbsp3, tbsp4, tbsp5 PARTITION BY RANGE(a) (STARTING FROM (1) ENDING (100) EVERY (20))

This creates a table where rows with a >= 1 and a <= 20 are stored in table space tbsp1, rows with 21 <= a <= 40 are in table space tbsp2, etc. This functionality leads to a three-level data organization scheme:
- DISTRIBUTE BY to spread data across EEE database partitions
- PARTITION BY to spread data across DMS objects in one or more tablespaces
- ORGANIZE BY to spread data across extents within a tablespace
These clauses can be combined in a single table to create more complicated partitioning schemes. This, for example, allows similar functionality to Informix's Hybrid functionality by combining DISTRIBUTE BY and PARTITION BY to spread data both across EEE database partitions and multiple tablespaces. Each clause includes an algorithm (for example HASH, DIMENSIONS or RANGE) after the BY to indicate how the data should be spread out. Not all clauses will support all of the algorithms, but this syntax allows consistency between the clauses as well as allowing future extensions to add new data layout algorithms. DML operations against a partitioned table will yield the same results as they would for an ordinary table. Data inserted or loaded into the table will be transparently placed in the correct data partition. Updates that move data from one data partition to another will automatically work as expected. All of the usual SQL features such as triggers, constraints, etc. will be supported for partitioned tables. The user-visible differences will be in capacity, performance, and availability. Data partitioned tables can contain vastly more data than an ordinary table. Tables with up to 32767 data partitions can be created. The current limit on the number of DMS objects is 4096 tablespaces, each with approximately 55000 objects in them.
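The three clauses compose in a single DDL statement. A sketch of a table that distributes, range-partitions, and clusters at once - table, column, and tablespace names are hypothetical:

```sql
-- One table using all three levels of the organization scheme:
-- hash-distribute rows across database partitions, range-partition
-- by month into tablespaces, and cluster extents by region (MDC).
CREATE TABLE sales (
  store_id INTEGER NOT NULL,
  region   INTEGER NOT NULL,
  mnth     INTEGER NOT NULL,
  amount   DECIMAL(12,2)
)
IN tbsp1, tbsp2, tbsp3
DISTRIBUTE BY HASH (store_id)
PARTITION BY RANGE (mnth) (STARTING FROM (1) ENDING (12) EVERY (4))
ORGANIZE BY DIMENSIONS (region);
```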
Query processing will be enhanced to automatically eliminate data partitions based on predicates of the query, resulting in better query performance. This feature is called "data partition elimination", which corresponds to "fragment elimination" in the Informix products and SmartScan in the Redbrick products. Many decision support queries benefit greatly from this. Some operations that currently can take a long time on a large table, such as backup, will work partition by partition. Thus, it will be possible to back up one data partition of a partitioned table at a time. In future releases this will be extended to other administration operations such as reorg. Even if applications cannot function properly with a portion of the table unavailable, it will at least allow people to break one intolerably long maintenance operation into a series of smaller ones. One notable difference between data partitioned tables and non-partitioned tables (i.e. not MDC or EEE) occurs when an update where current of cursor (WCOC) operation moves the row from one partition to another. The cursor will no longer be positioned on a row. In this case, the cursor can be repositioned by fetching the next row. So you can run 'update table t1 set col1 = col1 + 1' multiple times, but after the first update that changes the row position the cursor will no longer be positioned, and no further WCOC operations will be allowed until a subsequent fetch positions it on a new row. This is consistent with the behavior of MPP partitioning and MDC partitioning. The table partitioning line item will provide the following features in the first release: The ability to partition a table by key range into multiple tablespaces. This feature includes the ability to have multiple ranges of a single table in one tablespace (each range will be in a separate DMS object). A long and short form of the CREATE TABLE syntax. The short form allows easy creation of large numbers of data partitions when required.
The long form supports low-level control of placement when required, most likely for data skew cases. The ability to create an index on a partitioned table. In this release, the index will be non-partitioned and hold references to data rows across all of the data partitions on an EEE database partition (i.e. each EEE database partition will have its own index). This means that a single DMS object will be used per EEE database partition to hold the index. Each index on the table is stored in a separate DMS object within an EEE database partition. MDC will use a non-partitioned index; MDC block indexes for partitioned tables will include a data partition identifier in addition to the BID. The ability to only scan the data partitions needed based on the WHERE clause and the range definitions for each data partition. This is known as data partition elimination. ALTER commands to ATTACH/DETACH tables to a partitioned table. This feature allows easy roll-in/roll-out. Tables can be loaded up offline and then attached to a live partitioned table, for easy roll-in. In addition, a data partition from a partitioned table can be detached into a stand-alone table, for easy roll-out. For discussions on the concurrency rules for ATTACH and DETACH, please refer to the usage scenarios document. ALTER commands to ADD empty data partitions. ALTER TABLE ... DROP PARTITION will not be supported in the first release; use DETACH + DROP TABLE to accomplish the same result. DB2LOOK support for dumping out schemas of partitioned tables. Only the long form of the syntax will be output - any tables created with the short form of the syntax will be dumped as if they had been created with the long form. RUNSTATS will generate stats from all data partitions. RUNSTATS will gather statistics from all data partitions residing on the same database partition (EEE node).
The ability to generate statistics from all nodes in a EEE environment is in another lineitem (see LI 1459 ) - currently statistics is only gathered on one EEE node and extrapolated. Note there is NO dependency on LI 1459 for this lineitem, we will simply expand the current algorithm to cover all data partitions within the same database partition. REORG will be supported at the table level (it won't be possible to reorg an individual data partition in this release). However, reorg of an individual partition can be achieved by detaching the partition, reorging the resulting non-partitioned table and then re-attaching the partition. Please see the Usage Scenarios document for details about this procedure. REORGCHK will be supported at the table level. Table and index statistics will be calculated based on the statistics for the whole table. REORG INDEXES ALL ALLOW READ/WRITE will not be supported for partitioned tables in the initial release. Instead only the ALLOW NO ACCESS option will be supported, making reorg indexes all available to reorg all the indexes for the partitioned table offline. Note that ALLOW NO ACCESS must be explicitly specified for REORG INDEXES ALL on a partitioned table because the default is ALLOW READ ACCESS. In addition there will be new syntax to support reorg of individual indexes as specified by the user. LOAD into a partitioned table will be supported, as well as IMPORT/EXPORT from a partitioned table. This line item does not address whether or not High Performance Unload (HPU) will be supported for data partitioned tables. Point in time ROLLFORWARD will be enhanced to handle partitioned tables much as it does indexes. All tablespaces belonging to a partitioned table will be rolled forward together. LOCK TABLE will support locking all of the data partitions via a single table lock. Partition level locks may be obtained as well to satisfy various locking requirements. 
All existing ALTER TABLE commands will be supported. Parallel scans will be supported on multiple data partitions via a straw scan of each data partition individually. Insert/Update/Delete for data partitioned tables will be serialized just as it is for ordinary tables. Partitioning of hierarchical (typed) tables will NOT be supported in the first release. Partitioned MQTs are allowed, but you cannot do ALTER ATTACH/DETACH on them. Not all internal optimizations available on non-partitioned tables will necessarily be supported on partitioned tables (for example, some of the TPCC optimizations introduced in V8, such as parameterized base tables, may not be supported - the features will however work with partitioned tables, just without the performance optimizations). DATALINK and XML columns will not be supported on data partitioned tables. CREATE INDEX IN <tablespace> enables the user to specify a tablespace for each individual index on a partitioned table. It will override the one specified in the CREATE TABLE statement or selected by the default rule for this index, regardless of whether the base table's tablespace is DMS or SMS. Redistribute will be supported for partitioned tables, but NOT to move data between ranges of a data partitioned table - only between database partitions. Redistribute for partitioned tables has the following restrictions: the table must have an access mode (SYSTABLES.ACCESS_MODE) of Full Access, and the table must not have outstanding ATTACHed or DETACHed partitions. This support is being implemented as part of the Fast Redistribute line item. RENAME TABLE will rename a data partitioned table. The ALTER operations interact strongly with SET INTEGRITY. This aspect of the project has been split off as a separate line item, LI 2369. Details: Table partitioning will require the creation of two new catalog tables, SYSDATAPARTITIONS and SYSDATAPARTITIONEXPRESSION.
This is because each data partition of a partitioned table will be a separate DMS object, which needs to be tracked via a tuple in SYSDATAPARTITIONS. This tuple will include the starting and ending key values for that partition's range and other information. An additional SYSDATAPARTITIONEXPRESSION catalog table will hold the details of the columns used to partition on (and, in later releases, the partitioning expressions). A partitioned table will have one entry in SYSTABLES, one entry in SYSDATAPARTITIONEXPRESSION per partitioning expression and n entries in the new catalog table SYSDATAPARTITIONS, where n is the number of data partitions. The PARTITION BY clause will specify the range of data values that goes into each DMS object on a database partition (the DISTRIBUTE BY clause will specify which EEE database partition the data should live on). As the data is now contained in separate DMS objects based on key range, we can now eliminate one or more DMS objects from data scans based on the WHERE clause conditions. This functionality is called "data partition elimination". It corresponds to "fragment elimination" in the Informix products and SmartScan in the Redbrick products. Because the table is now spread out across multiple tablespaces, we are no longer limited by the 4-byte record id. Each tablespace can have its own record id range, and we will supplement that with a data partition identifier internally to uniquely address each row (this is much like how updates to union views are handled in V8). Also, because the map of what data is in what object is in the catalogs, we can reasonably easily add or remove ranges of data from the table. The ALTER statements allow addition/deletion of data partitions from a partitioned table, thus solving the roll-in/roll-out requirement.
The ranges of data in each data partition can be specified in one of two ways. Automatically generated: This mirrors the proposed MDC syntax to allow expressions in the ORGANIZE BY DIMENSIONS clause and allows expressions to specify how the ranges are derived. Although, as with the current MDC version, we won't support table partitioning by expressions in the first version, generated columns can be used in the meantime to support this functionality. For example:

CREATE TABLE t(a INT, b INT GENERATED ALWAYS AS (a/10)) IN tbsp1, tbsp2, tbsp3, tbsp4, tbsp5, tbsp6, tbsp7, tbsp8, tbsp9, tbsp10 PARTITION BY RANGE(b) (STARTING FROM (1) ENDING (1000) EVERY (100))

would result in 10 data partitions, each with 100 key values in them, i.e.
(a/10) >= 1 and (a/10) < 101 in tbsp1
(a/10) >= 101 and (a/10) < 201 in tbsp2
...
(a/10) >= 901 and (a/10) <= 1000 in tbsp10

The starting value of the first data partition (the one in tbsp1) will be inclusive because the overall starting bound (1) was inclusive (the default). Similarly, the ending bound of the last data partition (the one in tbsp10) will be inclusive because the overall ending bound (1000) was inclusive (again by default). The remaining STARTING values are all inclusive and ENDING values are all exclusive, and each data partition holds n key values where n is given by the EVERY clause. We use the formula (start + every) to find the end of each data partition range. The last data partition may have fewer key values if the EVERY value does not divide evenly into the START/END range. Now say we specify:

CREATE TABLE t(a INT, b INT GENERATED ALWAYS AS (a/10)) IN tbsp1, tbsp2, tbsp3, tbsp4, tbsp5, tbsp6, tbsp7, tbsp8, tbsp9, tbsp10 PARTITION BY RANGE(b) (STARTING FROM (1) EXCLUSIVE ENDING (1000) EVERY (100))

This would result in 10 data partitions, each with 100 key values in them, i.e.
(a/10) > 1 and (a/10) <= 101 in tbsp1
(a/10) > 101 and (a/10) <= 201 in tbsp2
...
(a/10) > 901 (a/10) <=1000 in tbsp10 The starting value of the first data partition (the one in tbsp1) will be exclusive because the overall starting bound (1) was exclusive. Similarly the ending bound of the last data partition (the one in tbsp10) will be inclusive because the overall ending bound (1000) was inclusive. The remaining STARTING values are all exclusive and ENDING values are all inclusive and each data partition holds n key values where n is given by the EVERY clause. Finally, if both the starting and ending bound of the overall clause are exclusive, The starting value of the first data partition (the one in tbsp1) will be exclusive because the overall starting bound (1) was exclusive. Similarly the ending bound of the last data partition (the one in tbsp10) will be exclusive because the overall ending bound (1000) was exclusive. The remaining STARTING values are all exclusive and ENDING values are all inclusive and each data partition (except the last) holds n key values where n is given by the EVERY clause. Note we are still using the formula (start + every) to find the end of each data partition range. Tables created in this manner are constrained to use numeric or date time types in their PARTITION BY columns. For an example of the every clause using a date column, refer to the definition of the table, LINEITEM, in the Usage Scenarios section. Ranges are ascending. The increment in the EVERY clause must be greater than zero. The ENDING value must be greater than or equal to the STARTING value. MINVALUE and MAXVALUE are not allowed in the Automatically Generated form of the syntax. Manually generated This is like the traditional DB2 zOS style of partitioning where high values are specified for each data partition. A new data partition is created for each data partition boundary listed in the PARTITION BY clause. 
For example:

CREATE TABLE foo(a INT, b INT GENERATED ALWAYS AS (a/10)) PARTITION BY RANGE(b) (STARTING FROM (1) ENDING(100) IN tbsp1, ENDING(200) IN tbsp2, ENDING(300) IN tbsp3, ENDING(400) IN tbsp4, ENDING(500) IN tbsp5, ENDING(600) IN tbsp6, ENDING(700) IN tbsp7, ENDING(800) IN tbsp8, ENDING(900) IN tbsp9, ENDING(1000) IN tbsp10)

would result in data partitions with the same ranges as above. Like DB2 zOS, only one end of each range needs to be specified - the other end is implied from the adjacent data partition. This is felt to be much simpler for the user than forcing both ends of each range to be specified - in particular, with character and float columns the other end of the range could be difficult to specify. The following types (including synonyms) are supported for use as a RANGE partitioning column: SMALLINT, INTEGER, INT, BIGINT, FLOAT, REAL, DOUBLE, DECIMAL, DEC, NUMERIC, NUM, CHARACTER, CHAR, VARCHAR, CHARACTER VARYING, CHAR VARYING, CHARACTER FOR BIT DATA, CHAR FOR BIT DATA, VARCHAR FOR BIT DATA, CHARACTER VARYING FOR BIT DATA, CHAR VARYING FOR BIT DATA, DATE, TIME, TIMESTAMP, GRAPHIC, VARGRAPHIC, and user defined types (distinct). Here are examples of types that can appear in a range partitioned table but are not supported for use as a range partitioning column (other types not mentioned here because they have not yet been implemented may or may not work in a range partitioned table): user defined types (structured), LONG VARCHAR, LONG VARCHAR FOR BIT DATA, BLOB, BINARY LARGE OBJECT, CLOB, CHARACTER LARGE OBJECT, DBCLOB, LONG VARGRAPHIC, REF, varying length string for C, and varying length string for Pascal. The following are examples of types that are not supported for use in a range partitioned table at all: XML, DATALINK, and BOOLEAN (the BOOLEAN data type is currently only supported internally). In the manually generated form of the syntax, multiple columns can be used as the range partitioning key.
For example:

CREATE TABLE foo(year INT, month INT) IN tbsp1, tbsp2, tbsp3, tbsp4, tbsp5, tbsp6, tbsp7, tbsp8 PARTITION BY RANGE(year, month) (STARTING FROM (2001, 1) ENDING (2001,3) IN tbsp1, ENDING (2001,6) IN tbsp2, ENDING (2001,9) IN tbsp3, ENDING (2001,12) IN tbsp4, ENDING (2002,3) IN tbsp5, ENDING (2002,6) IN tbsp6, ENDING (2002,9) IN tbsp7, ENDING (2002,12) IN tbsp8)

This results in 8 data partitions, one for each quarter in years 2001 and 2002. Note that when multiple columns are used as the table partitioning key, they are treated as a "composite" key (similar to composite keys in an index), in the sense that trailing columns are dependent on the leading columns. Table partitioning is multi-column, not multi-dimension. In table partitioning, all columns used are part of a single dimension, like a B-tree with multiple keys. Each starting or ending value (all of the columns, together) must be specified in 512 characters or less. This limit corresponds to the size of the columns SYSDATAPARTITIONS.LOWVALUE and SYSDATAPARTITIONS.HIGHVALUE. A starting or ending value specified with more than 512 characters will result in error SQL0636N, reason code 9. Roll-in works as follows: Create a new table P1. Load data into P1. Do any necessary data cleansing and validation. P1 is an ordinary table, so you can create indexes, constraints, whatever helps you accomplish this step. Then do ALTER TABLE T1 ATTACH PARTITION ... FROM TABLE P1. At this point, P1 no longer exists as a separate table. Rather, the data from P1 is now part of T1. A z-lock is acquired on P1, and the data in P1 is only online when SET INTEGRITY commits. The ATTACH operation does not require that the data in P1 be read or written, so it is a more or less instantaneous operation. At this point, the table is fully read/write accessible except for the partition that has just been ATTACHed. Running SET INTEGRITY on T1 will make the new data visible. See LI 2369 (SET INTEGRITY) for details.
Once the ATTACH returns, it can be committed or aborted as required, but a call of SET INTEGRITY will be required to complete the operation and to bring any newly created data partitions online. ATTACH completes more or less instantaneously. All work that involves scanning or other DML on the contents of the table takes place during the SET INTEGRITY operation. This SET INTEGRITY call will do the following: check that rows satisfy the range defined on the attached partition; insert into indexes the keys for the attached rows; check RI/check/generated-column constraints as applicable; and generate identity or generated column values as applicable. LOAD into a staging table: CREATE TABLE dec03(.....); LOAD FROM data_file OF DEL REPLACE INTO dec03; While LOAD is running, the staging table is offline. However, since it is a separate table from the target table, all existing data is fully accessible. [Application-specific data cleansing, transformation, checking, etc.] Depending on the application, data may or may not have been massaged prior to loading. Since it has been loaded into a staging table that is completely independent of the target table, it may be convenient to do cleansing, checking, and transformation on the staging table after LOAD has completed. ATTACH the staging table to the target table: ALTER TABLE stock ATTACH PARTITION dec03 STARTING '12/01/2003' ENDING '12/31/2003'; ALTER is more or less instantaneous. This operation completes in at most a few seconds. However, the data is not yet visible. Note that ALTER acquires locks on some rows in the catalogs that are necessary for compiling new queries against the target table. Thus no new queries can be compiled for this table until the ALTER is committed, releasing the locks. Existing queries do continue to run (no query draining for ATTACH), and new queries can start even before the commit if they are pre-compiled (static SQL in packages).
It is recommended that COMMIT be run immediately after ATTACH to avoid any interruption in access to the data.

COMMIT the ALTER:

COMMIT WORK;

At this point, all existing data in the target table is fully available. However, data in the newly ATTACHed partition is not yet visible.

Validate the new data using SET INTEGRITY:

SET INTEGRITY FOR stock IMMEDIATE CHECKED FOR EXCEPTION IN stock USE stock_ex;

While SET INTEGRITY is running, all existing data in the target table is fully accessible for select/insert/update/delete. Data in the newly ATTACHed partition is not yet visible. SET INTEGRITY is potentially a long-running operation. During this period, DDL and utility-type operations are not allowed, for example LOAD or ALTER TABLE ... ADD COLUMN. The operations in these categories are:
- LOAD
- REORG
- REDISTRIBUTE
- Datalink reconcile
- ALTER TABLE (e.g. add columns, ADD, ATTACH, DETACH, truncate via alter to "not logged initially")
- Index create

When SET INTEGRITY completes, it drains all queries accessing the target table and then transitions the state of the table to make the new data visible. This transition is more or less instantaneous (it completes in a few seconds at most), but it cannot be performed until all existing queries have been drained.

Summary of data availability at each phase of roll-out:

DETACH the partition to be rolled out:

ALTER TABLE stock DETACH PART dec01 INTO TABLE junk;

DETACH drains all queries accessing the table. The DETACH operation is more or less instantaneous (it completes in a few seconds at most), but it cannot be performed until all existing queries have been drained and all DDL and utility-type operations have completed. All packages that reference the table are invalidated at this time. Note that after this state transition, the table remains offline until DETACH is committed.
It is recommended that COMMIT be run immediately after DETACH to avoid any interruption in access to the data.

COMMIT the DETACH:

COMMIT WORK;

At this point, the DETACHed data is no longer visible in the partitioned table; it lives in the stand-alone target table (junk in the example above). Note that if the partitioned table has dependent materialized query tables, SET INTEGRITY must be run on those detached dependents before the detached data becomes fully accessible; otherwise no SET INTEGRITY is needed after DETACH.
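Putting the roll-out steps together, a minimal sketch (object names are illustrative; the final SET INTEGRITY applies only if the table has dependent MQTs):

```sql
-- Detach the oldest partition into a stand-alone archive table
ALTER TABLE stock DETACH PART dec01 INTO TABLE stock_dec01_archive;

-- Commit immediately so the partitioned table comes back online
COMMIT WORK;

-- Only if stock has dependent MQTs: refresh the detached dependents
-- before the detached data is fully usable (stock_mqt is illustrative)
SET INTEGRITY FOR stock_mqt IMMEDIATE CHECKED;
```

The archive table can then be exported, dropped, or moved to cheaper storage as the application requires.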
Distribution clause (simplified syntax; the HASH keyword is optional, and additional algorithms are reserved for future use):

DISTRIBUTE BY [HASH] (col-name, ...)
DISTRIBUTE BY REPLICATION

The DISTRIBUTE BY clause replaces the PARTITIONING KEY clause used in previous releases. The old PARTITIONING KEY syntax is deprecated but still supported for backwards compatibility, and there is no restriction on mixing the old syntax with PARTITION BY RANGE. The associated syntax for adding or dropping a distribution key via ALTER TABLE is changed to ADD/DROP DISTRIBUTION KEY instead of ADD/DROP PARTITIONING KEY:

ALTER TABLE table-name ADD DISTRIBUTION KEY (column-name, ...) [USING HASHING]

DISTRIBUTE BY REPLICATION, as in previous releases, is only supported with MQTs; an error is returned if it is supplied for any other type of table. This syntax change frees the PARTITIONING term for use in this project and makes all of the CREATE TABLE data layout clauses match the {DISTRIBUTE|PARTITION|ORGANIZE} BY <algorithm> pattern.
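A minimal sketch of the clause in use (table and column names are illustrative):

```sql
-- Hash-distribute rows across database partitions on store_id
CREATE TABLE sales (
  store_id  INT NOT NULL,
  sale_date DATE,
  amount    DECIMAL(10,2)
) DISTRIBUTE BY HASH (store_id);

-- Replicated MQT: DISTRIBUTE BY REPLICATION is valid only for MQTs
CREATE TABLE stores_rep AS (SELECT * FROM stores)
  DATA INITIALLY DEFERRED REFRESH DEFERRED
  DISTRIBUTE BY REPLICATION;
```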
Organization clause (simplified syntax; each dimension is a column or a parenthesized column list, and additional algorithms are reserved for future use):

ORGANIZE BY DIMENSIONS (dimension, ...)
ORGANIZE BY KEY SEQUENCE sequence-key-spec

This is mainly in this document for completeness (and because the syntax came out of the table partitioning meetings). The changes necessary to MDC were put in late in V8.

MDC: A table can be both multi-dimensional clustered and table partitioned. In a table that is both, columns can be used both in the table partitioning range-partition-spec and in the MDC key. This is useful for achieving a finer granularity of partition plus block elimination than could be achieved by either feature alone. There are also many applications where it is useful to specify different columns for the MDC dimensions than those on which the table is data partitioned. Refer to the usage scenarios for more details. Note that table partitioning is multi-column but not multi-dimension, while MDC is multi-dimension.
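A sketch of a table that is both range partitioned and multi-dimensional clustered (names and ranges are illustrative):

```sql
-- Range-partition by month; cluster each partition by region and
-- product line. sale_date drives partition elimination, while the
-- MDC dimensions drive block elimination within each partition.
CREATE TABLE orders (
  sale_date    DATE NOT NULL,
  region       INT,
  product_line INT,
  amount       DECIMAL(10,2)
)
PARTITION BY RANGE (sale_date)
  (STARTING '2012-01-01' ENDING '2012-12-31' EVERY 1 MONTH)
ORGANIZE BY DIMENSIONS (region, product_line);
```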
DB2 provides tremendous flexibility in partitioning techniques to meet your design, performance, and operational requirements. Hash partitioning, available with the Database Partitioning Feature (DPF), spreads data across nodes, allowing massively parallel I/O to speed your query results. Range partitioning simplifies roll-in and roll-out for adding and removing data from the active database. Multi-dimensional clustering (MDC) partitions data by like attributes, or dimensions, providing optimized, high-density, high-yield access.
In this example, 4 partitions are used, but DB2 can support up to 1000. Hash partitioning divides the data using a built-in hash algorithm and allows massively parallel access to the data. In the above example, each partition now contains only ¼ of the data. DB2 parallelizes the I/O and can scale to handle your largest databases.
Range partitioning allows data to be physically segregated based upon the range of an attribute, most frequently date. Range partitioning allows an optimizer technique, referred to as pruning, to be used to limit the data accessed. For instance, in the above example a query accessing only 2006 data does not need to access 2005 data, so the 2005 data can be eliminated and not accessed. Range partitioning can be used alone, or in conjunction with other techniques, such as Hash partitioning as shown. Using both Hash and Range partitioning together further reduces the data accessed and spreads work of accessing that data across more resources, speeding query results even more.
The blocks represent Multi-dimensional Clustering (MDC) blocks. Data with like attributes, or dimensions, is partitioned into blocks, allowing fast, high density, high yield, block IOs compared with record-at-a-time access. Entire blocks of data can be read, and MDC technology assures that all records in the block contain the required attributes. MDC can be used alone or in conjunction with Hash and Range partitioning, as shown. Using a combination of Hash, Range and MDC allows division of the work among more resources and parallelism, as well as the opportunity for pruning through Range partitioning, and then further speeds access through MDC block access.
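The three techniques sketched together in one table definition (names and ranges are illustrative):

```sql
-- Hash-distribute across database partitions, range-partition by
-- year for pruning and roll-in/roll-out, and cluster by region so
-- each partition can be scanned with block I/O.
CREATE TABLE facts (
  cust_id   INT NOT NULL,
  sale_year INT NOT NULL,
  region    INT,
  amount    DECIMAL(10,2)
)
DISTRIBUTE BY HASH (cust_id)
PARTITION BY RANGE (sale_year)
  (STARTING 2005 ENDING 2006 EVERY 1)
ORGANIZE BY DIMENSIONS (region);
```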
Multi-Temperature Data Management is included only in:
- Enterprise Server Edition, and
- Advanced Enterprise Server Edition
Tiered storage is also known as Hierarchical Storage Management (HSM).
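A minimal sketch of multi-temperature storage with DB2 10 storage groups (paths, names, and data tags are illustrative):

```sql
-- Define storage groups over fast and slow storage paths
CREATE STOGROUP sg_hot  ON '/ssd/fs1'  DATA TAG 1;
CREATE STOGROUP sg_cold ON '/sata/fs1' DATA TAG 9;

-- Place current data on hot storage...
CREATE TABLESPACE ts_2012q2 USING STOGROUP sg_hot;

-- ...and later demote an aging quarter to cold storage
ALTER TABLESPACE ts_2011q2 USING STOGROUP sg_cold;
```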
Top-level overview with everything expanded; note use of grey shading to distinguish ungrouped databases in the production group from grouped databases in the intranet group
DB2 workload manager (WLM) is a resource management and monitoring tool that is built right into the database engine. The primary client benefits include CPU control, detection and control of rogue queries (limiting excessive, unexpected resource consumption), and monitoring of work on the database. The new WLM architecture is also designed with integration of external WLM products such as AIX Workload Manager in mind, allowing DB2 to take advantage of their capabilities (on platforms where they exist) while improving the end-to-end workload management story. WLM is simple to administer, based on workload identification followed by prioritization. An excellent introductory article on DB2 workload management is available for download at http://www.ibm.com/developerworks/forums/servlet/JiveServlet/download/1116-166950-13965175-231542/Introduction%20to%20DB2%20workload%20management.pdf These workload management capabilities help ensure proper resource allocation and utilization, which can help meet service level agreements and reduce your total cost of ownership. Now you don’t have to worry about someone hogging CPU time with a monster query, and you can ensure that high-priority work gets done first.
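A minimal sketch of the identify-then-prioritize pattern (service class, workload, and application names are illustrative):

```sql
-- Give the reporting application its own service class with a
-- share of CPU (CPU shares are a DB2 10 capability)
CREATE SERVICE CLASS reports_sc SOFT CPU SHARES 1000;

-- Identify connections from the REPORTS application and map them
-- to that service class
CREATE WORKLOAD reports_wl
  APPLNAME ('REPORTS')
  SERVICE CLASS reports_sc;
```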
Rows Read: internal SQL activities, such as those initiated by the setting of a constraint or the refreshing of a materialized query table, are also not counted for this condition.
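A sketch of a rows-read threshold that stops runaway queries (the threshold name and limit are illustrative):

```sql
-- Stop any activity that reads more than 10 million rows;
-- internal SQL activities are not counted toward this condition
CREATE THRESHOLD big_readers
  FOR DATABASE ACTIVITIES
  ENFORCEMENT DATABASE
  WHEN SQLROWSREAD > 10000000
  STOP EXECUTION;
```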
Data Studio
- Administration of adaptive compression
  - Create and alter tables and indexes for adaptive compression
  - Provide compression actions
  - Update Row Compression action in control flow for new options
- Implementation, deployment and maintenance of storage groups
  - Create, alter, drop storage groups
  - Create, alter, drop tablespaces within storage groups
  - Generate DDL script from physical database
  - Generate DDL script from compare and synch
  - Compare and synch to validate consistent deployment
  - Analyze impacted objects
- Implementation, deployment and query analysis of temporal tables
  - Create, alter, drop tables with temporal attributes
  - Generate DDL script from physical database
  - Generate delta DDL script from compare and synch
  - Compare and synch to validate consistent deployment
  - Deploy script to multiple servers using deployment manager
  - Analyze impacted objects
- Implementation, deployment and query analysis of row and column access control
  - Create, alter, drop permissions and masks for users, groups, and roles
  - Create or promote secure functions or secure triggers
  - Activate and deactivate row and column access control
  - Generate DDL script from physical database
  - Analyze impacted objects (e.g. objects needing rebind, objects made secure)
- Support for developing Java, C, and .NET applications against a DB2 pureScale environment
- Perform common administration tasks across DB2 pureScale members and CFs
- Support for high-speed unload utility
- Enhanced Visual Explain, Access Plan Explorer, and index advice to include jump scan and joins

Optim Configuration Manager
- Recommend compression savings opportunities for tables, indexes, and XML objects
- View compression savings for all objects in the database
- Policy management

Optim Performance Manager
- Storage group to tablespace alerts
- Storage group report with drill-down to tablespace
- Route and remap based on data tags
- Integrated alerting and notification of DB2 pureScale members
- Seamless view of status and statistics across all DB2 pureScale members and CFs

Optim Query Workload Tuner
- Query, statistics, and tuning advice for applications on DB2 pureScale systems

InfoSphere Data Architect
- Logical modeling: system-period, business-period, or bi-temporal
- Transformation from logical modeling to physical modeling
- Physical modeling: system-period, business-period, or bi-temporal
- Reverse engineering from database and DDL to physical model
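The adaptive compression DDL that tooling like Data Studio administers might look like the following sketch (table names are illustrative):

```sql
-- New table with DB2 10 adaptive (table-level plus page-level)
-- compression enabled
CREATE TABLE archive_orders (
  order_id INT NOT NULL,
  details  VARCHAR(2000)
) COMPRESS YES ADAPTIVE;

-- Enable adaptive compression on an existing table; subsequently
-- written pages pick up page-level compression automatically
ALTER TABLE orders COMPRESS YES ADAPTIVE;
```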
There will be new and improved tooling available in DB2 Galileo. This list just summarizes some of the enhancements that will be made and more details will be made available at GA time.
The high availability disaster recovery (HADR) feature supports multiple standby databases. This allows you to have your data in more than two sites, providing improved data protection with a single technology. When you deploy the HADR feature in multiple standby mode, you can have up to three standby databases in your setup. You designate one of these databases as the principal HADR standby; any other standby database is an auxiliary HADR standby. Both types of HADR standby are synchronized with the HADR primary through a direct TCP/IP connection, both types support reads on standby, and both types can be configured for time-delayed log replay. In addition, you can issue a forced or non-forced takeover on any standby. There are a few important distinctions between the principal and auxiliary standbys, however: IBM® Tivoli® System Automation for Multiplatforms (TSA) automated failover is supported only for the principal standby; to make an auxiliary standby the primary, you must issue a takeover on it. All of the HADR sync modes are supported on the principal standby only; the auxiliary standbys run in SUPERASYNC mode.
DB2 pureScale is a new optional DB2 feature that reduces the risk and cost of business growth by providing unlimited capacity, continuous availability, and application transparency. DB2 pureScale allows you to have multiple database servers in a system that all share a common set of disks. The system was developed in partnership with IBM Power Systems and helps DB2 both scale and stay continuously available. Key features of DB2 pureScale are: Unlimited capacity: buy only what you need and add capacity as your needs grow; we’ll discuss how this applies not only to your IT infrastructure but also to your licensing costs. Application transparency: avoid the risk and cost of application changes; DB2 pureScale helps you scale without expensive and risky data changes. Continuous availability: deliver uninterrupted access to your data with consistent performance. DB2 has learned from the undisputed gold standard of reliability, System z, and based DB2 pureScale on the z architecture that businesses have learned to trust for their most critical systems. We’ll be walking through how DB2 pureScale can help your business and how DB2 pureScale works.
Based on the industry-leading System z data sharing architecture, DB2 pureScale integrates IBM technologies to keep your critical systems available all the time. It includes automatic workload balancing to ensure that no node in the system is overloaded: DB2 routes transactions or connections to the least heavily used server. This workload balancing is hidden from the end user and even from applications, because the DB2 client handles it all; the client periodically checks workload levels and re-routes transactions to different servers. Workload balancing can occur at either the transaction or the connection level. Transaction-level support was added because many customers and ERP systems use connection pooling, and without it workloads might never be moved. DB2 pureScale is built on the most reliable UNIX system available, Power Systems; other platforms will be available in the future. DB2 and Power Systems worked very closely on DB2 pureScale to ensure that it is optimized for AIX at all levels, be it memory, networking, or storage. The technology for globally sharing locks and memory is based on technology from z/OS, which has a great track record as the most reliable and scalable architecture available. Tivoli System Automation is integrated deeply into DB2 pureScale: it is installed and configured as part of the DB2 installation process, and DBAs and system administrators never even know it's there. DB2 fixpaks even include and apply any Tivoli updates, so DBAs and system administrators never need to understand another software product. The networking infrastructure leverages InfiniBand, and all additional clustering software is included as part of the DB2 pureScale installation. This technology has allowed us to avoid many scaling problems other vendors have run into. The core of the system is a shared-disk architecture.
DB2 pureScale has a large technology demonstration, being announced later this quarter, that shows the great scaling that can be achieved as more members are added. The technology can clearly scale beyond a simple 4-member configuration. This near-linear scaling allows a business to allocate as much capacity as it needs without fear of the clustering technology failing. No other vendor can demonstrate scaling beyond a hundred members. Not all vendors are upfront with their scaling, and some don't even allow scaling numbers to be published. Instead of showing the scaling, they just put up a large number and never really show you what you could achieve with less hardware. Even worse, they tend to flat-line at 4 servers because their architecture isn't designed to scale, or to scale without a lot of work as you add capacity. The workload that the demonstration ran on is described below (note: only go into detail if asked). It demonstrates transparent application scaling on a Web commerce workload: read-mostly but not read-only, and the application is not cluster-aware, so there was no routing of transactions to members. The original goal was to stop when scalability hit 80%, or when the member count hit triple digits, whichever came first. The results of this 112-member test show near-linear scaling even out to 112 members in the cluster. Up to 64 members, the scalability (compared to the 1-member result) is still above 90%, and at 112 members the scalability was 81%. Note that this is a validation of the architecture and includes some capabilities under development that will not be in the December GA code. Other vendors have not publicly stated what scalability they can obtain in such a large cluster with a workload that is not cluster-aware; note that their offerings are limited to 100 nodes.
- pureScale now supports the DB2 SET WRITE command. The SET WRITE command allows a user to suspend or resume I/O writes for a database. Typical use of this command is for splitting a mirrored database, where the mirroring is achieved through a disk storage system.
- Cluster caching facilities (CFs) now support multiple low-latency, high-speed cluster interconnects. With multiple cluster interconnects on the CFs, you can connect each CF to more than one switch. Adding cluster interconnects, and adding a switch to a DB2 pureScale environment, both improve fault tolerance. A one-switch, multiple-cluster-interconnect configuration increases the throughput of requests to the CFs; a two-switch configuration helps with both increased throughput and high availability. DB2 pureScale environments do not require multiple cluster interconnects.
- Geographically dispersed clusters are possible. This is not a product feature per se and is not in the official documentation, but there is a whitepaper on it: http://www.ibm.com/developerworks/data/library/long/dm-1104purescalegdpc/index.html This article describes the geographically dispersed DB2® pureScale™ cluster (GDPC). Like the Geographically Dispersed Parallel Sysplex™ configuration of DB2 for z/OS®, GDPC provides the scalability and application transparency of a regular single-site DB2 pureScale cluster, but in a cross-site configuration that enables active/active system availability, even in the face of many types of disasters. The active/active part is important because it means that during normal operation, the DB2 pureScale members at both sites share the workload between them as usual, with workload balancing (WLB) maintaining an optimal level of activity on all members, both within and between sites. This means that the second site is not a standby site, waiting for something to go wrong.
Instead, the second site is pulling its weight, returning value for investment even during day-to-day operation. The article describes the prerequisites for a geographically dispersed DB2 pureScale cluster, followed by the steps to deploy one, as well as some of the performance implications of different site-to-site distances and different workload types. The article covers the following topics:
- GDPC concepts
- GDPC infrastructure and prerequisites
- GDPC setup and configuration
- Performance factors
- Detailed configuration steps
Available only in DB2 Advanced Enterprise Server Edition. Easy to administer (powerful scripting capability)
Both compilers are part of the core DB2 engine and produce runtime code of equivalent quality, so there is no reason to prefer one over the other. Tooling and monitoring hook in at the runtime-code level, so neither presents problems.
Main Point: For more than six decades now, IBM has pioneered in developing systems that are not only optimized for performance but also maintain transactional integrity, something that is key to any large-scale real-time business system. As early as 1962, IBM delivered PARS (Programmable Airline Reservation System), a large-scale airline reservation application, and TPF (Transaction Processing Facility) as the underlying transaction operating system, which became widely implemented by major airlines, credit cards, hotels, rental car reservations, police emergency response systems, and package delivery systems. In 1970, E. F. Codd of IBM Research published a paper that led to a new way for computers to manage information. Four years later, two IBMers published a paper that would become the basis for the SQL language standard. This set the stage for new, more powerful questions that could be asked of the data that lay within organizations. Continuing its rich tradition of technology innovation, IBM has delivered top-notch transactional systems built for the mainframe, such as CICS and IMS, and tight integration with operating systems and server hardware, such as the zSeries Parallel Sysplex architecture. On the AS/400, for instance, DB2 is implemented as part of the operating system itself, which supports single-server and multi-server parallel processing and clustering. Into the 2000s, IBM continues to innovate rapidly to provide its customers best-of-breed transactional systems optimized for the demands, expectations, and evolving workloads of its customers in the 21st century.