IBM PureData System for Analytics N200x

© 2013 IBM Corporation
IBM® PureData™ System for Analytics
N200x Technical Overview
Adriano Di Massimo
PureData for Analytics Europe IOT

Big Data Challenges for Both Transactions and Analytics are Increasing Demands on Data Systems
Increasing Variety of data requires new techniques
Increasing Velocity of data requires higher performance
Increasing Volume of data requires growing capacity: 35 ZB by 2020 (50x, 2010 to 2020)
Transactions: millions of transactions per second (Telco subscriber activity logging)
Analytics: billions of devices & sensors (Smart Meters, RFIDs, GPS)
Mobile, Cloud, Social, Big Data, Commerce

Strategic Big Data: the Future Model of the Data Warehouse
Source: Top Ten Technology Trends for 2013 – Gartner Symposium Barcelona Nov 2012

IBM PureData System for Analytics (PDA)
Purpose-built analytics engine
Integrated database, server and storage
Standard interfaces
Low total cost of ownership
Speed: 10-100x faster than traditional systems
Simplicity: Minimal administration and tuning
Scalability: Peta-scale user data capacity
Smart: High-performance advanced analytics
Transforms the User Experience

Announcing a New Model!
PureData for Analytics now has TWO models
N1001 – economical, high performance and scalability
N200x – highest performance appliance to-date
PureData for Analytics continues to provide:
Fastest Time to Value on the market today
Optimized Big Data analytics performance
Simple administration for fast and agile deployment
Accelerate analytic performance using large library of analytic
functions
The new N200x model addresses these key challenges
Increased performance
Better density
Data center efficiency
PureData System for Analytics N200x

Benefits of the IBM PureData System for Analytics
The Fastest Performance of Netezza Technology to Date!
1 Based on a comparison of the IBM PureData System for Analytics N2001 to the IBM PureData System for Analytics N1001. The performance speed refers to the query times on both macro-analytic and mixed workload tests as conducted in IBM engineering lab benchmarks. The N2001 query times were an average of 3x faster than those of the N1001. Individual results may vary.
2 128 GB/sec scan rate assuming an average of 4x compression across the system. Individual results may vary.
3 Capacity of IBM PureData System for Analytics N2001 compared to previous generation IBM PureData System for Analytics N1001.
4 Each N2001 rack contains 34 hot spare drives and 240 active drives for a ratio of 1 spare per 7 drives. Each N1001 rack contains 4 hot spare drives and 92 active drives for a ratio of 1 spare per 23 drives. The N2001 has 3.3x more spares per active drive. Frequency of disk related service calls expected to decrease by 70% assuming the same drive failure rates.
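The spare-drive ratios in footnote 4 can be re-derived from the drive counts it quotes. The sketch below is only a worked check of that arithmetic; the dictionary layout and function names are illustrative assumptions, not part of any IBM tool.

```python
# Drive counts per rack, taken from footnote 4 above.
RACKS = {
    "N1001": {"spare": 4, "active": 92},
    "N2001": {"spare": 34, "active": 240},
}

def active_per_spare(model: str) -> float:
    """Number of active drives covered by each hot spare."""
    rack = RACKS[model]
    return rack["active"] / rack["spare"]

def spare_density_gain(new: str, old: str) -> float:
    """How many times more spares per active drive `new` has than `old`."""
    return active_per_spare(old) / active_per_spare(new)

for model in RACKS:
    print(f"{model}: 1 spare per {active_per_spare(model):.1f} active drives")
print(f"Gain: {spare_density_gain('N2001', 'N1001'):.1f}x more spares per active drive")
```

The 70% reduction in service calls is a separate expectation stated by the footnote and is not derived here.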
Accelerated Performance of Analytic Queries
3X faster performance1 for Big Data analytics
128 GB/sec effective scan rate per rack2 to tackle Big Data faster
Increased Efficiency of your Data Center
50% greater data capacity per rack3 helps optimize data center efficiency
More capacity and less power per rack than both Oracle and Teradata
Simplicity and Ease of Administration
Improved system management and resilience to spend less time managing and more time delivering value
70% FEWER service calls with more spare drives and faster disk regeneration4


The PureData System for Analytics AMPP Architecture
PureData System for Analytics Appliance
Applications: Advanced Analytics, Loaders, ETL, BI
“Lite” Host (IBM xSeries, Red Hat Linux)
Network Fabric
S-Blades (each with FPGA, Memory and CPU)
Disk Enclosures
Field Programmable Gate Array = a blank canvas until it’s configured

• AMPP Architecture
- Combine the benefits of both technologies:
SMP simplicity and MPP performance

Select State, Age, Gender, count(*) From MultiBillionRowCustomerTable Where BirthDate < '01/01/1960' And State in ('FL', 'GA', 'SC', 'NC') Group by State, Age, Gender Order by State, Age, Gender
S-Blade Data Stream Processing
Stream via Zone Map: From MultiBillionRowCustomerTable Where BirthDate < '01/01/1960'
FPGA Core: Decompress, Project, Restrict, Visibility
– Project: Select State, Age, Gender, count(*)
– Restrict: And State in ('FL', 'GA', 'SC', 'NC')
CPU Core: SQL & Advanced Analytics
– Group by State, Age, Gender Order by State, Age, Gender

• Transparent I/O performance optimization
- Use of FPGA (streaming approach) guarantees
the highest and stable scan rate
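To make the streaming idea concrete, here is a minimal software sketch of the query above as a pipeline of generator stages: a restrict step and a project step standing in for the FPGA work, followed by a group-and-count step standing in for the CPU work. The sample rows, function names and the ordering of restrict before project are assumptions made for illustration; the real appliance does this in hardware on compressed disk pages.

```python
from collections import Counter
from datetime import date

# Hypothetical sample rows standing in for MultiBillionRowCustomerTable.
ROWS = [
    {"State": "FL", "Age": 63, "Gender": "F", "BirthDate": date(1950, 5, 1)},
    {"State": "GA", "Age": 58, "Gender": "M", "BirthDate": date(1955, 3, 9)},
    {"State": "NY", "Age": 65, "Gender": "F", "BirthDate": date(1948, 1, 2)},
    {"State": "FL", "Age": 23, "Gender": "M", "BirthDate": date(1990, 7, 7)},
    {"State": "FL", "Age": 63, "Gender": "F", "BirthDate": date(1950, 8, 8)},
]

def restrict(rows, cutoff, states):
    """Restrict stage: drop rows that fail the WHERE clause as they stream by."""
    for row in rows:
        if row["BirthDate"] < cutoff and row["State"] in states:
            yield row

def project(rows, columns):
    """Project stage: keep only the columns named in the SELECT list."""
    for row in rows:
        yield tuple(row[c] for c in columns)

def group_count(tuples):
    """CPU stage: GROUP BY with count(*), returned in ORDER BY order."""
    return sorted(Counter(tuples).items())

stream = restrict(ROWS, date(1960, 1, 1), {"FL", "GA", "SC", "NC"})
stream = project(stream, ("State", "Age", "Gender"))
result = group_count(stream)
print(result)
```

Because each stage is a generator, no intermediate copy of the table is built; only rows and columns that survive the early stages reach the aggregation step, which is the point the slide makes about filtering close to the disk.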

Transactional System used for BI
Data Warehouse Workload: fewer requests, lots of data manipulation
Diagram: Request, CPU, General Purpose Storage, Results
Transaction systems are inefficient for data shuffling

PureData for Analytics System (Netezza Performance Server™ System)
Asymmetric Massively Parallel Processing
Data Warehouse Blades: designed for Tera-scale Business Intelligence
Diagram: Request, CPU, Intelligent Storage, Results
Highly efficient data movement: 1% of network traffic, 2% of CPU requirements

N200x: What’s new
PureData System for Analytics, data stream rates per core
Stage | N1001 | N200x
Disk scan | 1 drive @ 120 MB/sec | 2.5 drives / core @ 130 MB/sec (130 + 130 + 65 = 325 MB/sec)
Effective rate at 4x compression | 480 MB/sec | 1300 MB/sec
FPGA Core (Decompress, Project, Restrict, Visibility) | 500 MB/sec | 1000 MB/sec
CPU Core (SQL & Advanced Analytics) | 800 MB/sec + | 1000 MB/sec +

How We Did it, Conceptually
Balanced Performance
More Drives with Faster Scan Rates: from 1 drive @ 120 MB/sec to 2.5 drives @ 130 MB/sec each
Faster FPGA Cores, Driving Higher Performance: from 500 MB/sec to 1000 MB/sec
CPU Core: from 800 MB/sec + to 1000 MB/sec +
Leading to Faster Performance
FPGA Core
• Decompress
• Project
• Filter
CPU Core
• Analyze
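The per-core figures can be cross-checked with a little arithmetic. The sketch below assumes that the 480 MB/sec and 1300 MB/sec values shown on the preceding slide are the disk rates multiplied by the 4x average compression quoted in the footnotes; that reading is an inference from the numbers, not something the slides state outright. All names are illustrative.

```python
COMPRESSION = 4  # average compression ratio assumed in the deck's footnotes

# Per-core figures as quoted on the slides.
CORE = {
    "N1001": {"drives": 1.0, "drive_mb_s": 120, "fpga_mb_s": 500},
    "N200x": {"drives": 2.5, "drive_mb_s": 130, "fpga_mb_s": 1000},
}

def disk_rate(model: str) -> float:
    """MB/sec read from disk per core."""
    core = CORE[model]
    return core["drives"] * core["drive_mb_s"]

def effective_rate(model: str) -> float:
    """MB/sec of uncompressed data per core, assuming COMPRESSION."""
    return disk_rate(model) * COMPRESSION

for model, core in CORE.items():
    print(f"{model}: disk {disk_rate(model):.0f} MB/sec, "
          f"effective {effective_rate(model):.0f} MB/sec, "
          f"FPGA core rated {core['fpga_mb_s']} MB/sec")
```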

PureData System for Analytics N1001
14 Blades per full rack
Each S-Blade (Memory, CPU, FPGA)
– 8 CPU Cores
– 8 FPGA Engines
– Sized to handle 8 disks or 960 MB/sec
92 Active Data Slices deliver 11 GB/sec raw disk throughput
Disks per S-Blade in the diagram: 8 8 6 6 6 6 6

PureData System for Analytics N200x
7 Blades per full rack
Each S-Blade (Memory, CPU, FPGA)
– 16 CPU Cores
– 16 FPGA Engines
– Sized to handle 40 disks or 5.2 GB/sec
240 Active Data Slices deliver 31.2 GB/sec raw disk throughput
3x More Disk Throughput
Disks per S-Blade in the diagram: 40 40 32 32 32 32 32
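The rack-level throughput numbers follow from the per-drive rates quoted earlier (120 MB/sec for N1001, 130 MB/sec for N200x). The sketch below re-derives them and, assuming the 4x compression used in the footnotes, converts the N200x figure to TB/hr. It is a back-of-the-envelope check with illustrative names, not an IBM sizing formula.

```python
def raw_throughput_gb_s(drives: int, drive_mb_s: int) -> float:
    """Aggregate raw disk throughput in GB/sec (1 GB = 1000 MB)."""
    return drives * drive_mb_s / 1000

n1001_rack = raw_throughput_gb_s(92, 120)    # 92 active data slices
n200x_rack = raw_throughput_gb_s(240, 130)   # 240 active data slices
n200x_blade = raw_throughput_gb_s(40, 130)   # largest S-Blade in the diagram

# Effective scan rate, assuming 4x average compression.
n200x_effective_gb_s = n200x_rack * 4
n200x_tb_per_hr = n200x_effective_gb_s * 3600 / 1000

print(f"N1001 rack: {n1001_rack:.2f} GB/sec raw")
print(f"N200x rack: {n200x_rack:.1f} GB/sec raw ({n200x_rack / n1001_rack:.1f}x)")
print(f"N200x blade: {n200x_blade:.1f} GB/sec raw")
print(f"N200x effective: {n200x_effective_gb_s:.1f} GB/sec, {n200x_tb_per_hr:.0f} TB/hr")
```

The results land close to the rounded values on the slides: roughly 11 GB/sec versus 31.2 GB/sec raw (the "3x" claim), and an effective rate near the 128 GB/sec and 450 TB/hr headline figures.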

Netezza Platform Software v7.1
Highlights
– Scheduler rules for WLM
– Short query prioritization
– Snippet Result Cache
– Faster Bulk Fetching with ODBC
– Password aging and expiry
– nzPortal enhancements
– Cryptographic Standards (SP 800-131a)
– Support for Replication v1.5
– Support for INZA 3.0
Resiliency
– Faster rebalance for failed disks
– Disk validation support
– Large scale disk replacement
– Call Home v1.0
– Enhanced System Health Checks v2.2
– ILMT support for Growth on Demand
Platform & OS
– Client Kit support for AIX 7.1
– RHEL 6.4 certification
SQL Enhancements
– Multiple Schema (3-part naming)
– Orphan column query
– NOT IN / EXISTS improvements
– CASE WHEN improvements
– Support 24 hour datetime
– CESU-8 support
Transaction Enhancement
– Truncate table in TXN
– Improved view validation
– Temp table enhancements
– Deprecate Web Admin
ETL
– ODBC loader support for INTERVAL
Netezza Performance Portal
– Cryptographic Standards (SP 800-131a)
– Scheduler rules
– History type AUDIT
– Restrict nzPortal users
– Groom dialogs

Directed Data Processing
Distribute Restrict Optimization
– Use distribution key to target scans
Example: transaction history (tx_hist) distributed on customer ID
– Rows for custid = 1, custid = 2 and custid = 3 are stored on different data slices behind the Hosts
– select from tx_hist where custid in (1, 2) scans only the data slices holding custid = 1 and custid = 2
– select from tx_hist where custid = 3 scans only the data slices holding custid = 3
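The idea of using the distribution key to target scans can be sketched as follows. The hash function (CRC32), the slice count and all names are stand-ins chosen for illustration; the hashing the appliance actually uses is not described in these slides.

```python
import zlib

NUM_SLICES = 8  # hypothetical number of data slices

def slice_for(custid: int) -> int:
    """Map a distribution-key value to a data slice (CRC32 is a stand-in hash)."""
    return zlib.crc32(str(custid).encode()) % NUM_SLICES

def distribute(rows):
    """Place each row on the slice chosen by its distribution key."""
    slices = [[] for _ in range(NUM_SLICES)]
    for row in rows:
        slices[slice_for(row["custid"])].append(row)
    return slices

def select_where_custid_in(slices, custids):
    """Scan only the slices that can hold the requested keys."""
    wanted = set(custids)
    targets = {slice_for(c) for c in wanted}
    hits = [r for s in sorted(targets) for r in slices[s] if r["custid"] in wanted]
    return hits, len(targets)

tx_hist = [{"custid": c, "amount": 10 * c + i} for c in (1, 2, 3) for i in range(4)]
slices = distribute(tx_hist)

rows_3, scanned_3 = select_where_custid_in(slices, [3])
rows_12, scanned_12 = select_where_custid_in(slices, [1, 2])
print(f"custid = 3: {len(rows_3)} rows from {scanned_3} of {NUM_SLICES} slices")
print(f"custid in (1, 2): {len(rows_12)} rows from {scanned_12} of {NUM_SLICES} slices")
```

The same hash that placed a row at load time tells the query which slices can be skipped, so an equality or IN predicate on the distribution key touches only a fraction of the system.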

Page Granular Zone Maps
24X finer granularity: zone maps track 128 KB pages instead of 3 MB extents
Data values in the example: October, November, Other
where col = October
– Extent-level zone maps: Total 12 MB (4 x 3 MB)
– Page-level zone maps: Total 1 MB (8 x 128 KB)
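A small model shows why finer zone maps reduce the data read for the same predicate. The layout below is hypothetical: four 3 MB extents of 24 pages each, one month value per page, with October on two pages of every extent, chosen so the totals match the slide. Months are encoded as integers so that min/max ranges are meaningful.

```python
PAGE_KB = 128
EXTENT_KB = 3 * 1024
PAGES_PER_EXTENT = EXTENT_KB // PAGE_KB  # 24, the "24X finer granularity"

def build_zone_map(values, block):
    """Record (min, max) for each block of `block` consecutive page values."""
    return [(min(values[i:i + block]), max(values[i:i + block]))
            for i in range(0, len(values), block)]

def blocks_to_read(zone_map, target):
    """Indexes of blocks whose min/max range could contain `target`."""
    return [i for i, (lo, hi) in enumerate(zone_map) if lo <= target <= hi]

# Hypothetical layout: mostly November (11), some October (10) and Other (9).
pages = [11] * (4 * PAGES_PER_EXTENT)
for extent in range(4):
    base = extent * PAGES_PER_EXTENT
    pages[base + 3] = 10
    pages[base + 17] = 10
    pages[base + 20] = 9

extent_hits = blocks_to_read(build_zone_map(pages, PAGES_PER_EXTENT), 10)
page_hits = blocks_to_read(build_zone_map(pages, 1), 10)
extent_mb = len(extent_hits) * EXTENT_KB / 1024
page_mb = len(page_hits) * PAGE_KB / 1024
print(f"Extent-level zone map reads {extent_mb:.0f} MB")
print(f"Page-level zone map reads {page_mb:.0f} MB")
```

With extent-level ranges every extent might contain October and must be read in full; with page-level ranges only the eight matching pages are read.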

Observation
• BI/Web page generated reports create queries with limited variation
• Repeated tables, columns, restrictions
Keep intermediate results
• From simple table scans
• Using existing storage
Internal Benchmarking Results
• Up to 2.5X faster for tactical queries

Snippet Result Cache
• Preserves intermediate tables generated by snippets for use in subsequent queries
• Queries do NOT have to be identical to benefit
Diagram: two SQL queries, each broken into a series of snippets
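The caching behaviour described above can be illustrated with a toy cache keyed on the table, the projected columns and the restriction of a scan snippet. The key design and the class are assumptions made for this sketch; how NPS identifies reusable snippets and invalidates cached results when tables change is not covered by these slides and is not modelled here.

```python
class SnippetCache:
    """Toy cache of scan-snippet results, keyed by what the snippet reads."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def scan(self, table_name, rows, columns, predicate_key, predicate):
        key = (table_name, tuple(columns), predicate_key)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = [tuple(r[c] for c in columns) for r in rows if predicate(r)]
        self._store[key] = result
        return result

orders = [
    {"region": "EU", "amount": 10},
    {"region": "US", "amount": 7},
    {"region": "EU", "amount": 5},
]

def is_eu(row):
    return row["region"] == "EU"

cache = SnippetCache()
# Two different queries (a count and a sum) that share the same scan snippet.
count_eu = len(cache.scan("orders", orders, ["amount"], "region = 'EU'", is_eu))
sum_eu = sum(a for (a,) in cache.scan("orders", orders, ["amount"], "region = 'EU'", is_eu))
print(count_eu, sum_eu, cache.hits, cache.misses)
```

The second query is not identical to the first, yet it reuses the cached intermediate result, which mirrors the slide's point that queries do not have to be identical to benefit.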

ODBC Bulk Fetch Enhancements
Delivers a more competitive select performance!
‒ Eliminates expensive conversion routines when the
client and database share the same data type
‒ Nearly 4X faster for select data types!
Sample improvements:
Data Type | Today | NPS 7.1 | Times Faster | % Gain
Char(ns) | 175.704 | 45.009 | 3.90 | 74%
Int1 | 101.38 | 54.86 | 1.85 | 46%
Int8 | 76.421 | 24.198 | 3.16 | 68%
Boolean (bit) | 195.27 | 133.3441 | 1.46 | 31%
Double | 75.684 | 31.271 | 2.42 | 58%
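The "Times Faster" column is simply the ratio of the two timings, and the gain is the relative reduction. The sketch below recomputes both from the table; the units of the timings are not given in the source, and the function names are illustrative.

```python
# (today, NPS 7.1) timings copied from the table above; units are not stated.
TIMINGS = {
    "Char(ns)": (175.704, 45.009),
    "Int1": (101.38, 54.86),
    "Int8": (76.421, 24.198),
    "Boolean (bit)": (195.27, 133.3441),
    "Double": (75.684, 31.271),
}

def times_faster(before: float, after: float) -> float:
    """Speedup factor: old time divided by new time."""
    return before / after

def pct_gain(before: float, after: float) -> float:
    """Relative reduction in elapsed time, as a percentage."""
    return (before - after) / before * 100

for data_type, (before, after) in TIMINGS.items():
    print(f"{data_type}: {times_faster(before, after):.2f}x faster, "
          f"{pct_gain(before, after):.1f}% less time")
```

The recomputed speedups agree with the table to two decimals. The recomputed gains differ from the printed "% Gain" column by up to one point in some rows, which suggests the slide mixed rounding and truncation.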


Spend Less Time Managing and More Time Innovating
No dbspace/tablespace sizing and configuration
No redo/physical/Logical log sizing and configuration
No page/block sizing and configuration for tables
No extent sizing and configuration for tables
No Temp space allocation and monitoring
No RAID level decisions for dbspaces
No logical volume creations of files
No integration of OS kernel recommendations
No maintenance of OS recommended patch levels
No JAD sessions to configure host/network/storage
Data Experts, not
Database Experts
Easy Administration Portal
No software installation
No indexes and tuning
No storage administration

IBM Netezza Performance Portal 2.0
Consolidating WebAdmin and Portal for Simple Admin
Simple web user interface
– Part of the PureData System for Analytics
New functional and usability
enhancements
– Administrative Functions
• Hardware view & alerts
• Database objects administration
• User & Group management
• View active sessions
• Workload Management
• View Events
• Table skew/storage search
• Capacity Planning
– Monitor enhancements
• Usability improvements: monitors can be resized and unmonitored periods can be marked
– Customer requested improvements
• Show locks
• Monitor System Resources
• Perform System Administration
• Understand & Predict Capacity

Netezza Performance Portal 2.1
• Support for Scheduler rules
• Ability to restrict users from adding Hosts
• New panel for Resource Allocation Performance History
• Ability to view history of BAR operations
• Support for EXPLAIN command with Query History enabled
• Client field filters for Query History view
• History type AUDIT added to Query History
• IBM HTTP server replaces Apache server

Scheduler Rules for WLM
1. Replaces the Gatekeeper Scheduler
2. Ability to limit, prioritize, and abort queries
through simple rules
3. Ability to match on group, plan type, priority,
estimate, user, db, table, client info & tags
4. Great for large scale environments running in
high concurrency
5. Helps to tune out query contention resulting
from high use of disk and memory
Diagram labels: Gatekeeper, GRA, SQB

Scheduler Rule Examples
Modifying scheduler rules:
– IF USER IS sam THEN INCREASE PRIORITY
– IF TYPE IS LOAD THEN SET PRIORITY LOW
– IF TAG IS eom THEN EXECUTE AS RESOURCEGROUP group42
– IF ESTIMATE >= 5 ESTIMATE < 12 THEN INCREASE PRIORITY
– IF CLIENT_APPLICATION_NAME IS Cognos THEN ABORT
– IF CLIENT_ACCOUNTING_STRING IN ('weekly_report', 'daily_report') THEN SET PRIORITY HIGH
Limiting scheduler rules:
– IF TAG IS cube THEN LIMIT 1
– IF TAG IS cube USER IS sam THEN LIMIT 2
– IF TYPE IS GENERATE STATISTICS THEN LIMIT 1
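The effect of a limiting rule such as `IF TAG IS cube THEN LIMIT 1` can be modelled as an admission check: a query that matches the rule may start only while fewer than `limit` matching queries are running. The classes and the `admit` function below are a hypothetical model written for illustration; they are not the NPS scheduler, and the way NPS resolves overlapping rules is not modelled.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Query:
    user: str
    tag: str = ""
    type: str = "SELECT"

@dataclass
class LimitRule:
    matches: Callable[[Query], bool]  # the IF part of the rule
    limit: int                        # the THEN LIMIT n part

def admit(query: Query, running: list, rules: list) -> bool:
    """Return True if `query` may start now, given the running queries."""
    for rule in rules:
        if rule.matches(query):
            active = sum(1 for q in running if rule.matches(q))
            if active >= rule.limit:
                return False
    return True

RULES = [
    LimitRule(lambda q: q.tag == "cube", 1),                   # IF TAG IS cube THEN LIMIT 1
    LimitRule(lambda q: q.type == "GENERATE STATISTICS", 1),   # IF TYPE IS GENERATE STATISTICS THEN LIMIT 1
]

running = [Query(user="sam", tag="cube")]
print(admit(Query(user="ann", tag="cube"), running, RULES))  # second cube query must wait
print(admit(Query(user="ann", tag="eom"), running, RULES))   # unrelated query may start
```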

Call Home Service
Real time link between your appliance and IBM
• Automatic problem reporting
• Ongoing Inventory tracking
• Operational status and health for proactive support
Improves support efficiency, effectiveness and the client experience
• Reduces your Total Cost of Ownership (TCO)
• Reduces duration of most common support calls
• Raises our awareness of your issues sooner
• Makes support more proactive without requiring you to do more
• Helps to improve product and support quality over time

Call Home Service – How it Works
• Targeted NZEVENTs automatically run nzOpenPmr, collect data and email IBM
• New email identifies you, appliance (identity, location and status) and fault data
• Attached diagnostics include:
+ sysmgr and eventmgr logs
+ SMART logs for disks
+ cluster logs for Host issues
+ crash stacks for core dumps (avg. size: 15 Kbytes)
• Automation opens PMR, posts diagnostic data and replies w/ PMR
Configuration and Enablement
• Requires recent NPS fixpack and functional SMTP routing
• Additional configuration in callHome.txt
+ IBM Customer Number (ICN)
+ Machine Type, Model and S/N
• Identify your Support contact and email alias
• nzOpenPmr configuration creates new event table entry
SAMPLE callHome.txt
# /nz/data/config/callHome.txt
# Installation-specific attributes.
customer.company = Your Business
customer.address1 = Appliance Install Address
customer.address2 = Installed City, State, Zip
customer.ICN = 1234567
contact1.name = Joe SysAdmin
contact1.phone = 1.617.555.1212
contact1.email = jsysadmin@us.company.com
contact1.cell = 1-508-555-9876
contact1.events = ALL
contact2.name = D.B. Admin
contact2.phone = +1.508.555.1212
contact2.email = dadmin@us.company.com
contact2.cell = +1.508.555.2121
system.description = Test System
system.location = Rm 122 Aisle F Slot 2
system.model = N2001-005
system.MTM = 3565 / DD0
system.serial = NZ3xxxx
system.CC = 2 char Country Code (ISO)
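For anyone who wants to sanity-check a callHome.txt before enabling the service, a file in this shape is easy to read programmatically. The sketch below assumes, based only on the sample above, that the format is `key = value` lines with `#` comments; the parser that Call Home itself uses is not described in the slides, and the function names here are hypothetical.

```python
def parse_call_home(text: str) -> dict:
    """Parse `key = value` lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings

def contacts(settings: dict) -> dict:
    """Group contactN.* keys into one dictionary per contact."""
    grouped = {}
    for key, value in settings.items():
        section, _, field = key.partition(".")
        if section.startswith("contact"):
            grouped.setdefault(section, {})[field] = value
    return grouped

# A shortened copy of the sample file above.
SAMPLE = """\
# /nz/data/config/callHome.txt
customer.company = Your Business
customer.ICN = 1234567
contact1.name = Joe SysAdmin
contact1.email = jsysadmin@us.company.com
contact2.name = D.B. Admin
system.model = N2001-005
"""

settings = parse_call_home(SAMPLE)
print(settings["customer.ICN"], settings["system.model"])
print(contacts(settings))
```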

Faster Rebalance for failed Drives
• Less than 5 minutes to rebalance a failed Blade
– Unmount and remount disk rather than reboot the blade
• Rebalance occurs under a normal “pause” of the Blade
– Avoids losing any work in process (Loads or queries)

Summary of competitive advantages
Transparent I/O performance optimization
– Use of FPGA (streaming approach) guarantees the highest and most stable scan rate,
without any need for expensive performance improvement features like:
• automatic dynamic storage differentiated by data access behaviour («virtual storage»)
• «in-memory» solution or
• «columnar» storage
Specific RDBMS
– Optimized software by removing all unnecessary and expensive typical OLTP
RDBMS features like:
• Log/journaling management
• Lock management
• Referential integrity feature management
AMPP Architecture
– Combine the benefits of both technologies: SMP simplicity and MPP performance
– Symmetric «Shared Nothing» Architecture has limitations:
• Frequent «bottlenecks» due to the mix of heterogeneous processes on the same physical resources
• Risk of unbalanced use of clustered resources due to bad access configuration

Summary of competitive advantages
Workload Management
– World-class workload manager functionalities
– Maximize resource usage without complex workload management settings
Availability and Resiliency
– No need for «fallback-like» / table mirroring functionalities
• Disk availability is guaranteed by RAID 1
• Zero downtime in case of node failure is guaranteed by built-in spare S-Blades
– Efficient Incremental backup avoiding complex techniques like partitioning archive
Simplicity
– Zero-tuning
• «Zone-map»: automatic anti-index approach to avoid scanning unnecessary data for user queries
• Automatic update of data demographic statistics
• Automatic partitioning
• Ad-hoc query enabling technology
– Near-zero administration
– Data model agnostic

Inside the

PureData System for Analytics Hardware Overview: Model N1001
• 8 Disk Enclosures
– 96 1TB SAS Drives (4 hot spares)
– RAID 1 Mirroring
• 14 PureData for Analytics S-Blades™
– 2 Intel Quad-Core 2+ GHz CPUs
– 4 Dual-Engine 125 MHz FPGAs
– 24 GB DDR2 RAM
– Linux 64-bit Kernel
• 2 Hosts (Active-Passive)
– 2 Quad-Core Intel 2.6 GHz CPUs
– 7x146 GB SAS Drives
– Red Hat Linux 5 64-bit
• User Data Capacity: 128 TB**
• Data Scan Speed: 145 TB/hr**
• Load Speed (per system): 5+ TB/hr
• Power Requirements: 7.6 kW
• Cooling Requirements: 7.8 kW
• Scales from ¼ Rack to 10 Racks, 32 TB to 1.2 PB of User Data
**: 4X compression assumed

PureData System for Analytics Hardware Overview: Model N200x
• 12 Disk Enclosures
– 288 600 GB SAS2 Drives (240 for User Data, 14 for S-Blades, 34 Spare)
– RAID 1 Mirroring
• 7 PureData for Analytics S-Blades™
– 2 Intel 8 Core 2+ GHz CPUs
– 2 8-Engine Xilinx Virtex-6 FPGAs
– 128 GB RAM + 8 GB slice buffer
– Linux 64-bit Kernel
• 2 Hosts (Active-Passive)
– 2 6-Core Intel 3.46 GHz CPUs
– 7x300 GB SAS Drives
– Red Hat Linux 6 64-bit
• User Data Capacity: 192 TB*
• Data Scan Speed: 450 TB/hr*
• Load Speed (per system): 5+ TB/hr
• Power Requirements: 7.5 kW
• Cooling Requirements: 27,000 BTU/hr
• Scales from ½ Rack to 4 Racks
* Assuming 4X compression

PureData System for Analytics Models
Attribute | PureData System for Analytics N1001 | PureData System for Analytics N200x
Blade Type | HS22 | HX5
CPU Cores / Blade | 2 x 4 Core Intel CPUs | 2 x 8 Core Intel CPUs
# Disks | 96 x 3.5” / 1 TB SAS (92 Active) | 288 x 2.5” / 600 GB SAS2 (240 Active)
Raw Capacity | 96 TB | 172.8 TB
Total Disk Bandwidth | ~11 GB/s | ~32 GB/s
S-Blades per Rack (cores) | 14 (112) | 7 (112)
S-Blade Memory | 24 GB | 128 GB
Rack Configurations | ¼, ½, 1, 1 ½, 2 – 10 | ¼, ½, 1, 2, 4 (6 and 8 rack configs to follow)
FPGA Cores / Blade | 8 (2 x 4 Engine Xilinx FPGA) | 16 (2 x 8 Engine Xilinx Virtex-6 FPGA)
User Data / Rack * | 128 TB | 192 TB
* Assuming 4x Compression
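Several rows of the comparison can be re-derived from the others, which is a useful consistency check when reading the table. The sketch below recomputes raw capacity, cores per rack and the relative gain in user data per rack; the dictionary keys and function names are illustrative only.

```python
# Figures copied from the comparison table above.
MODELS = {
    "N1001": {"disks": 96, "disk_gb": 1000, "blades": 14, "cores_per_blade": 8, "user_tb": 128},
    "N200x": {"disks": 288, "disk_gb": 600, "blades": 7, "cores_per_blade": 16, "user_tb": 192},
}

def raw_capacity_tb(model: str) -> float:
    """Raw capacity per rack in TB (1 TB = 1000 GB)."""
    m = MODELS[model]
    return m["disks"] * m["disk_gb"] / 1000

def cores_per_rack(model: str) -> int:
    m = MODELS[model]
    return m["blades"] * m["cores_per_blade"]

def user_data_gain_pct(new: str, old: str) -> float:
    """Percentage increase in user data per rack from `old` to `new`."""
    return (MODELS[new]["user_tb"] / MODELS[old]["user_tb"] - 1) * 100

for model in MODELS:
    print(f"{model}: {raw_capacity_tb(model)} TB raw, {cores_per_rack(model)} cores per rack")
print(f"User data per rack: +{user_data_gain_pct('N200x', 'N1001'):.0f}%")
```

The N200x reaches the same 112 cores per rack with half as many blades, and the 50% capacity claim matches 192 TB versus 128 TB.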

New Offerings for the Entry-Level Market
PureData System for Analytics ‘Lite’ (Q4’13)
– Entry-Level Striper Configuration (N2002-002)
– 32 TB usable capacity
– 50% better performance than a TwinFin-3 (N1001-002)
– Improved resiliency over TwinFin-3 with more spare drives
IBM Netezza Platform Development Software
– Virtualized Image supporting VMWare vSphere 5.1
– Documented reference architecture and best
practices
– Install Licensing
– 16+ TB usable capacity (compressed)
– Development and Test Only

IBM Netezza Platform Development Software
Full function NPS 7.x software for
DEV and TEST only
In a fully virtualized offering
Fully supported, simple to setup,
running in minutes
Just like an appliance
Licensed per virtual server
System Limits
16 CPU cores
64GB RAM
4TB raw space (~16TB w/compression)
Host SPU SPU

IBM Announces Growth on Demand for PureData System for Analytics
Program Basics
– New Offering called “Growth on Demand”
– Purchase a larger system, license 50% of the capacity and performance
Instant Upgrade
– Grow in easy steps
– Additional capacity enabled by licensing and software configuration
– Capacity can be added, but not reduced with this program
Simple Deployment
– Provision one system
– Expand through licensing
– Zero impact on data center operations

Growth on Demand Single Rack Example
Existing part (seven such parts, one for each model)
New part : min 50% entitled capacity (both storage and performance), one for each existing part
New part : adding 12.5% extra capacity (both storage and performance), one for each PDA model size
Full Rack ‘Normal’: 100% capacity
Full Rack ‘Minimum capacity’: 50% capacity
‘Extra capacity’ parts: Add-on, Add-on, Add-on, Add-on
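The licensing steps in this example add up as follows: a 50% minimum entitlement plus add-on parts of 12.5% each. The sketch below is only the arithmetic implied by the slide; the function names are hypothetical and no real licensing interface is represented.

```python
BASE_PCT = 50.0     # minimum entitled capacity (storage and performance)
ADD_ON_PCT = 12.5   # extra capacity enabled by each add-on part

def entitled_pct(add_ons: int) -> float:
    """Entitled share of the rack after buying `add_ons` extra-capacity parts."""
    if add_ons < 0:
        raise ValueError("capacity can be added, but not reduced")
    return min(100.0, BASE_PCT + ADD_ON_PCT * add_ons)

def add_ons_to_full() -> int:
    """Number of add-on parts needed to reach 100% of the rack."""
    return int((100.0 - BASE_PCT) / ADD_ON_PCT)

for n in range(add_ons_to_full() + 1):
    print(f"{n} add-on(s): {entitled_pct(n):.1f}% of the rack")
```

Four add-ons take a minimum-capacity rack to the full rack, which matches the four Add-on parts shown in the diagram.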

IBM DB2 Analytics Accelerator
Now even faster with N200x
The PureData System for Analytics N200x is also the next generation DB2 Analytics Accelerator,
providing the same improvements to our DB2 for z/OS customers

Big Data Meets Deep Analytics
Analytics without constraint

IBM Netezza Analytics Ecosystem
PureData for Analytics AMPP Platform
Software Development Kit
– User-Defined Extensions (UDF, UDA, UDTF, UDAP)
– Language Support (Map/Reduce, Java, Python, Lua, Perl, C, C++, Fortran, PMML)
Netezza In-Database Analytics
– Transformations, Mathematical, Geospatial [Esri / nzSpatial], Predictive, Statistics, Time Series, Data Mining
3rd Party In-Database Analytics
– Fuzzy Logix, SAS, Zementis, IBM SPSS, Mathworks, Revolution Analytics
BI Tools
Visualization Tools

Integrated by Design
IBM Netezza Analytics Version 2.0
Netezza In-Database Analytics 2.0
Transformations
Mathematical
Geospatial
Predictive
Statistics
Time Series
Data Mining
No data movement
Analyze deep and wide data
High performance, parallel computation

Pre-Built In-Database Analytics
Transformations
– Data Profiling / Descriptive Statistics+
– General Diagnostics Statistics+
– Sampling
– Data prep
Mathematical
– Basic Math*
– Permutation and Combination*
– Greatest Common Divisor and Least Common Multiple*
– Conversion of Values*
– Exponential and Logarithm*
– Gamma and Beta Functions
– Matrix Algebra+
– Area Under Curve*
– Interpolation Methods*
Time Series
– Autoregressive+
– Forecasting*
Predictive
– Linear Regression+
– Logistic Regression+
– Classification
– Bayesian
– Sampling
– Model Testing
Geospatial
– Geospatial Data Type
– Geometric Functions
– Geometric Analysis
Statistics
– Descriptive Statistics+
– Distance Measures*
– Hypothesis Testing*
– Chi-Square & Contingency Tables*
– Univariate & Multivariate Distributions+
– Monte Carlo Simulation*
Data Mining
– Association Rules+
– Clustering+
– Feature Extraction+
– Discriminant Analysis*
* Fuzzy Logix DB Lytix capabilities
+ Netezza Analytics and Fuzzy Logix DB Lytix capabilities

What’s New in N200x: Summary
50% Greater Storage Capacity per rack
3x scan rate vs N1001 series
Improved Resiliency and Fault Tolerance
– More spare drives per cabinet
– Faster drive regeneration
– Online Firmware upgrades
NPS 7.0
– Distribute Restrict Optimization
– Page Granular Zone Maps

Catch the
Striper “Wave”
Why Upgrade to the
IBM PureData System for Analytics N2000 Series Appliance

Why Upgrade Your TwinFin System?
PureData System for Analytics N2002 provides:
The latest hardware
– 3x faster scan rates1 – 128 GB/sec effective scan rate per rack2
– 6x more memory per Blade server
– Leverage future software enhancements longer
Increased data center efficiency with 50% greater data
capacity per rack3
Improved system management & resiliency
70% fewer service calls with more spare drives and faster
disk regeneration4
Catch the Striper Wave before TwinFin comes to end of life
1 Based on a comparison of the IBM PureData System for Analytics N200x to the IBM PureData System for Analytics N1001. The performance speed refers to the query times on both macro-analytic and mixed workload tests as conducted in IBM engineering lab benchmarks. The N200x query times were an average of 3x faster than those of the N1001. Individual results may vary.
2 128 GB/sec scan rate assuming an average of 4x compression across the system. Individual results may vary.
3 Capacity of IBM PureData System for Analytics N200x compared to previous generation IBM PureData System for Analytics N1001.
4 Each N200x rack contains 34 hot spare drives and 240 active drives for a ratio of 1 spare per 7 drives. Each N1001 rack contains 4 hot spare drives and 92 active drives for a ratio of 1 spare per 23 drives. The N200x has 3.3x more spares per active drive. Frequency of disk related service calls expected to decrease by 70% assuming the same drive failure rates.

IBM Netezza’s Market-Leading Evolution
Timeline: 2003, 2006, 2009, 2010, 2011, 2013
World’s First Data Warehouse Appliance: NPS® 8000 Series
World’s First 100 TB Data Warehouse Appliance: NPS® 10000 Series
World’s First Petabyte Data Warehouse Appliance: TwinFin™
World’s First Analytic Data Warehouse Appliance: TwinFin™ with i-Class™ Advanced Analytics
World’s fastest and “greenest” analytical platform: PureData™ System for Analytics N2002

Striper Leverages the Latest Hardware
3x faster scan rate
Drives per core have gone from
1 drive @ 120 MB/sec to
2.5 drives @ 130 MB/sec
FPGA cores have gone from
500 MB/sec to 1000 MB/sec
CPU cores have gone from
800 MB/sec to 1000+ MB/sec
6x more memory per Blade
(better leveraged by NPS 7.x)
50% greater data capacity per rack

Striper vs. TwinFin
Hardware Comparison
Attribute | PureData System for Analytics N1001 (TwinFin) | N2002 (Striper)
Blade Type | HS22 | HX5
CPU Cores / Blade | 2 x 4 Core Intel CPUs | 2 x 8 Core Intel CPUs
# Disks | 96 x 3.5” / 1 TB SAS (92 Active) | 288 x 2.5” / 600 GB SAS2 (240 Active)
Raw Capacity | 96 TB | 172.8 TB
Total Disk Bandwidth | ~11 GB/s | ~32 GB/s
S-Blades per Rack (cores) | 14 (112) | 7 (112)
S-Blade Memory | 24 GB | 128 GB
Rack Configurations | ¼, ½, 1, 1 ½, 2 – 10 | entry level, ½, 1, 2, 4
FPGA Cores / Blade | 8 (2 x 4 Engine Xilinx FPGA) | 16 (2 x 8 Engine Xilinx Virtex-6 FPGA)
User Data / Rack * | 128 TB | 192 TB
* Assuming 4x Compression

PureData System for Analytics N2002 HW Overview
12 Disk Enclosures
• 288 600 GB SAS2 Drives (240 for User Data, 14 for S-Blades, 34 Spare)
• RAID 1 Mirroring
7 PureData for Analytics S-Blades™
• 2 Intel 8 Core 2+ GHz CPUs
• 2 8-Engine Xilinx Virtex-6 FPGAs
• 128 GB RAM + 8 GB slice buffer
• Linux 64-bit Kernel
2 Hosts (Active-Passive)
• 2 Intel 2.7 GHz Sandy Bridge CPUs
• 7x300 GB SAS Drives
• Red Hat Linux 6 64-bit
User Data Capacity: 192 TB (note 2)
Data Scan Speed: 478 TB/hr (note 2)
Load Speed (per system): 5+ TB/hr
Power Requirements: 7.5 kW
Cooling Requirements: 27,000 BTU/hr
Scales from ½ Rack to 4 Racks (note 1)
1 Clients interested in a smaller entry point should refer to the N2002-002 model
2 Assuming 4X compression

Striper Wave Offer
Best discounting on the purchase of Striper ever!
– Must return TwinFin machine(s)
Leave the migration to us!* (estimated migration 1-2 weeks based on data and network)
– Review Migration Planning Questionnaire
– Develop Migration Plan
– Support development of test strategy
– Prepare Environment & Install tools for Data & Code Migration
– Migrate Data & Code to new appliance*
– Removal and secure disposal of TwinFin machine(s)
Most favorable financing available – Pick your Plan**
– Defer Payments for 90 days or more; or
– 0% financing with No Upfront Cost; or
– Lowest FMV Leasing Rates Available.
* Beyond 100 hours of service, IBM can provide additional fee-based migration services via IBM’s Lab Service Team for test execution support, complex environment considerations, handling for large data volumes, etc.
** With approved credit

Appliance Migration Service
Benefits
Reduce migration risks with proven
guidance and expertise
Leverage best practices & tools to
accelerate migration activities
Accelerate your ROI of new appliance
Deliverables
Migration Plan
Migrated data/code in new Appliance*
Features
Up to 100 hours of Migration Services from
IBM for one environment (20 Client Technical
Professionals/80 Lab Services)
– Project Management
– Review Migration Planning
Questionnaire
– Develop Migration Plan
– Support development of Test Strategy
– Prepare Environment & Install tools for
Data & Code Migration
– Migrate Data & Code to new appliance*
Beyond 100 hours of service, IBM can provide
additional fee-based migration services via
IBM’s Lab Service Team for test execution
support, complex environment considerations,
handling for large data volumes, etc.
Quickly migrate your old Netezza Appliance to the latest PureData System for Analytics Appliance!
* IBM will provide ETL/ Netezza connectivity, however 100 hours does not include manipulation of ETL code or enablement of newer ETL features
*100 hours does not include test execution
* Large data volumes/low capacity network may require additional fee-based Services time to complete migration
* Estimated migration 1-2 weeks based on data and network, per environment

TwinFin to Striper Summary
Better Longevity
– TwinFin has been in the field since 2009
– IBM PureData System for Analytics N2000 series appliances
have been out since February 1, 2013 – now is the time to
make the switch
– The new system is fully supported and allows you to take full
advantage of many new enhancements
Faster scan rates
Better resiliency
Greater density for data center efficiency
Appealing Financials
– Most favorable discount on Striper possible
– Financing options from IGF
– Bundled migration services

IBM Netezza Replication Services v1.5
Asynchronous, Homogeneous Replication for
PureData System for Analytics (formerly Netezza)
Simplifying Data Replication for Disaster Recovery and Scale

What’s This Replication Thing?
IBM Netezza Replication Services keeps a collection of databases
identical across multiple Netezza appliances. Our solution focuses
on replication for Disaster Recovery.
Disaster recovery: a replication use case in which failure of hardware
or software in its operational environment causes no permanent
loss of data or functionality.

Two Common Approaches When NOT Using Replication
Two Common Options: Dual Feed ETL and Backup Shipping
Primary
DR Site
ETL
WAN
WAN
Full Backup
+ Incrementals
Full Restore
+ Incrementals
Dual Feed ETL
Backup Shipping

Two Common Approaches When NOT Using Replication
Dual ETL Feed
Benefits
– Data can arrive at both systems at roughly the same time.
– Easier to “flip” DR site to be primary site in the event of a failure.
Drawbacks
– Some processes (such as sequences) may result in different values.
– In the event of an ETL error, bad data can be propagated to the DR site.
– Additional overhead for customer
Backup and Restore
Benefits
– Only changed data is moved across the network.
– Backups can later be stored as part of backup strategy.
– Offers more control over timing of DR loads, not tied to ETL process.
Drawbacks
– Occasional full backups recommended to ensure consistency, especially if backup files are later used for backup storage.
– Can result in very large data transfers, especially during initial full backups.
– Incremental backups do have some impact on system performance.

Replication Requirements Targeted with Our Solution
Disaster Recovery solution for PureData Systems for Analytics
– Protect business critical data
– Meet regulatory requirements
Scalable infrastructure that supports:
– Growing user populations
– Distributed access to BI and DW applications
– Geographically dispersed user populations
– Higher levels of concurrent access for BI and DW apps
– Reduced application connection and access latencies (“put the data closer”)

Replication Solution Overview
Homogeneous (PDA / Netezza only)
Asynchronous, “warm stand-by” ( there is latency to the DR box)
– Synchronous commit for the source PTS
– Asynchronous transfer to the subordinate PTS, Subordinate Appliance(s)
Hybrid Replication: SQL Statement & By Value
• (Intelligence of solution decides which mode to use)
– SQL statement-level replication (preferred, default)
– Replication By-Value (when necessary)

Supported Appliances
• IBM PureData System for Analytics N200x (Striper)
• IBM PureData System for Analytics N1001 (TwinFin)
• IBM PureData System for Analytics N1000 (TwinFin)
• IBM Netezza 100 (Skimmer)
• IBM Netezza High Capacity Appliance C1000
• NEC InfoFrame DWH Appliance
You can upgrade to IBM Netezza release 7.1.0.x from any 6.0.x or 6.1.x release, or from an earlier release of 7.1.0.x to a later 7.1.0.x release.

IBM Netezza Replication Services - Architecture

Description of “by SQL” Replication Method
Preferred method of replication for our solution
– Master node accepts SQL Data Manipulation Language (DML) and Data
Definition Language (DDL) that update the replicated databases.
– SQL statements captured to a replication log
– Logs copied across the network to multiple Netezza nodes
– Subordinates replay the SQL
– Fewer performance implications to customer workloads (near zero impact)
• Small amount of information to log/transfer: the SQL statement that made the change
• External table files that are referenced by DML operations are logged, byte for byte identical to the original imported data
• Incoming load rates for up to three simultaneous parallel loads
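The capture-and-replay flow described above can be sketched in a few lines of Python. This is a toy model, not the product's implementation: two in-memory SQLite databases stand in for the master and a subordinate, a Python list stands in for the replication log, and the `Master`, `Subordinate` and `rows` names are hypothetical.

```python
import sqlite3


class Master:
    """Executes SQL locally and appends each statement to a replication log."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.log = []  # stand-in for the log that is copied across the network

    def execute(self, sql, params=()):
        self.db.execute(sql, params)
        self.db.commit()
        self.log.append((sql, params))  # only the statement is logged, not the rows


class Subordinate:
    """Replays logged statements in order to reach the same state."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.applied = 0  # how many log entries have been replayed so far

    def replay(self, log):
        for sql, params in log[self.applied:]:
            self.db.execute(sql, params)
        self.db.commit()
        self.applied = len(log)


def rows(db):
    return db.execute("SELECT id, amount FROM sales ORDER BY id").fetchall()


master, sub = Master(), Subordinate()
master.execute("CREATE TABLE sales (id INTEGER, amount INTEGER)")
master.execute("INSERT INTO sales VALUES (?, ?)", (1, 100))
master.execute("INSERT INTO sales VALUES (?, ?)", (2, 250))
master.execute("UPDATE sales SET amount = amount * 2 WHERE id = ?", (2,))
sub.replay(master.log)  # done inline here; the slide describes an asynchronous transfer
print(rows(master.db) == rows(sub.db))
```

The log holds four short statements regardless of how many rows the UPDATE touches, which is why the slide describes this method as having a small amount of information to log and transfer.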

Description of “by Value” Replication Method
Alternative method of replicating changes
– Used when DML or DDL SQL statements are detected to potentially produce different results on the subordinate.
– Replays the rows which changed (and DDL to ensure appropriate table structure)
Steps
– On the master
• Detect non-deterministic SQL DML operations.
• Mark the entire transaction to be replicated by value, that is, by the rows that changed plus the DDL statements issued against replicated databases.
• During commit processing of the transaction on the master, capture the set of rows that changed (inserted, updated or deleted) in each table affected by DML to the replication log.
– On the subordinate
• DDL statements against replicated databases are replayed
• For each modified table, the new rows are inserted, and old rows deleted.
Requirement to log the underlying row changes to tables
– Performance is impacted by waiting for rows to be logged to disk on the source system.
– The time required for a transaction to complete will generally be longer than when replication is disabled.
This method may be optimal for some workloads compared to “by SQL”
– Session variable available to force the selection of this method when logging transactions
• SET REPLICATE_ALWAYS_BY_VALUE=ON;
The nzreplshowsql command outputs more details.
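As a rough illustration of the steps on this slide, the Python sketch below picks a replication mode for a transaction and applies a captured row change set on a subordinate. Everything here is an assumption made for illustration: the regular expressions are a hypothetical stand-in for the product's real detection logic, the `always_by_value` flag merely models the REPLICATE_ALWAYS_BY_VALUE session variable, and tables are plain dictionaries keyed by a hypothetical row identifier.

```python
import re

# Hypothetical patterns for SQL that may give different results when replayed
# on another appliance. The product's actual detection rules are not shown here.
NON_DETERMINISTIC = [
    re.compile(r"\brandom\s*\(", re.IGNORECASE),
    re.compile(r"\blimit\s+\d+", re.IGNORECASE),
    re.compile(r"\bover\s*\(", re.IGNORECASE),  # window functions
]


def choose_mode(statements, always_by_value=False):
    """Return 'by-value' if the session forces it or any statement looks
    non-deterministic; otherwise return 'by-sql'."""
    if always_by_value:
        return "by-value"
    for sql in statements:
        if any(pattern.search(sql) for pattern in NON_DETERMINISTIC):
            return "by-value"  # the entire transaction is marked
    return "by-sql"


def apply_row_changes(table, deleted_keys, inserted_rows):
    """Apply a captured change set on the subordinate: old rows are deleted,
    then new rows are inserted."""
    for key in deleted_keys:
        table.pop(key, None)
    for key, row in inserted_rows.items():
        table[key] = row
    return table


txn = ["INSERT INTO sample SELECT * FROM sales LIMIT 5"]
print(choose_mode(txn))

subordinate_table = {1: ("a", 10), 2: ("b", 20)}
apply_row_changes(subordinate_table, deleted_keys=[2],
                  inserted_rows={2: ("b", 25), 3: ("c", 30)})
print(subordinate_table)
```

An update is modeled as a delete of the old row followed by an insert of the new one, matching the slide's description of how the subordinate applies changes.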

IBM Netezza Replication Services - Roles
Subordinate:
Role in a replication set in which execution of UPDATE transactions against non-temporary tables or sequences in a replicated database is prohibited. Temporary table UPDATEs and persistent table SELECTs are fully supported.
Master:
Appliance that is the single source of changes to
replicated databases and to global data. The other
appliances in the replication set are subordinates.
The role of master can be changed from one appliance
to another by an administrator, typically
in response to failures and planned outages, or to
“follow the sun” across time zones.
One master and many subordinates are permitted in a replication set. A subordinate replication host can perform query transactions for load balancing, including creating and updating temporary tables.
Subordinate appliances can also have databases outside the replication scope; those databases have no write restrictions.
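The role rules above can be summarized as a small permission check. The Python sketch below is a hypothetical model (the `Appliance` class, `check_write` and `promote` are invented names, not product APIs): a subordinate rejects writes to persistent tables in replicated databases, allows temporary-table writes and writes to non-replicated databases, and the master role can be moved so that exactly one master remains.

```python
from dataclasses import dataclass, field


class WriteProhibited(Exception):
    """Raised when a subordinate is asked to change replicated persistent data."""


@dataclass
class Appliance:
    name: str
    role: str  # "master" or "subordinate"
    replicated_dbs: set = field(default_factory=set)

    def check_write(self, database, temporary=False):
        """Return True if the write is allowed, else raise WriteProhibited."""
        if self.role == "master":
            return True
        if temporary:
            return True  # temporary tables may be created and updated
        if database not in self.replicated_dbs:
            return True  # databases outside replication scope are unrestricted
        raise WriteProhibited(f"{self.name}: {database} is read-only on a subordinate")


def promote(replication_set, new_master):
    """Move the master role, keeping exactly one master in the set."""
    for appliance in replication_set:
        appliance.role = "master" if appliance is new_master else "subordinate"


site_a = Appliance("site-a", "master", {"SALES"})
site_b = Appliance("site-b", "subordinate", {"SALES"})
site_b.check_write("SALES", temporary=True)  # allowed: temporary table
site_b.check_write("SCRATCH")                # allowed: not a replicated database
promote([site_a, site_b], site_b)            # e.g. a planned outage at site-a
print(site_a.role, site_b.role)
```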

The Persistent Transport System (PTS)
External server collocated with every node in replication cluster
A PTS has three major purposes:
– Move data and files (synchronize transaction logs) from one node to another.
– Send control messages from one node to another.
– Act as a persistent store for recovery from failures.
PTS H/W Specs:
– 4 cores, 16GB RAM, 5TB+ of disk space, 250MB/s disk write rate for logs
– Red Hat Enterprise Linux 5.7+
Can Be a Virtual Machine (VM)
The New *flexible* PTS!
(Valid option as of February 2014.)
Note: we encourage customers to have a test environment, so please consider the need for
not only appliances but appropriate PTS in your test environment.
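For sizing discussions, a back-of-envelope calculation relates the disk figures above to how long a PTS could keep accumulating logs. The sketch assumes, hypothetically, that the whole disk is available for replication logs and that logs are written at a steady rate; the function name is invented, and the 250 MB/s figure is the required disk write capability, not an expected workload.

```python
def hours_until_full(disk_tb, log_rate_mb_s):
    """Hours of log a PTS disk can absorb at a sustained write rate.

    Uses decimal units (1 TB = 1,000,000 MB).
    """
    seconds = disk_tb * 1_000_000 / log_rate_mb_s
    return seconds / 3600


print(round(hours_until_full(5, 250), 1))  # writing at the full 250 MB/s
print(round(hours_until_full(5, 25), 1))   # writing at a tenth of that rate
```

A calculation like this is only a starting point for estimating how long a subordinate outage could be absorbed before the persistent store fills.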

Performance Benefits of a Replicated Environment
Across the replicated cluster, the advantages of asynchronous
replication: Because applications do not have to wait for transactions
on the master to be transported and applied on target systems,
asynchronous solutions can be deployed over long distances with
(a) negligible impact on application performance, and (b) minimal
network bandwidth consumption.
On the master system, improve performance by offloading BI reporting
to one or more replication target systems.
On target systems, reduce network and database connection latencies
by storing data closer to users and client applications.
Across the replicated cluster, optimal use of network bandwidth, a direct consequence of the "by-SQL" approach of replicating load files and SQL statements when possible. This contrasts with other databases, which log and transmit index and data structure changes.

IBM Netezza Replication Services v1.5
Increased Resiliency, and Compatibility with Customer Workloads
Replication PTS HA: The ability to add a second host to the PTS hardware to ensure availability if there is an issue with the first host. (Note: this requires appropriate hardware and the Red Hat High Availability Add-On.)
Replication Relaxed Serializability: Replication is compatible with the NPS relaxed serializability feature.
Replication Master Continue on PTS Error: The ability to allow the source appliance to continue to change data even though a replication error occurred and it cannot log to its PTS.
Reduced Restrictions: The removal of restrictions on the SQL allowed on replicated databases. (Sequences, non-deterministic SQL, DML which selects from non-replicated data, stored procedures which manipulate timestamps, and TEMP tables now work identically when replication is enabled vs. disabled.)

NPS v7.1 is a Prereq for Replication v1.5
Highlights
– Scheduler rules for WLM
– Short query prioritization
– Faster Bulk Fetching with ODBC
– Password aging and expiry
– nzPortal enhancements
– Cryptographic standards (SP 800-131A)
– Support for Replication v1.5
– Support for INZA 3.0
Resiliency
– Faster rebalance for failed disks
– Disk validation support
– Large scale disk replacement
– Call Home v1.0
– Enhanced System Health Checks v2.2
– ILMT support for Growth on Demand
Platform & OS
– Client Kit support for AIX 7.1
– RHEL 6.4 certification
SQL Enhancements
– Multiple Schema (3-part naming)
– Orphan column query
– NOT IN / EXIST improvements
– CASE WHEN improvements
– Support 24 hour datetime
– CESU-8 support
Transaction Enhancement
– Truncate table in TXN
– Improved view validation
– Temp table enhancements
– Deprecate Web Admin
ETL
– ODBC loader support for INTERVAL
Netezza Performance Portal
– Cryptographic standards (SP 800-131A)
– Scheduler rules
– History type AUDIT
– Restrict nzPortal users
– Groom dialogs

New Features in NPS 7.1 / Replication 1.5
Master Continue on Error
WHAT IS IT
– A system parameter (replContinueOnLogError) in the replc.cfg file.
HOW IT WORKS
– False (default): If a PTS error occurs while capturing the transaction log, the master aborts any active transaction.
– True: Enables the master to continue processing transactions regardless of the logging error, but replication stops so that loads can continue. The master node enters a "continue on error" state, where write workloads continue even though they are not recorded in the replication log. Because the transaction log is then invalid due to missing data, you must re-synchronize all nodes after resolving the PTS issues.
HOW TO RECOVER
– To recover from the replication suspension that results from the "master continue on error" feature, you must follow the backup and restore procedure. First, run the nzreplanalyze command to generate a directive file for synchronization and move the master node from "continue on error" to a suspended state. Then, use nzreplbackup to create a backup and activate the master node. Finally, use nzreplrestore to restore the replication data to the subordinate(s).
*No other database has this configuration setting!
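The behaviour of this setting and its recovery sequence can be read as a small state machine. The Python sketch below is a toy model under stated assumptions: the class and method names are hypothetical, only the state transitions named on the slide are modelled (the real nzreplanalyze, nzreplbackup and nzreplrestore commands do far more), and the treatment of writes while suspended is an assumption of the sketch.

```python
class MasterNode:
    """Toy model of the master when its PTS cannot accept log writes."""

    def __init__(self, continue_on_log_error=False):
        # Mirrors the replContinueOnLogError parameter; False is the default.
        self.continue_on_log_error = continue_on_log_error
        self.state = "active"
        self.committed = []  # transactions applied on the master
        self.logged = []     # transactions captured in the replication log

    def commit(self, txn, pts_ok=True):
        """Return True if the transaction commits, False if it is aborted."""
        if self.state == "suspended":
            return False  # assumption of this sketch: no writes while suspended
        if self.state == "active" and pts_ok:
            self.logged.append(txn)
            self.committed.append(txn)
            return True
        if self.state == "active" and not self.continue_on_log_error:
            return False  # default: the active transaction is aborted
        # Writes continue but are no longer recorded in the replication log.
        self.state = "continue_on_error"
        self.committed.append(txn)
        return True

    def analyze(self):
        """Models the state change attributed to nzreplanalyze."""
        if self.state != "continue_on_error":
            raise RuntimeError("nothing to recover")
        self.state = "suspended"

    def backup(self):
        """Models nzreplbackup: take a backup and reactivate the master."""
        if self.state != "suspended":
            raise RuntimeError("run analyze() first")
        self.state = "active"
        return list(self.committed)


def restore(snapshot):
    """Models nzreplrestore: the subordinate's data is replaced by the backup."""
    return list(snapshot)


master = MasterNode(continue_on_log_error=True)
master.commit("t1")
master.commit("t2", pts_ok=False)  # PTS failure: the master keeps going
master.commit("t3", pts_ok=False)
master.analyze()                   # continue_on_error -> suspended
snapshot = master.backup()         # suspended -> active
subordinate_data = restore(snapshot)
print(master.state, subordinate_data)
```

With the default setting the second commit would simply be aborted and the master would stay in the active state, so no re-synchronization would be needed.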

New Features in NPS 7.1 / Replication 1.5
Relaxed Serializability Support
As of NPS 7.1 and Replication version 1.5, customers can use the "relaxed serializability" setting in NPS on replicated databases!
– This functionality uses an invisibility list. The invisibility list on the master is replicated for use on the subordinate.
– There are no constraints around using this setting on the master or subordinate in replication environments.
– To be clear, the serial execution on the subordinate did not change from the prior replication release, but it now has the invisibility list to "see" the appropriate state of the database.
– It's worth noting that the appliances behave the same way with relaxed serializability regardless of whether replication is turned on or off.
NPS Configuration Notes (A best practice is to use it at a session level.)
– It can be set system wide (globally). This requires a stop and start of the appliance.
– It can be set with a session variable.
NOTE: Customers need to understand what happens when serializability is set to false. Therefore, it is a best practice to use it in session scope (as opposed to globally).
The NPS feature is documented for the first time as of NPS 7.1.

Replication Reduced Restrictions
Reduced restrictions
– Key software development project since January 2013
Things that now work fine with replication
– SEQUENCES
– Non-deterministic SQL (e.g., LIMIT 5, Random(), window functions)
– DML which selects from non-replicated data (system tables, databases)
– Stored procedures which manipulate timestamps
– Session scope temporary tables and variables
• TEMP tables now work identically when replication is enabled vs. disabled
– Transactions larger than 300KB of SQL statements now supported
– UDF, UDTF and UDA

Replication QuickStart Offering
Ensure your solution is implemented efficiently with low risk
Features
This QuickStart includes the following activities:
– Install the 10 Gb NIC cards in the Netezza appliances; establish and validate connectivity with the replication hardware and the Netezza appliance.
– Install and configure a basic Netezza Replication Software Solution from one Netezza source to one target.
– Provide information sharing on how to best use and leverage the Netezza Replication Solution.
– Conduct a planning workshop to document disaster and recovery scenarios based on the requirements.
The scope is limited to one Netezza source and one target. Additional nodes can be supported and are quoted separately.
The site survey / pre-engagement checklist is reviewed and completed by the client before any IBM resources come on-site.
Deliverables
– Installation Report
– Disaster and Recovery Scenarios Document
Benefits
– Get a basic replication solution installed and configured quickly, realizing your solution ROI faster.
– Leverage IBM's deep product expertise to define optimum disaster recovery solutions that satisfy your requirements.
– Obtain a replication solution foundation to protect one of your most important assets: your data!
Backed by world-class industry and product experts in deploying Information Management Software
Duration
– 4 weeks

Questions about NPS 7.1 & Replication 1.5
Announcement
http://www-01.ibm.com/common/ssi/cgi-bin/ssialias?infotype=AN&subtype=CA&htmlfid=897/ENUS214-055&appname=USN
Fix Central
http://www-933.ibm.com/support/fixcentral/swg/selectFixes?product=ibm/Information+Management/Netezza+NPS+Software+and+Clients&release=NPS_7.1.0&platform=All&function=all
Knowledge Center
http://www-01.ibm.com/support/knowledgecenter/
Replication Services
https://w3-connections.ibm.com/communities/community/NetezzaReplication
Netezza Developer Network download site
https://www14.software.ibm.com/webapp/iwm/web/reg/pick.do?source=swg-im-ibmndn&lang=en_US
Contacts
Doug Dailey, Netezza Product Manager (NPS), douglasd@us.ibm.com
Chris Gerlt, Netezza Product Manager (Replication), chris.gerlt@us.ibm.com

© International Business Machines Corporation 2014
International Business Machines Corporation New Orchard Road Armonk, NY 10504
IBM, the IBM logo, PureSystems, PureFlex, PureApplication, PureData and ibm.com are trademarks of International Business Machines Corporation,
registered in many jurisdictions worldwide.
A current list of IBM trademarks is available on the Web at www.ibm.com/legal/copytrade.shtml
All rights reserved.
