With over 22,000 transactions processed every second, your mainframe IMS is a critical source of data for the cloud data warehouses that feed analytics, customer experience, and regulatory initiatives. However, extracting data from mainframe IMS can be time-consuming and costly, leading to the exclusion of IMS data from cloud data warehouses altogether – and leaving valuable insights unseen.
Never ignore or manually extract mainframe IMS data again. In this on-demand webcast, you will learn how Connect CDC enables your team to develop integrations quickly and easily between mainframe IMS and cloud data warehouses in the most cost-effective way possible.
2. Housekeeping
Webcast Audio
• Today’s webcast audio is streamed through your computer speakers.
• If you need technical assistance with the web interface or audio, please reach out to us using the chat window.
Questions Welcome
• Submit your questions at any time during the presentation using the chat window.
• We will answer them during our Q&A session following the presentation.
Recording and Slides
• This webcast is being recorded. You will receive an email following the webcast with a link to download both the recording and the slides.
3. Agenda
• The Rise of Cloud Data Warehouse (CDW)
• Value of mainframe IMS
• Challenges of using IMS data in CDWs
• Best Practices for IMS in CDWs
• Q & A
4. The Rise of Cloud Data Warehouse (CDW)
50% using CDW to support reporting and analytics
2x increase in adoption from 2018 to 2019
#1 need is to improve how data is integrated in CDW
Source: TDWI
5. CDWs Fuel Strategic Data Initiatives
Modernization
• Leverage scalability and elasticity of the cloud
• Handle large volumes of data with increased data storage
• Improve speed and performance
• Better use of existing storage
Accelerating Insights
• Improve access to data from across the organization for projects
• Real-time data availability for lines of business, data scientists, and engineers
• Self-service data capabilities
• Enable advanced analytics and ML projects
6. IMS, Driving Transaction Data
IMS continues to fulfill important roles across many business requirements.
• 95% of Fortune 1000 companies have heritage in IMS
• 50B+ IMS transactions per day
• 15 million gigabytes of production data serving over 200 million users per day
IMS Then and Today
7. Top Challenges of using IMS data in CDWs
• Communication between mainframe and analytics teams
• IMS organization and data types
• Performance and throughput
8. Bridging the Divide
• Realize that they are going to take your data – make the best of it
• Be able to translate mainframe-speak to distributed / cloud
• Mentor them on working with copybooks and data types
• Help them understand IMS data structures and keys
Enable your team’s success from IMS to CDW
9. Understanding IMS Terminology
• RECONs – Recovery Control Datasets
• OLDS – Online Log Datasets (active logs)
• SLDS – System Log Datasets (archive logs) – Pronounced ‘Slids’
• DBD – Database Description (schema)
• PSB – Program Specification Block (database view)
• Segment – Record within a Database
• IMS Full Function - Most versatile databases
• IMS Fastpath – Data Entry Databases (DEDBs) - Speed with limited feature set
• x’99’s – CDC type IMS Log Records
• BMP – Batch programs running in an IMS system (similar to Db2 batch)
• DLI Batch – Standalone IMS batch programs – Databases cannot be shared with Online
• Checkpoint – IMS Commits for BMPs (x’41’ log records)
• SYNCs – IMS Commits for Online transactions (x’37’ log records)
• Rollbacks – ROLB/ROLS - IMS Rollbacks for all transaction types (x’38’ log records)
10. Common IMS Data Challenges
• Code Page Translation
• Invalid Data
— Non-numeric data in numeric fields
— Binary zeros in packed fields (or any field)
— Invalid data in character fields
• Dates
— Must be decoded / validated if the target column is DATE or TIMESTAMP
— May require knowledge of the Y2K implementation
— Allow extra time for date-intensive applications
• Repeating Groups
— Sparse arrays
— Number of elements
— Will probably be de-normalized
• Redefines
• Packed Fields
• Binary / 'Special' Fields
— Common in older applications developed in the 1970s / 80s
— Generally requires application-specific translation
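To make a few of these challenges concrete, here is a minimal Python sketch (not from the webcast) of three common conversions: EBCDIC code-page translation, packed-decimal (COMP-3) decoding, and a Y2K-style century window. The pivot year of 50 is an illustrative assumption; real applications use whatever windowing rule their Y2K remediation chose.

```python
def unpack_comp3(data: bytes) -> int:
    """Decode an IBM packed-decimal (COMP-3) field: two digits per byte,
    with the sign in the low nibble of the last byte (0xD = negative)."""
    digits = "".join(f"{b >> 4}{b & 0x0F}" for b in data[:-1])
    digits += str(data[-1] >> 4)
    value = int(digits)
    return -value if data[-1] & 0x0F == 0x0D else value

def y2k_window(yy: int, pivot: int = 50) -> int:
    """Expand a two-digit year using a sliding century window (pivot is an assumption)."""
    return 1900 + yy if yy >= pivot else 2000 + yy

# EBCDIC (code page 037) character data must be translated before loading.
assert b"\xc8\xc5\xd3\xd3\xd6".decode("cp037") == "HELLO"
# 0x12345C is packed decimal +12345; 0x12345D is -12345.
assert unpack_comp3(b"\x12\x34\x5c") == 12345
assert unpack_comp3(b"\x12\x34\x5d") == -12345
assert y2k_window(99) == 1999 and y2k_window(21) == 2021
```

Invalid data (binary zeros in a packed field, for example) would raise in `int(digits)` here; a production pipeline needs an explicit policy for such rows rather than letting the load fail.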
12. Design → IMS to Cloud
• De-normalized / minimal normalization
• Still requires transformation (dates, binary values, etc.)
• Good news → IMS structure is already set up for the cloud
[Diagram: IMS hierarchy – Cust segment (key: Cust#) → Order segment (key: Order#, with Cust#) → Line Item segment (key: Line#)]
{ "company_name" : "Acme",
"cust_no" : "20223",
"contact" :{ "name" : "Jane Smith",
"address" : "123 Maple Street",
"city" : "Pretendville",
"state" : "NY",
"zip" : "12345" }
}
{ "order_no" : "12345",
  "cust_no" : "20223",
  "price" : 23.95,
  "lines" : [ { "item" : "Widget1",
                "qty" : "6",
                "cost" : "2.43" },
              { "item" : "Widget2",
                "qty" : "1",
                "cost" : "9.37" } ]
}
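The hierarchy-to-document mapping above can be sketched in a few lines of Python. The segment field names here are hypothetical, chosen only to match the sample JSON; real segments come from the DBD/copybook after the field-level conversions discussed earlier.

```python
import json

# Hypothetical flattened segment records from the Cust → Order → Line Item hierarchy.
orders = [{"order_no": "12345", "cust_no": "20223", "price": 23.95}]
lines = [
    {"order_no": "12345", "line_no": 1, "item": "Widget1", "qty": "6", "cost": "2.43"},
    {"order_no": "12345", "line_no": 2, "item": "Widget2", "qty": "1", "cost": "9.37"},
]

def denormalize(order: dict, line_segs: list) -> dict:
    """Fold child Line Item segments into their parent Order document,
    dropping the foreign-key fields that the nesting makes redundant."""
    doc = dict(order)
    doc["lines"] = [
        {k: v for k, v in l.items() if k not in ("order_no", "line_no")}
        for l in line_segs if l["order_no"] == order["order_no"]
    ]
    return doc

doc = denormalize(orders[0], lines)
print(json.dumps(doc))
```

The key design point is that the parent/child keys (Order#, Line#) drive the join, then disappear from the nested document, which is why the IMS structure is "already set up for the cloud."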
13. Sample IMS CDC Record in JSON Format
{"object_name":"IMSDB01.SEG02",
"stck":"d4c4b51993db0000",
"timestamp":"2018-08-12T11:11:18Z",
"change_op":"U",
"seq":"2",
"parent_key":{
"seg1.key1":12345
},
"after_image":{
"fname":"MARY",
"lname":"JOHNSON",
"city":"CHICAGO",
"amount":"4087.66"
},
"before_image":{
"fname":"MARY",
"lname":"JOHNSON",
"city":"CHICAGO",
"amount":"2964.32"
}
}
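A consumer of records like this often only cares about what actually changed. Here is a small, hypothetical Python sketch (not part of any Connect CDC API) that diffs the before and after images of an update record:

```python
def changed_fields(record: dict) -> dict:
    """For an update ('U') record, return only the columns whose value
    changed, as {column: (before, after)}."""
    before = record.get("before_image", {})
    after = record.get("after_image", {})
    return {k: (before.get(k), v) for k, v in after.items() if before.get(k) != v}

# Trimmed version of the sample CDC record above.
cdc = {
    "change_op": "U",
    "after_image": {"fname": "MARY", "lname": "JOHNSON", "city": "CHICAGO", "amount": "4087.66"},
    "before_image": {"fname": "MARY", "lname": "JOHNSON", "city": "CHICAGO", "amount": "2964.32"},
}
assert changed_fields(cdc) == {"amount": ("2964.32", "4087.66")}
```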
14. Data Collection Overkill
• Major red flag
• You need to start small
• Minimize data volume
• Deploy in small increments
— Realize success early
— Adjust infrastructure
— Predictable costs & duration
• Adapt easier to new targets
[Diagram: Everything does not have to be in “the Cloud” – IMS, Db2 z/OS, Oracle, Db2 LUW, SQL Server, VSAM]
15. IMS CDC via Kafka Streaming
• Engine driven by a configuration script
• Designed to function as a CDC utility
• No source-to-target mapping
• Fast / Scalable
16. Streaming Performance Aspects
• Throughput and latency depend primarily on the speed of the target
• Fortunately, streaming platforms are very fast targets
Top items affecting performance / throughput:
— Message Size
— IMS CDC streaming → transaction size and arrival rate
— Target replication scaling factor → affects acknowledgement response
— IMS initial loads → overall data volume
— Streaming platform configuration → must be tuned for source workload
17. Message Size
• Small messages perform better than larger ones (a bit obvious)
• It boils down to the number of bytes per message
• IMS segments can be large (redefines, arrays, etc.)
• Updates can contain before and after images
So...What to Do?
• Suggest Avro as the target data format
— A compact binary format whose schemas are defined in JSON
— JSON is typically used for data validation only → it can be easily read
— Avro messages are roughly the size of the source segment (×2 for updates)
• Reduce the number of fields in the CDC message
• Evaluate the requirement for publishing before images of updates
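The last two points are easy to quantify. This small sketch (using plain JSON and a hypothetical update message, since size ratios are what matter, not the wire format) shows how dropping the before image shrinks an update message:

```python
import json

# Hypothetical CDC update message carrying both images.
msg = {
    "change_op": "U",
    "after_image": {"fname": "MARY", "lname": "JOHNSON", "city": "CHICAGO", "amount": "4087.66"},
    "before_image": {"fname": "MARY", "lname": "JOHNSON", "city": "CHICAGO", "amount": "2964.32"},
}

full = len(json.dumps(msg).encode())

# Dropping the before image roughly halves the payload for updates,
# which directly improves throughput on the streaming target.
slim = {k: v for k, v in msg.items() if k != "before_image"}
trimmed = len(json.dumps(slim).encode())
assert trimmed < full
```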
18. Key Monitoring Metrics
Operational
• Component down unexpectedly
• Latency → how far behind you are in publishing
• Latency → when did you last hear from the engine(s)
Informational
• Overall throughput → records / bytes per second
• Workload patterns / peak transaction arrival rate
• Number of CDC records by database / segment
• Number of transactions
• zIIP offload statistics
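The first latency metric above ("how far behind you are in publishing") can be computed directly from the `timestamp` field of a CDC record like the sample on slide 13. A minimal sketch, assuming the ISO-8601 UTC format shown there:

```python
from datetime import datetime, timezone

def publish_latency_seconds(event_ts: str, now: datetime) -> float:
    """Publishing lag: wall-clock time minus the CDC record's source timestamp."""
    event = datetime.strptime(event_ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return (now - event).total_seconds()

# A record stamped 11:11:18Z observed at 11:11:28Z is 10 seconds behind.
now = datetime(2018, 8, 12, 11, 11, 28, tzinfo=timezone.utc)
assert publish_latency_seconds("2018-08-12T11:11:18Z", now) == 10.0
```

In practice this number would be emitted per engine so that a rising trend, or a stale "last heard from" time, can trigger an alert.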
20. Best Practices
• Keep it simple & efficient → minimize the number of moving parts
• Avoid over-scaling → start with a basic configuration & work up
• Keep full extracts / initial loads on standby
— Needed if a ‘point of no return’ is reached
— Must be able to run against live databases
• Discovery / planning
— Data sources → CDC and bulk loads
— Latency requirements
— Data volume → CDC and bulk loads
— Peak transaction arrival rate
• Ensure proper monitoring / alerts are set
• Goal → ‘set & forget’ deployment
Technical approach.
21. Best Practices
• Avoid data collection overkill
• Help bridge the great divide
• Approach with a comprehensive strategy
• Involve the business unit(s) from the beginning
• Be aware of application release cycle
Business approach.
22. The Connect CDC Difference
• Multiple IMS capture options
• Highly scalable capture and apply
• Asynchronous and real-time replication capabilities for IMS to cloud data warehouses and streaming frameworks such as Kafka
• Supports IMS data sharing environments
• No impact on IMS database