Real-Time Streaming: Move IMS Data
to Your Cloud Data Warehouse
Scott Quillicy
Housekeeping
Webcast Audio
• Today’s webcast audio is streamed through your
computer speakers.
• If you need technical assistance with the web
interface or audio, please reach out to us using
the chat window.
Questions Welcome
• Submit your questions at any time during the
presentation using the chat window.
• We will answer them during our Q&A session
following the presentation.
Recording and slides
• This webcast is being recorded. You will receive
an email following the webcast with a link to
download both the recording and the slides.
Agenda
• The Rise of Cloud Data Warehouse (CDW)
• Value of mainframe IMS
• Challenges of using IMS data in CDWs
• Best Practices for IMS in CDWs
• Q & A
The Rise of Cloud Data Warehouse (CDW)
50% using CDW to support reporting and analytics
2x increase in adoption from 2018 to 2019
#1 need is to improve how data is integrated in CDW
Source: TDWI
CDWs Fuel Strategic Data
Initiatives
Modernization
Leverage scalability and elasticity of
the cloud
Handle large volumes of data with
increased data storage
Improve speed and performance
Better use of existing storage
Accelerating Insights
Improve access to data from across
organization for projects
Real-time data availability for line
of business, data scientists, and
engineers
Self-service data capabilities
Enable advanced analytics and ML
projects
IMS, driving
transaction data
IMS continues to fulfill
important roles in relation to
many business requirements
95%
of Fortune 1000 companies have
heritage in IMS
50B+
IMS transactions per day
15 million
gigabytes of production data
serving over 200 million users
per day
IMS Then and Today
Top Challenges of using IMS data in CDWs
• Communication between mainframe
and analytics teams
• IMS organization and data types
• Performance and throughput
Bridging the Divide
• Realize that they are going to take your data –
make the best of it
• Be able to translate mainframe-speak to
distributed / cloud
• Mentor them on working with copybooks and
data types
• Help them understand IMS data structures
and keys
Enable your team’s success from IMS to CDW
Understanding IMS Terminology
• RECONs – Recovery Control Datasets
• OLDS – Online Log Datasets (active logs)
• SLDS – System Log Datasets (archive logs) – Pronounced ‘Slids’
• DBD – Database Descriptor (schema)
• PSB – Program Specification Block (database view)
• Segment – Record within a Database
• IMS Full Function - Most versatile databases
• IMS Fast Path – Data Entry Databases (DEDBs) - Speed with limited feature set
• x’99’s – CDC type IMS Log Records
• BMP – Batch programs running in an IMS system (similar to Db2 batch)
• DLI Batch – Standalone IMS batch programs – Databases cannot be shared with Online
• Checkpoint – IMS Commits for BMPs (x’41’ log records)
• SYNCs – IMS Commits for Online transactions (x’37’ log records)
• Rollbacks – ROLB/ROLS - IMS Rollbacks for all transaction types (x’38’ log records)
Common IMS Data Challenges
• Code Page Translation
• Invalid Data
— Non-numeric data in numeric fields
— Binary zeros in packed fields (or any
field)
— Invalid data in character fields
• Dates
— Must be decoded / validated if target
Column is DATE or TIMESTAMP
— May require knowledge of Y2K
implementation
— Allow extra time for date intensive
applications
• Repeating Groups
— Sparse arrays
— Number of elements
— Will probably be de-normalized
• Redefines
• Packed
• Binary / 'Special' Fields
— Common in older applications developed in
1970s / 80s
— Generally requires application specific
translation
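A minimal sketch of how a few of these challenges might be handled on the distributed side. It assumes EBCDIC code page 037 (`cp037`), the usual packed-decimal layout with a trailing sign nibble, and a two-digit-year windowing pivot of 50. The function names and the pivot are hypothetical; the real code page and Y2K scheme depend on the application.

```python
from datetime import date
from decimal import Decimal
from typing import Optional


def decode_text(raw: bytes, codepage: str = "cp037") -> str:
    """Translate an EBCDIC character field to a string, trimming pad spaces."""
    return raw.decode(codepage).rstrip()


def unpack_packed(raw: bytes, scale: int = 0) -> Optional[Decimal]:
    """Decode a packed-decimal field; return None when the content is invalid."""
    if not raw:
        return None
    nibbles = []
    for byte in raw:
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)
    sign = nibbles.pop()  # last nibble carries the sign
    # Binary zeros fail here because 0x0 is not a valid sign nibble.
    if sign < 0xA or any(n > 9 for n in nibbles):
        return None
    value = int("".join(str(n) for n in nibbles))
    if sign in (0xB, 0xD):  # B and D mark negative values
        value = -value
    return Decimal(value).scaleb(-scale)


def parse_yymmdd(text: str, pivot: int = 50) -> Optional[date]:
    """Validate a YYMMDD field using an assumed windowing pivot."""
    if len(text) != 6 or not (text.isascii() and text.isdigit()):
        return None
    yy, mm, dd = int(text[:2]), int(text[2:4]), int(text[4:])
    year = 1900 + yy if yy >= pivot else 2000 + yy
    try:
        return date(year, mm, dd)
    except ValueError:
        return None  # e.g. 000000 or 999999 used as a "no date" marker


print(decode_text(b"\xd4\xc1\xd9\xe8\x40\x40"))
print(unpack_packed(b"\x12\x34\x5c", scale=2))
print(unpack_packed(b"\x00\x00\x00"))
print(parse_yymmdd("180812"), parse_yymmdd("000000"))
```

Returning `None` for invalid input is one possible policy; a pipeline could just as well route such records to an error topic or substitute a default.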
Best Practices
Design -> IMS to Cloud
• De-normalized / minimal normalization
• Still requires transformation (dates, binary values, etc.)
• Good news → IMS structure is already set up for the cloud
[Diagram: IMS hierarchy of Cust, Order, and Line Item segments, each shown as a key (Cust#, Order#, Line#) followed by data fields]
{ "company_name" : "Acme",
  "cust_no" : "20223",
  "contact" : { "name" : "Jane Smith",
                "address" : "123 Maple Street",
                "city" : "Pretendville",
                "state" : "NY",
                "zip" : "12345" }
}
{ "order_no" : "12345",
  "cust_no" : "20223",
  "price" : 23.95,
  "Lines" : [ { "item" : "Widget1",
                "qty" : "6",
                "cost" : "2.43" },
              { "item" : "Widget2",
                "qty" : "1",
                "cost" : "9.37" } ]
}
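As a rough illustration of why the IMS hierarchy maps naturally onto a document, the sketch below folds flat parent and child segment rows into one nested order document. The sample values loosely follow the slide; the `build_order_document` helper and the flat row layout are hypothetical.

```python
import json


def build_order_document(order: dict, lines: list) -> dict:
    """Nest line-item segments under their parent order segment."""
    doc = dict(order)  # copy so the source segment is left untouched
    doc["Lines"] = [
        {k: v for k, v in line.items() if k != "order_no"}
        for line in lines
        if line["order_no"] == order["order_no"]
    ]
    return doc


order_segment = {"order_no": "12345", "cust_no": "20223", "price": 23.95}
line_segments = [
    {"order_no": "12345", "item": "Widget1", "qty": "6", "cost": "2.43"},
    {"order_no": "12345", "item": "Widget2", "qty": "1", "cost": "9.37"},
    {"order_no": "99999", "item": "Other", "qty": "2", "cost": "1.00"},
]

document = build_order_document(order_segment, line_segments)
print(json.dumps(document, indent=2))
```

The parent key is dropped from each child because the nesting already expresses the relationship, which is the de-normalized shape the slide describes.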
Sample IMS CDC Record in JSON Format
{"object_name":"IMSDB01.SEG02",
"stck":"d4c4b51993db0000",
"timestamp":"2018-08-12T11:11:18Z",
"change_op":"U",
"seq":"2",
"parent_key":{
"seg1.key1":12345
},
"after_image":{
"fname":"MARY",
"lname":"JOHNSON",
"city":"CHICAGO",
"amount":"4087.66"
},
"before_image":{
"fname":"MARY",
"lname":"JOHNSON",
"city":"CHICAGO",
"amount":"2964.32"
}
}
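A small consumer-side sketch of working with a record like this one: parse it and report which fields actually changed. The record is repeated here in parseable form. Reading `object_name` as `database.segment` and `change_op` "U" as an update are assumptions about the format, and `changed_fields` is a hypothetical helper.

```python
import json

cdc_message = """
{"object_name":"IMSDB01.SEG02",
 "stck":"d4c4b51993db0000",
 "timestamp":"2018-08-12T11:11:18Z",
 "change_op":"U",
 "seq":"2",
 "parent_key":{"seg1.key1":12345},
 "after_image":{"fname":"MARY","lname":"JOHNSON","city":"CHICAGO","amount":"4087.66"},
 "before_image":{"fname":"MARY","lname":"JOHNSON","city":"CHICAGO","amount":"2964.32"}}
"""


def changed_fields(record: dict) -> dict:
    """Return {field: (before, after)} for every field whose value differs."""
    before = record.get("before_image", {})
    after = record.get("after_image", {})
    return {
        name: (before.get(name), after.get(name))
        for name in sorted(set(before) | set(after))
        if before.get(name) != after.get(name)
    }


record = json.loads(cdc_message)
database, segment = record["object_name"].split(".", 1)
print(database, segment, record["change_op"], changed_fields(record))
```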
Data Collection Overkill
• Major red flag
• You need to start small
• Minimize data volume
• Deploy in small increments
— Realize success early
— Adjust infrastructure
— Predictable costs & duration
• Adapt more easily to new targets
Sources: IMS, Db2 z/OS, Oracle, Db2 LUW, SQL Server, VSAM
Everything does not have to be in “the Cloud”
IMS CDC via Kafka Streaming
• Engine driven by a configuration script
• Designed to function as a CDC utility
• No source to target mapping
• Fast / Scalable
Streaming Performance Aspects
• Throughput / latency primarily depends on speed of the target
• Fortunately, streaming platforms are very fast targets
Top items affecting performance / throughput:
— Message Size
— IMS CDC streaming → transaction size and arrival rate
— Target replication scaling factor → affects acknowledgement response
— IMS initial loads → overall data volume
— Streaming platform configuration → must be tuned for source workload
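A back-of-the-envelope sizing sketch showing how these factors might combine. All inputs are made-up figures, and the formula is a deliberate simplification: arrival rate times records per transaction times message size gives the producer rate, and multiplying by a replication factor approximates total write volume on the target. `estimate_bandwidth` is a hypothetical helper.

```python
def estimate_bandwidth(tx_per_sec: int, records_per_tx: int,
                       bytes_per_message: int, replication_factor: int):
    """Return (producer bytes/sec, approximate target-wide bytes/sec written)."""
    producer_rate = tx_per_sec * records_per_tx * bytes_per_message
    return producer_rate, producer_rate * replication_factor


# Assumed workload: 2,000 tx/sec at peak, 5 CDC records each, 400-byte messages.
producer, cluster = estimate_bandwidth(2_000, 5, 400, 3)
print(f"producer: {producer / 1e6:.1f} MB/s, target writes: {cluster / 1e6:.1f} MB/s")
```

Running the same arithmetic with peak rather than average arrival rates is usually the more useful exercise, since peaks are what the target has to absorb.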
Message Size
• Small messages perform better than larger ones (a bit obvious)
• It boils down to the number of bytes per message
• IMS segments can be large (redefines, arrays, etc.)
• Updates can contain before and after images
So...What to Do?
• Suggest Avro as the Target Data Format
— A compact binary format, more condensed than JSON
— JSON typically used for data validation only → can be easily read
— Avro messages roughly the size of source segment (x 2 for updates)
• Reduce the number of fields in the CDC message
• Evaluate the requirement for publishing before images of updates
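To get a feel for the last two points, the sketch below compares the size of an update message with the full before image, with only the changed before-values, and with no before image at all. It measures compact JSON purely for comparison, since an Avro encoder is not part of the Python standard library; `slim_update` and `size_in_bytes` are hypothetical helpers.

```python
import json


def slim_update(record: dict, keep_changed_before: bool = False) -> dict:
    """Drop the before image, or keep only the before-values that changed."""
    slim = {k: v for k, v in record.items() if k != "before_image"}
    if keep_changed_before:
        before = record.get("before_image", {})
        after = record.get("after_image", {})
        slim["before_image"] = {k: v for k, v in before.items() if after.get(k) != v}
    return slim


def size_in_bytes(record: dict) -> int:
    """Size of the record serialized as compact UTF-8 JSON."""
    return len(json.dumps(record, separators=(",", ":")).encode("utf-8"))


update = {
    "object_name": "IMSDB01.SEG02",
    "change_op": "U",
    "after_image": {"fname": "MARY", "lname": "JOHNSON",
                    "city": "CHICAGO", "amount": "4087.66"},
    "before_image": {"fname": "MARY", "lname": "JOHNSON",
                     "city": "CHICAGO", "amount": "2964.32"},
}

full = size_in_bytes(update)
changed_only = size_in_bytes(slim_update(update, keep_changed_before=True))
no_before = size_in_bytes(slim_update(update))
print(full, changed_only, no_before)
```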
Key Monitoring Metrics
Operational
• Component down
unexpectedly
• Latency → how far behind
you are in publishing
• Latency → when did you last
hear from the engine(s)
Informational
• Overall throughput →
records / bytes per second
• Workload patterns / peak
transaction arrival rate
• Number of CDC records by
database / segment
• Number of transactions
• zIIP offload statistics
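The two latency metrics can be sketched as simple time differences. This assumes each published record carries a UTC timestamp in the format used by the sample CDC record and that the engine emits some form of heartbeat; the function names and the 60-second threshold are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def parse_utc(ts: str) -> datetime:
    """Parse a timestamp such as 2018-08-12T11:11:18Z as UTC."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def publish_lag(last_published: datetime, now: datetime) -> timedelta:
    """How far behind publishing is, based on the newest published record."""
    return now - last_published


def engine_is_silent(last_heartbeat: datetime, now: datetime,
                     threshold: timedelta = timedelta(seconds=60)) -> bool:
    """True when nothing has been heard from the engine within the threshold."""
    return now - last_heartbeat > threshold


now = parse_utc("2018-08-12T11:11:30Z")
lag = publish_lag(parse_utc("2018-08-12T11:11:18Z"), now)
silent = engine_is_silent(parse_utc("2018-08-12T11:09:00Z"), now)
print(f"lag: {lag.total_seconds():.0f}s, engine silent: {silent}")
```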
Best Practices
• Keep it simple & efficient → minimize the number
of moving parts
• Avoid over-scaling → start with a basic configuration
& work up
• Keep full extracts / initial loads on standby
— Needed if ‘point of no return’ reached
— Must be able to run against live databases
• Discovery / planning
— Data sources → CDC and bulk loads
— Latency requirements
— Data volume → CDC and bulk loads
— Peak transaction arrival rate
• Ensure proper monitoring / alerts are set
• Goal → ‘set & forget’ deployment
Technical approach.
Best Practices
• Avoid data collection overkill
• Help bridge the great divide
• Approach with a comprehensive strategy
• Involve the business unit(s) from the beginning
• Be aware of application release cycle
Business approach.
The Connect CDC Difference
• Multiple IMS capture options
• Highly scalable capture and apply
• Asynchronous as well as real-time replication from IMS to
cloud data warehouses and streaming frameworks such as Kafka
• Supports IMS data sharing environments
• No impact on IMS database
Questions