Main Point: As technology improves and data volumes grow, there is a requirement to be able to predict, search and optimize this new and additional data to gain insights that were not available in the past.
Main Point: Analytics is now a key focus for our customers. As we have discussed, Operations Analytics can help increase business value by ensuring system and application availability and reducing Mean Time to Repair (MTTR).
Operations Analytics is about:
Predict - Proactively surfacing problems using anomaly detection. The current solution is IBM zAware. IBM zAware surfaces anomalies by analyzing z/OS and zLinux system logs. OMEGAMON and NetView integrate with IBM zAware by monitoring the IBM zAware anomaly scores, correlating log analysis with performance monitoring and providing the option to generate events and trigger automation.
Search - Search for information, including logs and metrics, to make problem determination much more efficient. The current solution in this area is IBM Operations Analytics for z Systems. IOA for z Systems integrates with ITM/OMEGAMON and Network Operations Insights.
Optimize - Provides analytics for both Business and IT. Capacity Management Analytics (CMA) for z/OS is a suite that includes SPSS, Cognos and TDSz. CMA enables customers to forecast capacity and, more recently, provides a feature for forecasting the 4-hour rolling average, enabling customers to manage subcap pricing.
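For presenters who want to show what the 4-hour rolling average actually measures, here is a minimal Python sketch. It is not how CMA forecasts the value; it only computes the rolling average over utilization samples that are already known. The 5-minute sample interval, the MSU figures and the function name are assumptions made for illustration.

```python
from collections import deque


def rolling_four_hour_average(msu_samples, interval_minutes=5):
    """Return the rolling 4-hour average for each sample in msu_samples.

    Until a full 4 hours of samples has been seen, the average covers
    only the samples available so far.
    """
    window_size = (4 * 60) // interval_minutes  # 48 samples at 5-minute intervals
    window = deque(maxlen=window_size)
    total = 0.0
    averages = []
    for msu in msu_samples:
        if len(window) == window_size:
            total -= window[0]  # the oldest sample is about to be evicted
        window.append(msu)
        total += msu
        averages.append(total / len(window))
    return averages


# Hypothetical data: 4 hours at 100 MSU followed by 1 hour at 200 MSU.
samples = [100.0] * 48 + [200.0] * 12
r4ha = rolling_four_hour_average(samples)
peak = max(r4ha)
print(peak)  # 125.0: the one-hour burst is smoothed by the 4-hour window
```

The point to make is that a short burst of high utilization raises the rolling average far less than it raises the instantaneous value, which is why forecasting the average matters for subcap pricing.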
Main Point: Search and analysis are the primary focus for Log Analytics, and IBM Operations Analytics - Log Analysis provides this capability. This tool will enable you to perform problem determination and resolution more quickly and will ultimately decrease Mean Time to Repair (MTTR).
The Log Analysis server runs on Linux on x Systems or Linux on z Systems. The server can consume logs from multiple sources (distributed and mainframe systems), enabling users to search and analyze log data from all components of their cross-platform workloads or, if they choose, from all the log sources in the enterprise.
Customers are already seeing value from Analytics – One of the key values with IBM Operations Analytics is the ability to create Insight Packs designed to analyze specific logs.
The offering named IBM Operations Analytics for z Systems includes the Log Analysis server as well as z/OS Insight Packs that enable search and analysis for z/OS logs and performance metrics.
The initial release of the z/OS support was provided in March, 2014 under the product names ‘IBM SmartCloud Analytics - Log Analysis z/OS - Insight Packs – SYSLOG V1.1’ and ‘IBM SmartCloud Analytics - Log Analysis z/OS - Insight Packs - IBM WebSphere® Application Server V1.1’. Subsequent releases were named with the SmartCloud brand until April, 2015 when Version 2 of the product was rebranded to IBM Operations Analytics for z Systems V2.1.
IBM Operations Analytics for z Systems provides the following:
• Ability to collect z/OS logs across the enterprise and stream the logs to the Log Analysis server for the server to index and analyze.
• Ability to index, search, and analyze application, middleware, and infrastructure log data across the System z enterprise.
• Ability to quickly search and visualize errors across huge volumes of log records.
• Advanced search and text analytics across large volumes of data.
• Expert advice by linking search results to available best practices and recommended resolution documentation.
• Near real-time streaming of z/OS logs.
The z/OS support consists of the following components:
• z/OS log forwarder that is installed on the required z/OS LPARs where the logs are to be collected and forwarded.
• SMF data provider that is installed on the required z/OS LPARs where SMF performance metrics are to be collected and forwarded.
• Insight Packs to provide the index, search, and domain insights capability for logs and performance metrics.
Search is provided for all messages in the logs, and you can choose to search one, several, or all logs. The user can also specify a timeframe for the search to narrow the focus to the period when the error occurred. The Insight Pack surfaces patterns as the logs are searched, enabling the user to quickly focus on errors and drill down to the offending problem area.
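The idea of narrowing a search to a timeframe and then letting message patterns surface can be sketched in a few lines of Python. This is an illustration only and does not use the product's search language or API. The record layout, the message texts, the timestamps and the function names are all hypothetical; only the message IDs are taken from the scenarios in these notes.

```python
import re
from collections import Counter
from datetime import datetime

# Hypothetical log records: (timestamp, log source, message text).
records = [
    (datetime(2015, 4, 1, 9, 58), "SYSLOG", "IXL043I illustrative text"),
    (datetime(2015, 4, 1, 10, 2), "DB2", "DSNL511I illustrative text"),
    (datetime(2015, 4, 1, 10, 3), "DB2", "DSNL511I illustrative text"),
    (datetime(2015, 4, 1, 10, 4), "MQ", "AMQ9999 illustrative text"),
    (datetime(2015, 4, 1, 11, 30), "DB2", "DSNL511I illustrative text"),
]

# A message ID is assumed to be the first token: letters, digits, optional suffix.
MESSAGE_ID = re.compile(r"^([A-Z]{3,}\d+[A-Z]?)\b")


def search(records, start, end, sources=None, text=None):
    """Return records inside [start, end], optionally filtered by source and text."""
    hits = []
    for timestamp, source, message in records:
        if not (start <= timestamp <= end):
            continue
        if sources is not None and source not in sources:
            continue
        if text is not None and text.lower() not in message.lower():
            continue
        hits.append((timestamp, source, message))
    return hits


def count_message_ids(hits):
    """Count how often each message ID appears in the search results."""
    counts = Counter()
    for _, _, message in hits:
        match = MESSAGE_ID.match(message)
        if match:
            counts[match.group(1)] += 1
    return counts


# Search all logs for the half hour in which the error was reported.
window_hits = search(records, datetime(2015, 4, 1, 10, 0), datetime(2015, 4, 1, 10, 30))
patterns = count_message_ids(window_hits)
print(patterns.most_common())
```

Restricting the window first keeps the pattern counts focused on the incident, so the most frequent message ID in the window is a natural place to start drilling down.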
IBM Operations Analytics for z Systems provides out-of-the-box insights and application views for z/OS, WebSphere, DB2, CICS, IMS and MQ with the addition of Network Insights in V2.1. Also in V2.1, we have included initial support for consuming and analyzing performance metrics using our SMF Data Provider component.
The user interface is customizable such that users can build their own application views and create and save environment-specific queries. The search language is text based and easy to use, and users can easily create and save simple or complex search strings with minimal typing. The tool is helpful to novice as well as experienced users. Online help, product documentation and product videos are easily accessed from the Getting Started page.
5698-AAP V2.1.0 IBM Operations Analytics for z Systems
Large Insurance Company – Customer story 1
Quote: “This tool can really save a pile of diagnostic time!”
Customer experienced a problem that took 29 hours to debug. This process required time from both IBM (Level 2) and multiple employees from that company.
The account team contacted the IBM development team and described an outage at the customer site. The development team received the Syslogs from the customer, fed them into the Operations Analytics server and immediately saw the high volume of error messages on the two LPARs (thousands of error messages ... 900+ were Severe errors). Most errors were in DB2 and MQ. The development team immediately noticed the high volume of some very specific messages (mostly DB2). The Log Analysis Application views graphically displayed the message peaks (as compared to normal message flows). ‘Needles’ (error messages) in the haystacks (LPARs) were immediately evident through visual representation of the message spikes.
Ultimately, the problem was caused by a bad PTF that was applied as part of a z/OS maintenance window. The Expert Advice feature was used to pinpoint the relevant maintenance to fix the problem (based on the error messages that were generated). One member of the development team was able to pinpoint the problem using IBM Operations Analytics for z Systems in under 30 minutes … It went from 29 hours to 29 minutes.
Moral of the story - IBM Operations Analytics for z Systems would have helped decrease the amount of time required for problem determination.
The log analysis provided by IBM Operations Analytics for z Systems would have highlighted the high volume of error messages visually, in both the application views and the insights (message pattern detection), to determine the scope of the problem (i.e. which systems are affected) and identify which additional components are affected (e.g. MQ, IMS, CICS). Once the focus was narrowed down to the problem area, the Expert Advice feature was used to perform a quick search of the IBM support site to identify a fix for the problem (PTF, technote, white paper, etc.).
Another Insurance Company – Customer story 2
Quote: “This tool can quickly prove it is not my fault!”
The DB2 support team within the customer shop often spends many hours isolating problems to discover it is not in fact a DB2 problem and needs to be routed to another group. In this specific case in point, there were serious MQ errors and the DB2 team spent hours isolating the problem as an MQ problem. With IBM Operations Analytics for z Systems, it was proven that the team could have gone directly to the source of the issue immediately. This would have saved them hours, and cumulatively days, of spinning unproductive cycles and they could have routed the issue to the internal MQ support team immediately.
Large Bank – Customer Story 3
Quote: “Faster than a speeding Bullet!”
Customer is running a WAS-based On-line Banking Application in a couple of datacenters. Often when they receive a trouble ticket from their external customer (i.e. the user of their online banking application), they cannot determine which datacenter originated the error messages. With IBM Operations Analytics for z Systems’ ability to consolidate logs, they stated they could reduce their initial isolation time significantly (maybe 50%).
Government Agency IT department - Customer story 4
Quote: “Talk about Time to Value!”
In a recent customer engagement, the client was able to download, install and configure the solution and had an operational environment in 2.5 hrs!
If you’re presenting to a customer that only cares about consuming mainframe data, then you should use this slide.
There is another slide in backup that provides a more complete picture because it includes data coming from OMNIbus and distributed systems as well as z/OS. Note that Syslogd falls under USS Log Files.
Distributed systems logs, insight packs, toolkits, etc. are documented here: https://www.ibm.com/developerworks/servicemanagement/ioa/log/downloads.html
Hadoop (frozen tier) and alerting are included in the 1Q 2015 version of the IOA server.
Results do not need to be limited to text; they can also be shown as visuals and graphs.
Main Point: Analytics is now a key part of what customers are looking to improve on. As we have seen, analytics can help increase business value and IT metrics.
Analytics is about:
1. Predict problems and anomalies – Current product is OMEGAMON V5.1.1 with IBM zAware support and NetView which also includes IBM zAware
2. Search for information, including logs – The current product in this area is SmartCloud Analytics – Log Analysis
3. Optimize analytics for both Business and IT – Capacity Management Analytics (CMA) for z/OS is a suite that includes SPSS, Cognos and TDSz.
IBM SmartCloud Analytics - Predictive Insights
Reduce outages and increase service performance with predictive problem detection
IBM® SmartCloud® Analytics – Predictive Insights can provide early problem detection to predict application or middleware problems before they impact service. The software helps you avoid application outages and increase service performance.
IBM SmartCloud Analytics – Predictive Insights helps you:
Avoid outages to increase application availability and reduce service degradation.
Perform faster root cause analysis to isolate problems sooner.
Reduce operational costs without the need for complex service models or specialized skills.
Personas supported:
Alice (Subject Matter Novice)
Jim (Subject Matter Expert)
Zach (Senior Systems Programmer)
Personas supported:
Alice (Subject Matter Novice)
Eric (Application Developer)
Jim (Subject Matter Expert)
Zach (Senior Systems Programmer)
Scenario: MQ environment spanning z/OS and Distributed systems.
MQ channel goes down.
MQ message is written to distributed system log.
IOAz triggers an event from the message in the distributed log.
Event is sent to z/OS automation tool (e.g. NetView / SA).
Automation restarts the MQ channel.
Failure is resolved quickly, avoiding an actual problem.
Customer Scenario (prior to using IOAz)
MQ outage caused several hours of downtime and application failures. Multiple SMEs worked on the issue. MQ issues are often hard to debug.
Environment (with IOAz)
IOA server (running on System x or System z) receiving data from multiple sources
MQ server running on Windows server
Log File Agent (LFA) sending log data from Windows server into IOA server
NetView is running on z/OS and is driving Event and Message automation (Note that this could be ANY automation tool that can act as an Event receiver)
Scenario Overview (with IOAz)
An MQ channel defined to the z/OS system and the MQ server on Windows stops abnormally. The MQ server generates a ‘channel down’ message (AMQ9999).
LFA sends AMQ9999 message to IBM Operations Analytics server
IBM Operations Analytics sends SNMP trap (or EIF event) to NetView
NetView issues command response to restart MQ channel
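The flow above (a log message arrives, a rule matches it, an event goes to the automation tool) can be sketched as follows. This is a conceptual illustration under stated assumptions, not the product's alerting configuration: the rule table, the action name, the host name and the log line format are hypothetical, and no real SNMP trap or EIF event is sent. Only the AMQ9999 message ID comes from the scenario.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    message_id: str
    source_host: str
    action: str


# Hypothetical rule table: message ID -> action the automation tool should take.
RULES = {"AMQ9999": "RESTART_MQ_CHANNEL"}


def event_for(log_line, source_host, rules=RULES):
    """Return an Event if the line starts with a message ID that has a rule."""
    tokens = log_line.split(maxsplit=1)
    if not tokens:
        return None
    message_id = tokens[0].rstrip(":")
    action = rules.get(message_id)
    if action is None:
        return None
    return Event(message_id, source_host, action)


# Stand-in for the event receiver (NetView in the scenario): it just collects events.
received = []

incoming_lines = [
    "AMQ8004: illustrative informational text",
    "AMQ9999: illustrative channel down text",
]
for line in incoming_lines:
    event = event_for(line, source_host="winmq01")
    if event is not None:
        received.append(event)

print(received)
```

Keeping the rules in a simple table mirrors the point made in the notes: the receiver could be any automation tool that accepts events, because the matching logic is independent of what acts on the event.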
Customer Scenario (prior to using IOAz)
Customer applied z/OS and DB2 maintenance during weekend maintenance window. After the maintenance was applied, DB2 DDF applications started to fail due to ‘time-outs’. DBA was finally notified on Saturday evening, after several hours of failures. DB2 and TCP/IP level 2 teams tried to debug the problem. By Monday morning, all transactions were failing. DB2 and z/OS maintenance had to be backed out.
Environment (with IOAz)
IOA server (running on System x or System z) receiving data from multiple sources
DB2 is running on z/OS
z/OS Log Forwarder sending DB2MSTR address space log data into IOA server
NetView is running on z/OS and is driving Event and Message automation (Note that this could be ANY automation tool that can act as an Event receiver)
Scenario Overview (with IOAz)
DB2 errors written to DB2MSTR address space log after maintenance is applied
z/OS Log Forwarder sends messages from DB2MSTR address space log to IBM Operations Analytics server
IBM Operations Analytics receives DSNL511I, IXL043I and other DB2 failure messages and sends SNMP trap (or EIF event) to NetView
NetView issues commands to collect additional data and forwards the Event to the Event Management system so a trouble ticket can be created for the SME
I would like to introduce a couple of solutions that demonstrate the use cases of IT Operations Analytics. First, we will talk about the Log Analysis Solution. If we take the example of a traditional incident lifecycle, users report issues to the service desk, or monitoring tools generate events. The operations team (L1 support) assigns the incident to a resolver group. The first resolver group then engages other teams to drive incident troubleshooting and resolution. This is a time-consuming process because each team troubleshoots in its own silo and has no unified view.
The Log Analysis Solution ingests system and subsystem logs from infrastructure and application components to provide a unified, time-sequenced view of logs, with the ability to quickly search through massive amounts of data for specific issues. Log analysis enables the team to identify when and where the error happened. This drives swift engagement of the right resolver teams in parallel. The key differentiator is the reduction in time to isolate and resolve problems.
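A unified, time-sequenced view is essentially a merge of per-component logs by timestamp. The Python sketch below shows that idea with the standard library's heapq.merge. It assumes each source log is already in time order, and the record layout, timestamps and source names are hypothetical; it does not represent how the Log Analysis server indexes data.

```python
import heapq
from datetime import datetime


def unified_view(*streams):
    """Merge log streams, each already sorted by time, into one timeline."""
    return list(heapq.merge(*streams, key=lambda record: record[0]))


# Hypothetical per-component logs: (timestamp, source, message ID).
db2_log = [
    (datetime(2015, 4, 1, 10, 2), "DB2", "DSNL511I"),
    (datetime(2015, 4, 1, 10, 5), "DB2", "DSNL511I"),
]
mq_log = [
    (datetime(2015, 4, 1, 10, 1), "MQ", "AMQ9999"),
    (datetime(2015, 4, 1, 10, 4), "MQ", "AMQ9999"),
]

timeline = unified_view(db2_log, mq_log)
first_timestamp, first_source, first_message = timeline[0]
print(first_source, first_message)  # the earliest record points to where the trouble began
```

With the logs interleaved on one timeline, the earliest error shows which component failed first, which is what lets the right resolver team be engaged without each team searching its own logs in isolation.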
You need to install the following maintenance to enable the TEP launch-in-context to Operations Analytics for z Systems:
Required changes to distributed components:
ITM TEPS: Provisional fix 6.3.0-TIV-ITM-FP0004-IV67740
Obtain FP5 fix by subscribing to: http://www.ibm.com/support/docview.wss?crawler=1&uid=swg1IV67740
Required changes to z/OS components:
PARMGEN: FMID HKCI310, Interim Feature APAR OA46184 (PTF UA76016)
Obtain fix: http://www.ibm.com/support/docview.wss?uid=swg1OA46184
ITM 630 z/OS TEMA update: FMID HKDS630, APAR OA46976 (PTF UA76202, available 2/28/15)
Obtain fix: http://www.ibm.com/support/docview.wss?uid=swg1OA46976
OMEGAMON XE for WebSphere MQ Monitoring: FMID HKMQ730, APAR OA46839 (PTF UA76091, available 2/28/15)
Obtain fix: http://www.ibm.com/support/docview.wss?uid=swg1OA46839
OMEGAMON XE for WebSphere Message Broker Monitoring: FMID HKQI730, APAR OA46840 (PTF UA76092, available 2/28/15)
Obtain fix: http://www.ibm.com/support/docview.wss?uid=swg1OA46840#more
OMEGAMON XE for Storage: FMID HKS3530, APAR OA46871
Subscribe and obtain fix: https://www.ibm.com/support/entdocview.wss?uid=swg1OA46871