NRB - LUXEMBOURG MAINFRAME DAY 2017 - Data Spark and the Data Federation


  1. © 2017 IBM Corporation. Data: Spark and the Data Federation. Leif Pedersen, Executive IT Specialist, z Analytics, Europe. Email: Leif.Pedersen@dk.ibm.com
  2. Systems of Insight, Systems of Record, Systems of Engagement. Look like a “déjà vu”?
  3. In the new insight economy, winners infuse analytics everywhere to drive better outcomes!
     - Create new business models (CEO)
     - Attract, grow, retain customers (CMO)
     - Transform financial & management processes (CFO)
     - Manage risk (CRO)
     - Prioritize IT investment for innovation (CIO, CDO)
     - Optimize operations (COO)
     - Fight fraud and counter threats (CSO)
     - Diagram labels: Systems of Insight, Systems of Record, Systems of Engagement
  4. IBM Analytics Point of View: Make DATA SIMPLE and ACCESSIBLE to ALL. DATA Professionals are leading THE Transformation!
     - Pillars: All Data, New Dev Styles, New Analytics, More People, Business Value
     - Embrace all data
     - Run at the speed of business
     - Enable all analytics
  5. The Evolution in the Approach to Getting Value from Data
     - Stages, plotted by maturity and value (low to high): Operations, Data Warehousing, Self-service Analytics, New Business Imperatives. Marker: "We are here".
     - Lower the cost of storage; ensure resiliency and availability
     - Warehouse Modernization: data lake; data offload; ETL offload; queryable archive and staging
     - Data-Informed Decision Making: full dataset analysis (no more sampling); extract value from non-relational data; 360° view of all enterprise data; exploratory analysis and discovery
     - Business Transformation: create new business models; risk-aware decision making; fight fraud and counter threats; optimize operations; attract, grow, retain customers
  6. Analytics evolution to support all Analytics Apps on all Data: The Mainframe Use Case
     - SoR: core business supported by CICS, IMS, WAS z/OS (applications); operational data stored in VSAM, IMS, DB2 (data)
     - The Data Lake Evolution: BI reporting; data warehouse / data marts; HDFS, Map/Reduce, Spark; historical data in DB2 for z/OS & IBM DB2 Analytics Accelerator; other data
     - The Predictive Analytics Evolution: score creation; machine learning; rules; score execution
     - Other diagram labels: SoE, SoI, IT Operational Data
  7. z Systems Analytics Areas complement existing Analytics Environments.
     - In-transaction rules and score execution
     - Intraday capability for ad-hoc queries & predictive analytics
     - Availability of historical data (in raw format)
     - Accelerated reporting to fulfill internal and regulatory requirements
     - Ability to transform data before offload to DWH or reporting
     - Ability to create new models at any time
     - Quasi-real-time availability of data for analytics
     - Instant access to raw data for new report generation in hours instead of days
     - Load and merge of ANY non-DB2 z/OS data
     - Explore data to uncover hidden insights
     - Diagram labels: IBM DB2 Analytics Accelerator, zData, zApps, Scoring, Rules
  8. Analytics as part of the flow of business: insights on every transaction
     - The integration of transactions and analytics is an emerging and important market segment.
     - Opportunity to rethink business processes: analytics as an integral part of the process itself, rather than a separate activity performed after the fact
     - Transform business processes, not just provide existing styles of analytics faster and without latency
     - Enable business leaders to perform, in the context of operational processes, advanced and sophisticated real-time analysis of their business data
     - “Hybrid transaction/analytical processing will empower application leaders to innovate via greater situation awareness and improved business agility.” Gartner Research Note G00259033, 28 January 2014: Hybrid Transaction/Analytical Processing Will Foster Opportunities for Dramatic Business Innovation
  9. Hybrid Transaction/Analytical Processing (HTAP) with DB2 Analytics Accelerator
     - DB2 for z/OS CPU savings target: operational (in-transaction) analytics; (complex) OLTP
     - Accelerator focus: ad-hoc queries; complex queries scanning large amounts of data; ETL acceleration / virtual transformation
     - Diagram labels: OLTP transactions, high concurrency, standard reports, complex queries (more history), OLAP, DB2 for z/OS processing, IBM DB2 Analytics Accelerator
  10. Data Warehouse and Data Lake
     - A Data Lake is…
       + An analytics sandbox for exploring data to gain insight
       + An enterprise-wide catalog to find data across the enterprise and to link from business term to technical metadata
       + An environment for enabling reuse of data transformations and queries
       + An environment where users can access vast amounts of raw data
       + An environment for developing and proving an analytics model and then moving it into production; experience in production may drive further experimentation in the data lake
     - A Data Lake is not…
       - A data warehouse or data mart of all of the data in an enterprise
       - A high-performance production environment
       - A production reporting application
       - A purpose-built system to solve a specific problem
  11. Spark, a Transaction Manager for Analytics Applications
     - Fast Runtime Environment
       – Interactive or batch processing
       – Based on in-memory data processing: high performance for multi-step processes where Spark can pass the data directly without using disk storage
       – Parallel processing
     - Interface to Data
       – Accessing Hadoop-based HDFS data, Cassandra, HBase, …
       – Accessing any traditional database using JDBC
     - Interface for Applications
       – Ease-of-use APIs supported by modern languages
       – Stack of libraries including SQL, Machine Learning, GraphX, and Spark Streaming
       – Over 80 high-level operators that make it easy to build parallel applications
       – Many languages supported, including Java, Scala, Python and R (Spark itself is written in Scala)
     - Spark is NOT a datastore, NOT a replacement for Hadoop!
  12. Why Spark matters to a business?
     - 1. Spark makes it easier to access and work with all data
     - 2. Spark lets you develop line-of-business applications faster
     - 3. Spark learns from data and delivers in real time. With Hadoop, you ask a question and get back a batch of data. With Spark, you may say, “continue to give me answers to this question”… and when new data comes, the user is smarter.
     - Supporting points: enables new data-based use cases; all data (internal/external, structured/unstructured); real-time insights from all data sources; automates analytics with Machine Learning; clients that lead in data lead their industry
     - Diagram labels: Design, Development, Data Science
  13. IBM z/OS Platform for Apache Spark: Spark can run on z/OS close to z/OS-based Applications & Data
     - Values:
       - Data-in-place analytics, without the need to ETL or move data for analytic purposes
       - Optimized access and z/OS-governed ‘in-memory’ capabilities for core business data
       - Unique capability to access almost all z/OS sources with Apache Spark SQL, plus many non-z data sources
       - Almost all zIIP eligible
       - Integration of analytics across core systems, social data, website information, etc.
     - Data sources shown: key business transaction & batch systems on z/OS (VSAM, IMS, DB2 z/OS, Adabas), distributed sources (Teradata, HDFS), and *many* more including SMF, OPERLOG, SYSLOGs, . . .
     - Diagram labels: Spark Applications: IBM and Partners; Apache Spark Core, Spark Stream, Spark SQL, MLlib, GraphX; RDD, DF; Optimized data access
  14. Examples of Spark Use Cases
  15. Use Case Patterns
     - Client Insight: analytics over transactions & customer interactions. Leverage data on z/OS (DB2, VSAM) & distributed (Oracle, SQL Server, HDFS) to enable real-time access from data science teams focused on client insight to develop patterns and models.
     - Data Distillation, Hybrid Architecture: run Spark z/OS to access, aggregate, filter and *distill* large volumes of data. Make smaller, aggregated analytic results available for access by customer insight solutions and data science environments.
     - 360 Degree View (Customers, Payments, Transactions): leverage Spark z/OS to get a real-time or near-real-time view of the current status of payments, transactions and customers, combining data from OLTP, distributed sources, & streaming.
     - IT Analytics: analyze real-time streamed SMF data, combined with archived SMF data and syslog data; visualize and interact with a data science Jupyter Notebook to find patterns.
  16. Use Case #1: Hybrid Data Science
     - Distill the data:
       • Use Spark z/OS for data blending, cleansing, transformation, etc. with data in place
       • Store results in a ‘Tidy’ Data Repository
       • Refresh as needed
     - Explore the results: data exploration and investigation leveraging the ‘Tidy’ Repository
     - Values:
       • Leverage the most current business data for data science
       • Efficiencies in reducing ETL
       • Leverage common analytics ecosystem skills
       • Integrate Spark on multiple platforms for an optimal analytics infrastructure
  17. Use Case #2: Optimized Customer Insight (Customer Insight for Banking Solution)
     - Flow: z/OS data (Customer, Transaction, Merchant) is read through the optimized data layer of the IBM z/OS Platform for Apache Spark, which produces a Spark analytic result set: a subset of the data, distilled, filtered and transformed.
     - Input & output as tidy data: transform (if needed) and populate the BBCI staging area / cache.
     - Solution components: BI dashboard components, data cube, analytical engines, web portal, analytics API gateway, APIs, pre-built dashboards, pre-built data models, pre-built analytical models; call center as a consumer.
     - Values:
       • Avoid costly and ineffective wholesale copy of data
       • Frequent refresh of the most relevant data elements to the customer insights solution
       • Faster time to implementation for a business solution to deliver insights on churn, cross-sell, etc.
     - Diagram labels: Apache Spark Core, Spark Stream, Spark SQL, MLlib, GraphX; RDD, DF
  18. Use Case #3: Real-Time Application Event Analytics
     - Use case: CICS event triggers create an event stream that would be captured by Spark running on its own z/OS LPAR. Spark is configured for high availability to avoid impacting CICS.
     - Real-time analytics with Spark z/OS:
       • Real-time analytics to provide feedback into the Systems of Engagement or monitoring systems on types of banking services and frequency of consumption
       • Real-time monitoring of core business processes and applications
       • Real-time analytics can include scoring
     - Historical analysis leverages IDAA: batch load of events for historical analysis, trending and reporting
     - Diagram labels: System of Engagement, CICS Transactions, Monitor, Event Stream, Logstream, Channel, Spark z/OS, Real-Time Consumption, Batch Load Overnight, DB2 Analytics Accelerator Loader, DB2 z/OS, IBM DB2 Analytics Accelerator, Historical Analysis, Reporting
  19. Use Case #4: Surface Spark Results to JDBC / ODBC Applications
     - DF Store:
       • Persist specific Spark result sets
       • Backed by VSAM
       • Leverage z/OS SAF, dataset management
     - JDBC / ODBC / REST and NoSQL clients access Spark RDDs, for example Cognos, Tableau, …
     - Diagram labels: z/OS; Apache Spark Core, Spark Stream, Spark SQL, MLlib, GraphX; DF, RDD; Optimized Data Layer; DB2 z/OS, IMS, VSAM, HDFS
  20. Use Case #5: Analyzing SMF Data with Spark
     - The Spark application is agnostic to the data source and the number of sources
     - MDSS is required on at least one system; MDSS agents are required on all systems. No IPL is required for installation
     - Logstream recording mode is required for real-time interfaces
     - Analyze real-time in-memory SMF data, combined with archived data
     - Analyze data across multiple LPARs
     - Augment with SYSLOG and other sources for a richer analytic outcome
     - Efficiencies in avoiding data movement
     - Diagram: MDSS clients on LPAR1, LPAR2, LPAR3, … LPARn, each with SMF real-time data and logstreams, plus dump data sets; these feed the Optimized Data Integration Layer (MDSS), which a Spark application using SparkSQL reads over JDBC
  21. Use Cases for Real Time SMF Analytics
     - Detect excessive memory consumption (SMF 30): monitor the high-water mark for real memory usage for jobs and send alerts if usage exceeds normal consumption
     - Detect security violations in real time (SMF 80): monitor the volume of datasets/files accessed per user within a given time period and raise alerts for above-normal access rates
     - Real-time monitoring of resource usage in cloud environments (CPU, memory, disk)
     - A list of supported SMF record types can be found in the Redbook “Apache Spark Implementation on IBM z/OS”, page 78: http://www.redbooks.ibm.com/abstracts/sg248325.html
  22. IBM Open Data Analytics for z/OS
  23. Leveraging IBM Z for Optimized Analytics
     - Federate analytics leveraging data in place for more current insights at scale, optimized security, privacy and reduced costs
     - Products: IBM Open Data Analytics for z/OS, IBM Machine Learning for z/OS, Optimized Data Integration Layer
     - Machine learning pipeline (shown twice in the diagram): Data, Data Prep, ML Algo, Model, Deploy, Predict; with Govern, Manage, Algorithm Assist… and Monitor, Feedback
     - IBM Z capabilities: Pauseless GC, new SIMD instructions, 32 TB memory, Pervasive Encryption
     - Diagram labels: Business Applications; Customer, Transaction, Merchant; Distributed; Apache Spark; Python; Distilled Insight; Analytic Result Sets; Query Acceleration
  24. IBM Open Data Analytics for z/OS: Offering Overview
     - What is in the offering? IBM Open Data Analytics for z/OS (IBM product):
       • Apache Spark 2.1.1 enabled for z/OS
       • Python 3.6.1
       • All prerequisite libraries
       • Select Anaconda libraries (approx. 250, including pandas, dask, numpy, scikit-learn, matplotlib…)
       • Optimized Data Integration Layer: optimized for Spark & Python db access to z/OS data
       • Integration with WLM z/OS for resource management aligned with job priority
       • Integration with security (SAF) interfaces
       • Support & Service available from IBM for a fee
         – Very aggressive pricing for zIIPs (cores) and memory for Open Data Analytics z/OS workload
     - Ecosystem
       – GitHub zos-spark repository
         • Jupyter Notebooks (Scala, Python workbenches)
         • Kernel gateway, Jupyter client, Toree kernel
         • Sample data & code snippets
       – Rocket:
         • Collaboration for Optimized Data Layer
         • Industry vertical mappings, e.g. ISO8583-1, ACH, SMF, etc.
       – Continuum:
         • Access to z/OS channel on Anaconda cloud for updates / refreshes & package management
         • Option to license private mirrored environment
         • Services & consulting for Python
  25. IBM Open Data Analytics for z/OS. Value: Increase Integration through Persisting Analytic Results for Enterprise Collaboration
     - DF Store:
       • Specific Spark & Python result sets
       • Backed by VSAM
       • Leverage z/OS SAF, dataset management
     - Python 3.6.1 core packages: numpy, scikit-learn, dask, pandas, Matplotlib, etc.
     - JDBC / ODBC / REST and NoSQL clients access Spark RDDs, for example Cognos, Tableau, …
     - Diagram labels: z/OS; Apache Spark Core, Spark Stream, Spark SQL, MLlib, GraphX; DF; Optimized Data Layer; VSAM, IMS, DB2 z/OS, HDFS
  26. © 2017 IBM Corporation
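
The "distill the data" step from slide 16 (cleanse and aggregate large volumes in place, then store a small tidy result) can be sketched in plain Python. This is only an illustration of the pattern, not the Spark z/OS job the deck describes: the record layout, the field names and the `distill` function are all hypothetical, and the in-memory list stands in for data that would be read in place from z/OS sources.

```python
from collections import defaultdict

# Hypothetical raw transaction records (assumed layout, not from the deck).
RAW = [
    {"customer": "C1", "amount": 120.0, "status": "OK"},
    {"customer": "C1", "amount": 30.0, "status": "OK"},
    {"customer": "C2", "amount": 75.5, "status": "REJECTED"},
    {"customer": "C2", "amount": 10.0, "status": "OK"},
]


def distill(records):
    """Drop rejected rows, then aggregate to one tidy row per customer."""
    totals = defaultdict(lambda: {"count": 0, "total": 0.0})
    for rec in records:
        if rec["status"] != "OK":
            continue  # cleansing step: ignore rejected transactions
        row = totals[rec["customer"]]
        row["count"] += 1
        row["total"] += rec["amount"]
    # One row per customer, sorted so the output order is stable.
    return [
        {"customer": cust, "txn_count": agg["count"], "txn_total": agg["total"]}
        for cust, agg in sorted(totals.items())
    ]


# The small result set is what would be stored in the 'Tidy' repository.
tidy = distill(RAW)
```

The point of the pattern is that only `tidy`, which is far smaller than the raw input, has to be refreshed and shared with downstream data science environments.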

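
The two alerting use cases on slide 21 (SMF 30 memory high-water marks and SMF 80 access volumes) can be sketched as simple threshold checks. This is a minimal sketch under stated assumptions: the deck does not say how "normal consumption" is defined, so the mean-plus-`k`-standard-deviations baseline, the function names, and the simplified tuples and dictionaries standing in for parsed SMF records are all hypothetical.

```python
from collections import Counter
from statistics import mean, stdev


def memory_alerts(history, current, k=3.0):
    """Return jobs whose current real-memory high-water mark exceeds
    mean + k * stdev of that job's past high-water marks (assumed baseline)."""
    alerts = []
    for job, hwm in current.items():
        past = history.get(job, [])
        if len(past) < 2:
            continue  # not enough history to form a baseline
        if hwm > mean(past) + k * stdev(past):
            alerts.append(job)
    return sorted(alerts)


def access_alerts(events, window_start, window_end, max_accesses):
    """Return users with more than max_accesses dataset accesses
    in the half-open time window [window_start, window_end)."""
    counts = Counter(
        user for ts, user, _dataset in events
        if window_start <= ts < window_end
    )
    return sorted(user for user, n in counts.items() if n > max_accesses)


# Hypothetical sample data: past and current high-water marks per job.
history = {"PAYROLL": [100, 110, 105, 95], "BACKUP": [500, 520, 510]}
current = {"PAYROLL": 400, "BACKUP": 515, "NEWJOB": 999}
mem = memory_alerts(history, current)

# Hypothetical access events: (timestamp in seconds, user, dataset name).
events = [
    (0, "alice", "A.B"), (10, "alice", "A.C"), (20, "alice", "A.D"),
    (30, "bob", "X.Y"), (100, "alice", "A.E"),
]
acc = access_alerts(events, 0, 60, max_accesses=2)
```

In the deck's architecture the same checks would run over SMF data surfaced through SparkSQL; the pure-Python form here only shows the decision logic.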