The document discusses IBM's Big Data Platform for turning large and complex data into business insights. It provides an overview of key big data challenges faced by organizations and how the IBM platform addresses these challenges through solutions that handle the volume, velocity, variety and veracity of big data. These solutions include analytics, data warehousing, streaming analytics and Hadoop technologies. Use cases are presented for big data exploration, enhancing customer views, security intelligence, operations analysis and augmenting data warehouses.
1. IBM Big Data Platform
Turning big data into smarter decisions
…with Business Analytics.
Peter Jönsson
Sr Tech Sales Architect
IBM Business Analytics
{"16":"What is Operations Analysis? \nIt’s using big data technologies to enable a new generation of applications that analyze large volumes of multi-structured, often in-motion machine data and gain insight from it, which in turn improves business results.\nWhat are the drivers for an Operations Analysis use case? \nBusinesses are unable to leverage machine data in its raw format\nGrowing at exponential rates\nComes in large volumes and a variety of formats, often in motion\nNeeds to be combined with existing enterprise data\nRequires complex analysis and correlation across different types of data sets\nRequires unique visualization capabilities based on data type and industry/application\nOrganizations want to leverage machine data to improve business results and decision-making\n","5":"Many organizations have figured out how to tap into their great natural resource – data. Here are six examples.\nA retailer reduced the time to run analytic queries by 80%. How did they do it? They moved from a general-purpose data warehouse to a purpose-built data warehouse appliance for deep analytics. They run deep analytic queries on inventory levels and models that require heavy computation. \nA stock exchange company cut the time to run deep trading analytics from 26 hours to 2 minutes. They also moved from a general-purpose data warehouse to a purpose-built appliance. Again, they were running deep analytic queries that required significant data access and computation. \nA telco cut the cost of hardware and storage by over 90% by moving to stream computing. By analyzing data as it streamed off the network, it was able to identify the valuable data and persist only what was necessary. \nA government agency used stream computing to cut the analysis time for 250 TB of acoustic data from hours to 70 milliseconds. This resulted in significant cost savings, as well as the ability to react to potential threats quickly. 
\nA utility provider was able to predict and avoid power outages by analyzing up to 10 PB of data with a combination of stream computing and a deep analytics data warehouse appliance. \nAnd a hospital was able to detect and intervene in potentially life-threatening conditions up to 24 hours earlier, which can make a huge difference to the patient’s outcome. They did this by analyzing streaming data from various monitors and vital-sign indicators. \n","11":"Airbus is an InfoSphere Data Explorer customer with more than 4 TB of data indexed, including file systems, SharePoint, an SAP system and a Siebel CRM system. They have used this capability to provide unified information access enterprise-wide, and also for specific targeted business applications such as their support “war room”, where they handle calls from airline maintenance departments to help them get airplanes back in the air as quickly as possible. This one application accounted for more than $36 million in savings in a single year. \nTo find other customers like Airbus, look for organizations whose business is information-intensive, such as those that build and support highly complex systems, or that have a compelling need to improve their operations through better use and re-use of information. It is critical for such organizations to give their employees access to information across all of their silos, regardless of format or where it is managed. A direct line of questioning works best with these prospects: ask them to describe what steps their organization is taking to provide a unified view of information to employees, how they extract insights from their unstructured content, and so on. Very few organizations are doing this successfully or efficiently.\nData Explorer is the typical starting point for this use case because it is a low-risk, rapid path for an organization to get its arms around its data. 
This allows the organization to gain immediate business value through better information visibility, as well as prepare for the next stage of big data deployment.\n","28":"Key Points\n- Integrate v3 – the point is to have one platform to manage all of the data – there’s no point in having separate silos of data, each creating separate silos of insight. From the customer POV (a solution POV), big data has to be bigger than just one technology\nAnalyze v3 – very important point – we see big data as a viable place to analyze and store data. New technology is not just a pre-processor to get data into a structured DW for analysis. Significant area of value-add by IBM – and the game has changed – unlike DBs/SQL, the market is asking who gets the better answer, and therefore the sophistication and accuracy of the analytics matter\nVisualization – need to bring big data to the users – the spreadsheet metaphor is the key to doing so\nDevelopment – need sophisticated development tools for the engines and across them to enable the market to develop analytic applications\nWorkload optimization – improvements upon open source for efficient processing and storage\nSecurity and Governance – many are rushing into big data as if it were the Wild West. But there is sensitive data that needs to be protected, and retention policies need to be determined – all of the governance maturity from the structured world can benefit the big data world\n","17":"How do you know if Operations Analysis is right for your customer? \nDo you deal with large volumes of machine data (e.g. raw data generated by logs, sensors, smart meters, message queues, utility systems, facility systems, clickstream data, configuration files, database audit logs and tables)?\nAre you unable to perform the complex analysis, often in real time, needed to correlate across different data sets? \nAre you unable to search and access all of this machine data?\nAre you unable to monitor data in real time and generate alerts? 
\nDo you lack the ability to visualize streaming data and react to it in real time? \nAre you unable to perform root cause analysis using that data? \nDo you want the ability to correlate KPIs with events? \nKTH – Royal Institute of Technology\nClient Overview\nResearchers at KTH, Sweden’s leading technical university, gather real-time traffic data from a variety of sources such as GPS data from large numbers of vehicles, radar sensors on motorways, congestion charging, weather, etc. \nBusiness Need\nIntegrating and analyzing this data to better manage traffic is a difficult task. \nCollected data now flows into IBM InfoSphere Streams software, a unique tool that analyzes large volumes of streaming, real-time data, both structured and unstructured. \nThe data is then used to help identify current conditions intelligently, estimate how long it would take to travel from point to point in the city, offer advice on travel alternatives such as routes, and eventually help improve traffic in a metropolitan area. 
\nBenefits\nUses diverse data, including GPS locations, weather conditions, speeds and flows from sensors on motorways, incidents and road works\nEnters data into the InfoSphere Streams software, which can handle all types of data, both structured and unstructured\nHandles the large traffic and traffic-related data streams in real time, enabling researchers to quickly analyze current traffic conditions and develop historical databases for monitoring and more efficient management of the system\nSolution Components\n• IBM® InfoSphere™ Streams\n• IBM BladeCenter® HS22\n• IBM BladeCenter H Chassis\n• IBM System Storage® DS3400\n• Red Hat Linux®\nCase study: http://public.dhe.ibm.com/common/ssi/ecm/en/blc03060usen/BLC03060USEN.PDF\nVideo case study: www.youtube.com/watch?v=qDQ8EH5HewM \n“Analyzing large volumes of streaming data in real time is leading to smarter, more efficient and environmentally friendly traffic in urban areas” – Haris N. Koutsopoulos, Head of Transportation and Logistics, Royal Institute of Technology, Stockholm, Sweden\n","6":"Script:\nIn fact, we have recognized the importance of Big Data for many years (it just wasn’t called Big Data back then) and have established solutions in the marketplace tailored to Big Data:\nVolume – Unlike statistical analysis, where you primarily work with a sample to understand the big picture, data mining has always worked best when it consumes all available data. Since its inception as a discipline, data mining has been a bottom-up approach, and large volumes of data have always been in play. IBM SPSS Modeler is a world-class data mining platform with an unbroken legacy of high performance and accuracy against large volumes of data.\nVelocity – SPSS Modeler streams can be published to InfoSphere Streams, allowing streaming data (to the tune of billions of events per hour) to be scored in real time. 
This means transactions can be classified as fraudulent or not instantly, as they happen, or millions of customers can be tagged with cross-sell and up-sell offers in seconds. \nVariety – IBM Social Media Analytics (Cognos Consumer Insight), SPSS Modeler Premium’s text analytics workbench, and even SPSS Text Analytics for Surveys allow a wide variety of data to be included in the analysis (predictive or otherwise). Key today is the ability to understand and integrate the object, subject and sentiment of social media content. IBM SMA and Modeler Premium are both geared to do this: SMA to gain greater insight, and Modeler Premium to integrate such insights into predictive modeling.\nVeracity – SPSS Modeler Premium has an Entity Analytics capability. This enables the resolution of multiple entities, such as customer records, where some are duplicates and some are deliberate obfuscations of the truth for the purpose of committing fraud. This capability also extends to resolving many “differing” records of the same asset, which is critical when that asset is something like an engine on a passenger plane getting checkups at myriad airports around the world. \n","12":"Gaining a full understanding of customers—what makes them tick, why they buy, how they prefer to shop, why they switch, what they’ll buy next, what factors lead them to recommend a company to others—is strategic for virtually every company. IBM’s own Institute for Business Value report “Real-world use of big data” cites as its #1 recommendation that organizations should focus their big data efforts first on customer analytics that enable them “to truly understand customer needs and anticipate future behaviors.” \nBeyond these analytics that give strategic insights into customer behavior, the importance of the 360° view extends to front-line employees. 
Forward-thinking organizations recognize the need to equip their customer-facing professionals with the right information to engage customers, develop trusted relationships, and achieve positive outcomes such as solving customer problems and up-selling and cross-selling products. To do this, they need to be able to navigate large amounts of information quickly to zero in on what’s needed for a particular customer. \nAs you’ll see in a moment, this is very synergistic with IBM’s data governance story, especially MDM. \n","29":"How does IBM Business Analytics enable data integration? \nBusiness Analytics understands that customers have data in physically different locations and, often, in a variety of data technologies. IBM Business Analytics products connect to a broad spectrum of data sources, from real-time streaming sources to Hadoop to relational data to flat files. The list is comprehensive.\nNew in RP1, Business Intelligence supports Apache Hive, with optimized access through BigInsights Big SQL. The Hive support extends reach to Apache Hadoop, Cloudera, Hortonworks, AWS EMR and other distributions using Hive 0.8 or Hive 0.9.\nAs of May 2013, SPSS has a new analytics engine for Hadoop in beta, called SPSS Advanced Analytics.\nAlso new in RP1 is SAP HANA certification, rounding out HANA support for BA. SPSS announced certification in Q4 2012.\nUnlike many of our competitors, IBM Analytics has a proven track record of combining data from different sources. Unique to IBM is the ability to process and combine data both in motion and at rest.\nNote: For data in motion, SPSS Modeler is integrated with InfoSphere Streams to enable analytic processing while data is still in motion. Models can be designed in the SPSS GUI and then imported into Streams to process data in real time. Cognos Real-time Monitoring (RTM) is used to actively monitor the streaming data via JMS. 
Cognos RTM dashboards can be integrated directly within Cognos Workspace, and RTM’s in-memory views and cubes can be used as a data source by Cognos Framework Manager. In addition, either InfoSphere Streams or Cognos RTM can write this streaming data to a database table for further processing.\n","18":"What is data warehouse augmentation and what are the drivers? \nData warehouse augmentation builds on an existing data warehouse infrastructure, leveraging big data technologies to ‘augment’ its value\nTwo main drivers: \nNeed to leverage a variety of data\nStructured, unstructured, and streaming\nLow latency requirements (hours, not weeks or months)\nRequires query access to data\nOptimize warehouse infrastructure\nWarehouse data volumes reaching big data levels\nLarge portion of warehouse data not accessed frequently\nNeed to optimize warehouse investment (Note: this is not to imply that our warehousing solutions are expensive, but rather that augmenting with big data technologies can make the warehouse a more optimal investment, since you no longer attempt to store and analyze EVERYTHING within the warehouse, which can strain it from a performance and cost perspective.) \n","7":"Big data comes from many sources. It’s much more than traditional data sources. In order to capitalize on the breakthrough opportunities we’ve discussed, you definitely need to look beyond traditional data sources. But at the same time, don’t forget that big data comes from those traditional sources too. Transactional data and application data are growing at a significant rate. Although it’s structured, that data is large and is contained in many different structures. \nBig data includes machine data – logs, web logs, instrumentation data, network data. Data generated by machines is multiplying quickly, and it contains valuable insights that need to be discovered.\nSocial data also needs to be incorporated. Most social data is really textual data. 
The valuable insights remain locked within that text and its many possible meanings. Most of that data isn’t valuable, or is valuable only for a very short time. That makes social data very challenging: you must extract insight from largely textual content in very little time. \nEnterprise content must be amalgamated as well, and that data comes in many forms and in significant volume. \n","13":"Information about a client as viewed in an application built with the Data Explorer Application Builder. \nThe Data Explorer app combines information in context from CRM, content management, supply chain, order-tracking databases, e-mail and many more systems to give a 360º view of the client, so the user doesn’t have to log into and search multiple systems. In this one view, the customer-facing professional can see all of the contact’s information: what products she has purchased, any recent support incidents, news about her company, recent conversations and more. An “activity feed” in the center of the screen shows up-to-the-moment updates about the customer, product or other entity being viewed. Analytics from BigInsights, Streams and IBM’s BI products can also be shown, with the context of the analytics defined by the information displayed in the application. This frees the customer-facing professional to interact with the customer and leverage this complete view to increase revenue and improve customer loyalty.\nAs I mentioned a moment ago, this use case is very synergistic with IBM’s MDM offerings. MDM provides a single, consistent view of data across all of the client’s various systems. This consistency ensures that the view created by Data Explorer will incorporate consistent and accurate data about an entity. In one sense, Data Explorer provides a business user interface to trusted master data combined with related content from other structured and unstructured data sources. 
The availability of MDM accelerates implementation of the Data Explorer 360º application and ensures its accuracy and consistency.\n","2":"Main point: Data is growing at an astounding rate. It is growing so fast that we often lack the ability to use it to its full potential. The highly unstructured nature of this data makes the challenge that much more difficult. This is a real problem for business: it makes informed decisions harder to reach. Business leaders need a way to find hidden patterns and isolate the valuable nuggets that they need to make business decisions.\nFurther speaking points: Yet the rewards for harnessing the data into useful information are great: 54% of companies in this year’s study with MIT/Sloan are using analytics for competitive advantage… and that number has surged 57% in just the past 12 months. “Dying of thirst in an ocean of data” is an apt analogy. Data is everywhere, and 90% of it didn’t exist just two years ago. The vast majority of it is useless for any given goal and therefore amounts to noise, a hindrance to finding the key information needed at a specific time and place. \nAdditional information: See information and stats\n","30":"How do you get started?\nAttend these forums and deep-dive sessions on big data.\nJoin Big Data University to learn Hadoop and other big data technologies from the experts. Many courses are free, so you can acquire valuable skills and stay up to date on the latest industry trends. 
\nRead analyst reports and papers, including Forrester’s Wave report on enterprise Hadoop solutions.\nConnect with your sales contact and schedule a free big data workshop to discuss best practices and business value for your organization.\nAttend a Developer Day to get hands-on experience with big data technology – contact your local sales rep for dates/locations.\n","19":"Background: \nGlobal EDW Project – to build GM’s first true global EDW, providing a completely unified view of the business across stakeholders \n$1,500,000 \nBigInsights \nCompetition – Teradata/Aster, EMC/Greenplum\nThe key to winning this deal was focusing on the big data ecosystem and how BigInsights complements and completes a data warehouse environment. None of our competitors could offer this complete view. How did we win the deal? We took what could have been a traditional RDBMS deal for GM, which we might have lost to Teradata, and shaped the direction of the POC to focus on big data and include Hadoop/BigInsights. We highlighted our competitive differentiators: \n· IBM’s commitment to open-source Hadoop with BigInsights \no IBM’s 5-year history of developing on this platform\no Progression of how we have simplified deployment and management\no Future direction (Appliance, SQL compliance, Federation)\no Proven integration (Information Server and PureData for Operational Analytics)\n· Up and running quickly \no Download the evaluation version at the GM lab and walk them through the installation and upgrade process\no Install the all-Blue technology stack into the GM lab\n· The full, unmatched power of IBM’s resources \no Poughkeepsie Lab\no Expertise\no Experience\nHow do you know if data warehouse augmentation is right for your customer? \nAre you drowning in very large data sets (TBs to PBs)?\nDo you use your warehouse environment as a repository for ALL data?\nDo you have a lot of cold, or low-touch, data? \nDo you have to throw data away because you’re unable to store or process it? 
\nDo you want to be able to analyze data in motion to determine, in real time, what data should be stored in the warehouse? \nDo you want to be able to explore large, complex data sets? \nDo you want to be able to analyze non-operational data? \nAre you interested in using your data for traditional and new types of analytics?\nExample of a large automotive company that used BigInsights for data warehouse augmentation (primarily as a pre-processing hub, but also for ad hoc analysis)\n","14":"(Best Buy)\nWhat were we doing?\nInvestigate whether sales forecasting can be improved by adding signals from unstructured social media to structured historical sales data\nHow did we go about proving it?\nForecast sales of cameras and video games using client-provided structured data (sales, stores, prices)\nIngested 2 years of sales data for 5K camera and video game products in 25 stores across 3 US states\nApplied an innovative three-step Demand Sensing method for modeling aggregated demand data with a large number of multi-product and multi-factorial effects in a single, comprehensive framework\nAdded social media buzz and sentiment to see whether the forecasts changed and improved\nExtracted social media mentions of client-mentioned related cameras and video games using 5 months of Twitter data\nAnalyzed feedback signals including buzz, purchase intent, sentiment, and ownership for products, families, consoles, brands, and categories\nWhat did we prove?\nPOT results indicated that buzz around products in social media improves sales forecasting (see chart at right)\nOne observed phenomenon is that new product announcements and rumors create buzz that can be detected before sales spike\nEntity Integration technology combined with Demand Sensing technology allows us to analyze the effect of social media mentions at multiple hierarchical levels (products, families, consoles, brands, and categories)\n","15":"What is Security/Intelligence 
Extension? \nIt’s using big data technologies to augment and enhance traditional security solutions by analyzing new types (unstructured, streaming) and sources of under-leveraged data to significantly improve intelligence, security, and law enforcement insight.\nWhat are the drivers for a Security/Intelligence Extension use case? \nNeed to analyze data from existing & new sources (data in motion and at rest) to find patterns & associations\nNeed for more up-to-date intelligence information (currency)\nAbility to predict, detect, and act on network/computer security threats sooner\nInability to analyze telco and/or social data to track criminal/terrorist activity\nNeed to analyze continuously generated data such as video, audio, and smart-device data\nOrganizations want to enhance their existing security/intel platforms to improve local & national security, protect their borders, and prevent criminal/terrorist activity\n","10":"Big data exploration addresses the challenge that every large organization faces: information is stored in many different systems and silos. As the world has changed, we have seen the addition of many other sources of data that people need to do their day-to-day work and make important decisions.\nEarlier we spoke of data as “the new oil”—a resource that offers tremendous potential but requires exploration and refinement to extract that value. Just as the oil and gas industry relies on exploration to identify the most productive resources for drilling, the first step in leveraging big data is to find out what you have and to establish the ability to access it and use it to support decision-making and day-to-day operations. Big Data Exploration is how you get started. \nMost discussions of big data start with the three Vs—volume, velocity and variety. 
These identify the dimensions of the challenge that every large organization deals with daily as it struggles to extract value from its information resources, make better decisions, improve operations and reduce risk. Every large organization has multiple applications that manage information, from CRM systems to ECM, data warehouses and rapidly growing corporate intranets. The challenge is that any important decision, customer interaction or analysis inevitably requires information from multiple sources. With the inclusion of Data Explorer, IBM’s Big Data platform provides the capability to easily navigate information in all of these enterprise systems as well as data from outside the organization. \nThe growth of so-called “raw” data—collected from sensors, machine logs, clickstreams from websites, etc.—presents yet another challenge. How do organizations add context to this data to fuel better analytics and decision-making? Here again, the ability of Data Explorer and other products in the Big Data platform to fuse information from these raw, semi-structured sources with enterprise data can add valuable context that will help organizations gain value from this data.\nYet another way that IBM’s big data exploration capability adds value is in risk containment. Organizations that lack the ability to navigate and explore large areas of their information landscape put themselves at risk of leaking confidential information, leaking important trade secrets and strategic information to competitors, and being unable to retrieve and verify information when required for litigation and other corporate governance matters.\n"}
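The Veracity point in the slide 6 notes describes entity resolution. This means collapsing customer or asset records that are duplicates, or deliberately disguised, into a single entity. The Python sketch below illustrates the basic idea under simple, hypothetical matching rules: two records are linked when their normalized e-mail addresses match or when the last 10 digits of their phone numbers match. It shows the concept only and is not the algorithm used by SPSS Modeler Premium's Entity Analytics. The Record fields, the matching keys and the sample data are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Record:
    # Hypothetical record layout; name is kept for display but not matched on.
    record_id: str
    name: str
    email: str
    phone: str


def normalize_email(email: str) -> str:
    return email.strip().lower()


def normalize_phone(phone: str) -> str:
    # Assumed rule: keep digits only, then the last 10, so "+1 555..." matches "555...".
    digits = re.sub(r"\D", "", phone)
    return digits[-10:]


class UnionFind:
    """Disjoint-set structure that groups record IDs into resolved entities."""

    def __init__(self, items: List[str]) -> None:
        self.parent: Dict[str, str] = {i: i for i in items}

    def find(self, x: str) -> str:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def resolve_entities(records: List[Record]) -> List[List[str]]:
    """Link records sharing a normalized e-mail or phone; return sorted clusters."""
    uf = UnionFind([r.record_id for r in records])
    first_seen: Dict[Tuple[str, str], str] = {}
    for r in records:
        for key in (("email", normalize_email(r.email)),
                    ("phone", normalize_phone(r.phone))):
            if not key[1]:
                continue  # skip empty values so blank fields never match each other
            if key in first_seen:
                uf.union(first_seen[key], r.record_id)
            else:
                first_seen[key] = r.record_id
    clusters: Dict[str, List[str]] = {}
    for r in records:
        clusters.setdefault(uf.find(r.record_id), []).append(r.record_id)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical sample data for illustration only.
SAMPLE_RECORDS = [
    Record("c1", "Jane Doe", "jane@example.com", "555-123-4567"),
    Record("c2", "J. Doe", " JANE@example.com", "(555) 999-0000"),
    Record("c3", "Jane D.", "jd@other.org", "+1 555 999 0000"),
    Record("c4", "John Smith", "john@example.com", "555-000-1111"),
]

if __name__ == "__main__":
    print(resolve_entities(SAMPLE_RECORDS))
```

Running it prints [['c1', 'c2', 'c3'], ['c4']]. The union-find step makes matches transitive. For example, c1 and c3 share no field directly, but each links to c2, so all three resolve to one entity. A production system would typically add fuzzy name matching and match scoring on top of exact keys like these.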
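Two passages rest on the same pattern: the Velocity point in the slide 6 notes and the telco example in the slide 5 notes. In both, each event is scored or inspected as it arrives, and only the events worth acting on or persisting are kept. The Python sketch below illustrates that pattern with a generator and a hand-written logistic scoring function. The Transaction fields, the weights and the 0.5 threshold are hypothetical stand-ins for a model trained offline. This is not the SPSS Modeler or InfoSphere Streams API.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple


@dataclass(frozen=True)
class Transaction:
    txn_id: str
    amount: float              # transaction amount
    foreign: bool              # made outside the card holder's home country
    minutes_since_last: float  # time since the previous transaction on the card


# Hypothetical coefficients standing in for a model trained offline and then
# deployed to score the live stream.
WEIGHTS = {"bias": -6.0, "amount": 0.004, "foreign": 2.0, "rapid": 1.5}


def fraud_score(t: Transaction) -> float:
    """Return a logistic-regression style fraud probability for one event."""
    z = (
        WEIGHTS["bias"]
        + WEIGHTS["amount"] * t.amount
        + WEIGHTS["foreign"] * (1.0 if t.foreign else 0.0)
        + WEIGHTS["rapid"] * (1.0 if t.minutes_since_last < 5 else 0.0)
    )
    return 1.0 / (1.0 + math.exp(-z))


def score_stream(
    events: Iterable[Transaction], threshold: float = 0.5
) -> Iterator[Tuple[str, float]]:
    """Score events one at a time and yield only those at or above threshold.

    Because this is a generator, each event is scored as it arrives and
    low-scoring events are dropped instead of being stored.
    """
    for t in events:
        p = fraud_score(t)
        if p >= threshold:
            yield t.txn_id, p


if __name__ == "__main__":
    incoming = [
        Transaction("t1", 50.0, False, 120.0),
        Transaction("t2", 1200.0, True, 2.0),
        Transaction("t3", 900.0, True, 60.0),
    ]
    for txn_id, p in score_stream(incoming):
        print(f"flag {txn_id}: p={p:.3f}")
```

Running the script prints one line, flag t2: p=0.909. Transactions t1 and t3 score below the threshold and are discarded. This is the "persist only what is necessary" behavior in miniature. A generator is used so that memory stays constant however long the stream runs.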