Watson and Analytics

© 2015 INTERNATIONAL BUSINESS MACHINES CORPORATION
IBM Watson and Analytics:
Welcome to the Cognitive Era
A new era in technology, a new era in business.
Jerry Carroll, Executive Architect – Big Data & Analytics
jbcarrol@us.ibm.com

Where code goes,
where data flows,
cognition will follow.
Watson is ushering in a new era of computing:
1900: Tabulating Systems Era
1950: Programmable Systems Era
2011: Cognitive Era

Here’s what the world is saying about the impact of
Cognitive systems on the future of how we work:
“IBM Crafts a Role for Artificial
Intelligence in Medicine.”
“IBM Watson represents a bold technological and visionary step [toward] a future in which every aspect of our lives will be enhanced by the utility of cognitive computing as it is harnessed into myriad new applications.”
“What is distinctive about IBM is the
breadth of its effort to create Watson
tools and services as plug-in offerings for
a wide range of developers.”
“At Wayin, former Sun CEO Scott McNealy is
using Watson's image recognition capabilities
to trawl photos on social media and make
them searchable, even when they don't have
tags describing their content. ‘You can't do this
without Watson,’ he said.”
“IDC predicts the worldwide cognitive
software platforms market will grow to $3.7
billion in 2019, at a CAGR of 35% over 5
years.”
IDC: Worldwide Cognitive Software Platforms
Forecast, 2015-2019: The Emergence of a
New Market (#258781, September 2015,
David Schubmehl)
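As a quick check on the IDC forecast quoted above, a compound annual growth rate can be inverted to recover the starting market size it implies. This is a minimal sketch that assumes the five compounding years run from a 2014 base; the function name `implied_base` is illustrative, not from the report.

```python
def implied_base(final_value: float, cagr: float, years: int) -> float:
    """Back out the starting value implied by a compound annual growth rate.

    final_value = base * (1 + cagr) ** years, so
    base = final_value / (1 + cagr) ** years
    """
    return final_value / (1.0 + cagr) ** years


# IDC figures quoted above: $3.7B in 2019 at a 35% CAGR over 5 years.
# Assumption: the five compounding years start from a 2014 base.
base_2014 = implied_base(3.7, 0.35, 5)
print(f"Implied base market: ${base_2014:.2f}B")
```

Under that assumption the forecast implies a market of roughly $0.83B at the start of the period, i.e. growth of about 4.5x over five years.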
“IBM is the only company marketing a
cognitive computing platform that’s
specifically designed to support the
development of a broad range of
enterprise solutions.”
“No doubt, Watson has the means to radically
change the industry. In fact, its potential as an
‘innovation lake/incubator’ should be highly
valued.”
IDC: IBM’s Go-to-Market Transformation –
Deeper, Wider, Newer (#AP257527, April 2015,
Chris Zhang, Sabharinath Balasubramanian,
Mayur Sahni)
“IBM’s famous cognitive computer can
help banks with complex financial
operations and attack important health
care problems. Now you can add seeing
to its skill set.”
“These days, it’s not just AI algorithms
themselves that have improved, but the
ability to deliver them…that has made so
many new applications possible.”

There are three capabilities that
differentiate cognitive systems from
traditional programmed computing
systems.
Reasoning
They reason. They understand not only information but also the underlying ideas and concepts. This reasoning ability can become more advanced over time. It's the difference between the reasoning strategies we used as children to solve mathematical problems and the strategies we developed when we got into advanced math like geometry, algebra and calculus.
Learning
They never stop learning. As a technology, this means the system actually gets more valuable with time. They develop “expertise”. Think about what it means to be an expert: it's not about executing a mathematical model. We don't consider our doctors to be experts in their fields because they answer every question correctly. We expect them to be able to reason, to be transparent about their reasoning and to expose the rationale for why they came to a conclusion.
Understanding
Cognitive systems understand like humans do, whether through natural language or the written word, vocal or visual input.

Data is
transforming
industries and
professions.
HOW, AND WHY NOW?

CONSIDER:
Data flows from every device, replacing guessing and approximations with precise information. Yet 80% of this data is unstructured and therefore invisible to computers and of limited use to business.
HEALTHCARE DATA: 99% growth by 2017; 88% unstructured
Healthcare data comes from sources such as patient sensors, electronic medical records and test results.
GOVERNMENT & EDUCATION DATA: 94% growth by 2017; 84% unstructured
Government & education data comes from sources such as vehicle fleet sensors, traffic sensors and student evaluations.
UTILITIES DATA: 93% growth by 2017; 84% unstructured
Utilities data comes from sources such as utility sensors, employee sensors and location data.
MEDIA DATA: 97% growth by 2017; 82% unstructured
Media data comes from sources such as video and film, images and audio.
By 2020, 1.7 MB of new information will be created every minute for every human being on the planet.

The world is
being reinvented
in code.
HOW, AND WHY NOW?

CONSIDER:
The world is being rewritten in
software code, and cloud is the
platform on which the new digital
builders—from developers to
business professionals—are
reimagining everything from
banking to retail to healthcare.
Smart TVs represented 27% of
all TV sales in 2012; by 2018,
they will represent 82%.
Smart LED lighting will grow
from 6M units in 2015 to 570M
units in 2020, used for safety
communication, health, pollution
and personalized services.
By 2017, there will be 1B
connected things in smart
homes, including appliances,
smoke detectors and cameras.
100,000,000 lines of code in a new car
5,000,000 lines of code in smart appliances
1,200,000 lines of code in a smartphone
80,000 lines of code in a pacemaker
50% of B2B collaboration will take place through web APIs next year.
Sensors for industrial asset monitoring and management will grow from just over 15M units in 2014 to over 40M units in 2018.
Smart traffic sensors and
other devices installed in smart
cities will grow from 237M units
in 2015 to 371M in 2017.
Revenues for
smart grid sensors
will grow ten-fold from
2014 to 2021.
By 2020, there will be
925M smart meters installed
worldwide, more than double
the 400M in 2014.
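The device forecasts above mix unit counts, multiples and different time spans, so converting each one to an implied compound annual growth rate makes them easier to compare. This is a minimal sketch using the start and end values as stated; the "just over" and "over" qualifiers are ignored, so the rates are approximations, and the ten-fold smart grid figure is treated as a 1-to-10 ratio.

```python
def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end / start) ** (1.0 / years) - 1.0


# (start, end, years) taken from the figures above.
forecasts = {
    "smart LED lighting, 2015-2020": (6e6, 570e6, 5),
    "industrial asset sensors, 2014-2018": (15e6, 40e6, 4),
    "smart city traffic sensors, 2015-2017": (237e6, 371e6, 2),
    "smart grid sensor revenue, 2014-2021": (1.0, 10.0, 7),  # ten-fold growth
    "smart meters, 2014-2020": (400e6, 925e6, 6),
}

for name, (start, end, years) in forecasts.items():
    print(f"{name}: {cagr(start, end, years):.1%} per year")
```

For example, the smart meter forecast (400M to 925M over six years) works out to roughly 15% growth per year, while smart LED lighting implies well over 100% per year.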
Figure: code, tools, analytics, data and APIs

Computing is
entering a new
cognitive era.
HOW, AND WHY NOW?

CONSIDER:
Cognitive systems can understand the
world through sensing and interaction,
reason using hypotheses and arguments
and learn from experts and through data.
Watson is the most advanced such system.
Today, businesses in 36 countries across 17 industries are applying cognitive technologies.
There are 350+ Watson ecosystem partner companies, 100 of which have taken their products to market.
On average, there are 1.3B Watson API calls a month, and growing.
78% of business and IT executives believe that successful businesses will manage employees alongside intelligent machines.
Among C-suite executives familiar with cognitive computing:
96% in insurance intend to invest in cognitive capabilities.
84% in healthcare believe it will play a disruptive role in the industry, and 60% believe they lack the skilled professionals and technical experience to achieve it.
94% in retail intend to invest in cognitive capabilities.
89% in telecommunications believe it will have a critical impact on the future of their business.

ADVANTAGES OF COGNITIVE BUSINESS:
• Deeper human engagement
• Elevated expertise
• Cognitive products and services
• Cognitive processes and operations
• Intelligent exploration and discovery

Relationship Extraction, Questions & Answers, Language Detection, Personality Insights, Keyword Extraction, Image Link Extraction, Feed Detection, Visual Recognition, Concept Expansion, Concept Insights, Dialog, Sentiment Analysis, Text to Speech, Tradeoff Analytics, Natural Language Classifier, Author Extraction, Speech to Text, Retrieve & Rank, Watson News, Language Translation, Entity Extraction, Tone Analyzer, Concept Tagging, Taxonomy, Text Extraction, Message Resonance, Image Tagging, Face Detection, Answer Generation, Usage Insights, Fusion Q&A, Video Augmentation, Decision Optimization, Knowledge Graph, Risk Stratification, Policy Identification, Emotion Analysis, Decision Support, Criteria Classification, Knowledge Canvas, Easy Adaptation, Knowledge Studio Service, Statistical Dialog, Q&A Qualification, Factoid Pipeline, Case Evaluation
IBM WATSON
The Watson that competed on Jeopardy! in 2011 comprised what is now a single API, Q&A, built on five underlying technologies:
Natural Language Processing
Machine Learning
Question Analysis
Feature Engineering
Ontology Analysis
Since then, Watson has grown to a family of 28 APIs. By the end of 2016, there will be nearly 50 Watson APIs, with more added every year.

IBM WATSON
Personality
Insights

IBM WATSON
These APIs are underpinned by
50 technologies:
Anaphoric Co-referencing
Colloquialism Processing
Content Management -- Versioning
Convolutional Neural Networks
Curation
Deep Learning
Dialog Framing
Ellipses
Embedded Table Processing
Ensembles and Fusion
Entity Resolution
Factoid Answering
Feature Engineering
Feature Normalization
Focus and Spurious Phrase Resolution
HTML Page Analysis
Image Management
Information Retrieval
Knowledge (Property) Graphs
Knowledge Answering
Knowledge Extraction Annotators
Knowledge Validation and Extrapolation
Language Modeling
Latent Semantic Analysis
Learn To Rank
Linguistic Analysis
Logical Reasoning Analysis
Logistic Regression
Machine Learning
Multi-Dimensional Clustering
Multilingual training
n-Gram Analysis (word combinations and distance)
Ontology Analysis
Pareto Analysis
Passage Answering
PDF Conversion
Phoneme Aggregation
Question Analysis
Question-answering Reasoning Strategies
Recursive Neural Networks
Rules Processing
Scalable Search
Similarity Analytics
Statistical Language Parsing
Support Vector Machines
Syllable Analysis
Table Answering
Visual Analysis
Visual Rendering
Voice Synthesis

BECOMING A COGNITIVE BUSINESS
1. A cognitive strategy
Determine what data you need; which experts will train the system; where you must build more human engagement; which products, services, processes and operations should be infused with cognition; and which parts of the unstructured 80% of data you most need to focus on to make discoveries for the future.
2. A foundation of data and analytics
Collect and curate the right data—data you own, data from
others, data available to all; both structured and unstructured.
Apply cognitive technologies to this data in order to sense,
learn and adapt, thereby creating competitive advantage.
3. Cloud services optimized for industry, data and cognitive APIs
The building blocks for products and services are code, APIs and
diverse data sets. The platform you choose to develop on, and
the agile development culture and methods you embrace, will be
critical to your success.
4. IT infrastructure tuned for cognitive workloads
Architect a new kind of IT core—a heterogeneous
infrastructure that serves as the backbone of your enterprise.
Do this rapidly and affordably by harmonizing technologies
from public, private and hybrid cloud with distributed devices,
IoT instrumentation and your existing systems.
5. Security for a Cognitive Era
As cognition makes its way into cars, buildings, roadways,
business processes, fleets, supply chains—securing
every transaction, piece of data, and interaction becomes
essential to ensure trust in the entire system—and in your
brand and reputation.

Power Systems – Designed for Big Data and Analytics solutions
Analytics 1.0

Analytics 1.5

Analytics 2.0 - support a variety of data and a range of analytics

Classic Hadoop infrastructures can be inefficient and inflexible, leading to server and cluster sprawl, unnecessary software licenses and infrastructure management challenges.
IBM Data Engine for Hadoop and Spark
IBM Data Engine for Analytics

Avoid Server Sprawl as Big Data and Analytics environments grow
Intel server growth, estimated for a sample configuration whose 500 TB of user space grows 5x and 10x to 2.5 PB and 5 PB: 75 servers at 5x, 143 servers at 10x.
POWER8 server growth using IBM Data Engine for Analytics, estimated for a sample configuration that grows 5x and 10x to 2.5 PB and 5 PB of user space: 17 servers at 5x, 28 servers at 10x.
Sizing is based on assumptions regarding general configurations and use cases for BigInsights and BigSQL with an actual client. The comparison reflects the number of servers required to deliver relative performance and equivalent user space on an Intel reference architecture using Hadoop triple replication with 4 TB drives versus the POWER8 IDEA reference architecture with 4 TB drives using the Elastic Storage Server.
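The sizing note above hinges on how much raw disk each architecture needs per terabyte of user data. This is a minimal sketch of that arithmetic for the 500 TB, 2.5 PB and 5 PB cases with 4 TB drives. It assumes an 8 data + 2 parity erasure code purely as an illustrative stand-in for the Elastic Storage Server's protection scheme, ignores spares and free-space reserves, and does not reproduce how the server counts above were derived.

```python
import math


def raw_capacity_tb(user_tb: float, overhead: float) -> float:
    """Raw disk capacity needed to hold `user_tb` of user data.

    overhead = bytes stored on disk per byte of user data:
      3.0 for HDFS-style triple replication,
      (k + m) / k for a k-data + m-parity erasure code.
    """
    return user_tb * overhead


def drives_needed(user_tb: float, overhead: float, drive_tb: float = 4.0) -> int:
    """Drives of size `drive_tb` needed (no spares or free-space reserve)."""
    return math.ceil(raw_capacity_tb(user_tb, overhead) / drive_tb)


TRIPLE_REPLICATION = 3.0
# Assumption: illustrative 8+2 erasure code (overhead 10/8 = 1.25).
ERASURE_8_2 = (8 + 2) / 8

for user_tb in (500, 2500, 5000):  # 500 TB growing 5x and 10x
    rep = drives_needed(user_tb, TRIPLE_REPLICATION)
    ec = drives_needed(user_tb, ERASURE_8_2)
    print(f"{user_tb} TB user space: {rep} drives (3x replication) "
          f"vs {ec} drives (8+2 erasure code)")
```

Under these assumptions triple replication needs 3.0 / 1.25 = 2.4 times as many drives as the erasure-coded layout at every capacity point, which is why storage protection scheme, not just server count, drives cluster growth.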

Infrastructure Silos (aka Cluster Sprawl) Are Inefficient
Many new solution workloads, in addition to existing apps, lead to costly, complex, siloed, under-utilized infrastructure and replicated data.
Typical silos: development; test; distributed ETL and sensitivity analysis; Hadoop-based sentiment analysis
Low utilization = higher cost

IBM Data Engine for Analytics (IDEA) – Client Example
Client: Multinational Telecommunications Company
A multinational telecommunications company with over 6M subscribers.
Challenges
Expectation of a Real-Time Marketing (RTM) solution to run event-based campaigns
Enable event-based marketing by analyzing various sources of input data containing information about subscribers' actions
Dispatch the triggered events to downstream applications, such as campaign management, for associated campaign execution.
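The event-based marketing flow described in the challenges (analyze subscriber events, trigger on conditions, dispatch to campaign management) can be sketched as a small rule router. Everything here is hypothetical and for illustration only: the event schema, the rule names, the thresholds and the in-memory `dispatched` list standing in for a downstream campaign system. It is not the client's implementation or an IBM Streams API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class Event:
    """One subscriber action from an input feed (hypothetical schema)."""
    subscriber_id: str
    kind: str            # hypothetical types, e.g. "data_usage", "roaming_attach"
    value: float = 0.0   # e.g. fraction of the data plan used


@dataclass
class Rule:
    """A campaign trigger: a predicate over one event plus the campaign it starts."""
    name: str
    matches: Callable[[Event], bool]
    campaign: str


@dataclass
class EventRouter:
    rules: List[Rule]
    dispatched: List[Dict[str, str]] = field(default_factory=list)
    fired: Set[Tuple[str, str]] = field(default_factory=set)

    def process(self, event: Event) -> None:
        for rule in self.rules:
            key = (rule.name, event.subscriber_id)
            # Fire each rule at most once per subscriber to avoid repeat offers.
            if key not in self.fired and rule.matches(event):
                self.fired.add(key)
                self.dispatch(rule, event)

    def dispatch(self, rule: Rule, event: Event) -> None:
        # Stand-in for handing the trigger to a campaign-management system.
        self.dispatched.append(
            {"subscriber": event.subscriber_id, "campaign": rule.campaign})


rules = [
    Rule("near_data_cap",
         lambda e: e.kind == "data_usage" and e.value >= 0.9, "data_topup_offer"),
    Rule("roaming", lambda e: e.kind == "roaming_attach", "roaming_pack_offer"),
]
router = EventRouter(rules)
for ev in [
    Event("A", "data_usage", 0.5),
    Event("A", "data_usage", 0.95),
    Event("A", "data_usage", 0.97),   # same rule again: suppressed
    Event("B", "roaming_attach"),
]:
    router.process(ev)
print(router.dispatched)
```

Keeping the trigger predicates separate from the dispatch step mirrors the split in the text between analyzing input data and handing triggered events to downstream applications; in a production system each stage would typically be a separate streaming component.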
Architecture: Solution Components
• BigInsights, Streams, SPSS Modeler, SPSS Analytics Server
• IBM Data Engine for Analytics: 20x S822L, 2x ESS GL4, Spectrum Scale, PCM
Solution Approach
The solution provided a Hadoop-based Big Data platform, integrated with the RTM decision engine, to enable data monetization opportunities, including location-based analytics.
The customer was not comfortable with the large number of x86 data nodes required by a typical Hadoop architecture.
The IBM team designed the Power solution and conducted a technical workshop on the newly redefined Hadoop architecture based on IDEA.
Key Client Benefits
Optimized Big Data deployment using the IDEA architecture with Linux on Power, Elastic Storage Server and Spectrum Scale
Lower TCO with 4 racks on Power vs. 12 racks on x86: 3x fewer racks for a 2 PB Big Data solution
More IO bandwidth with a 40GbE network on Power vs. 10GbE on the x86-based solution

IBM Data Engine for Analytics – Solution Highlights
Actionable Insights with IBM BigInsights preloaded
+ Increase business value by consolidating multiple
analytic capabilities and data as needed
Up to 2.5x* faster insights
Smart Infrastructure Services with IBM Platform Computing
+ Designed to handle multiple analytic workloads in a multi-tenant
environment with dynamic resources
Designed for Data with IBM POWER8 Systems (S822L)
+ Outstanding memory and IO bandwidth design for the demands of Big Data
2x* better performance
Scalable Networking with IBM and partner networks
+ High-bandwidth, low-latency networking: Ethernet RoCE (10 Gbit or 40 Gbit) or InfiniBand RDMA (FDR)
10x to 100x network performance growth since Hadoop inception
Flexible Storage with IBM Elastic Storage Server
+ Combines Servers, Storage Enclosures, Disks and Elastic Storage Software
Over 2x** reduction in storage disk count
*Based on internal testing and cost analysis
**Based on client example vs a triple replica Hadoop configuration.
Solution stack: Big Data & Analytics Software; Infrastructure Services; POWER8 Servers; Scalable Networking; Scale-Out Cluster File System; Elastic Storage Server
Appliance-like, but much more versatile!

Compute Plane = POWER8 Systems, Designed for Big Data
4x threads per core*
4x memory bandwidth*
5x more cache*
SMT = Simultaneous Multi-Threading; OLTP = On-Line Transaction Processing; HPC = High Performance Computing
These design decisions result in the best performance for all types of workloads, such as Java, OLTP, analytics, Big Data and HPC.
* POWER8 compared to Intel Haswell EX
Sources: Haswell EX:
http://ark.intel.com/products/84685/Intel-Xeon-Processor-E7-8890-v3-45M-Cache-2_50-GHz
POWER8:
http://www-01.ibm.com/common/ssi/cgi-bin/ssialias?subtype=BR&infotype=PM&appname=STGE_PO_PO_USEN&
Figure: POWER8 (SMT8) vs. x86 (SMT2) pipelines and data flow; 1.4 – 2.3X clock frequency
Components of IDEA

Data Plane = Elastic Storage Server, a Data Lake for Many Applications
Elastic Storage Server: a single name space for technical computing, Big Data & Analytics and cloud workloads
Access interfaces: POSIX, GPFS, NFS, Hadoop, Cinder and Swift (file, block and object)
Linear capacity and performance scale-out
Enterprise storage on standard hardware
Data lake: structured and unstructured data serving traditional analytics and BigSQL
77.5 percent of organizations are already investing in a data lake*

Production - IBM Data Engine for Analytics
Configuration | Without HMC/TFT/SMN/MN | With HMC/TFT/SMN
Available filesystem capacity | 0 PB | 1 PB
Number of data nodes | 14 | 4
Hadoop management LPARs | 0 | 6
System management node | 0 | 1
Data Node (S822L): 24x 3.02GHz cores; 256GB DRAM; 12x 1.8TB 10K SAS HDD; 2x 40GbE (2-port, data + mgmt); 2x 4-port 1GbE NIC (mgmt)
Hadoop Management Node (S822L): 2 LPARs with split backplane; 256GB DRAM; 8x 1.8TB 10K SAS HDD (OS + data)
System Management Node (S812L): 32GB DRAM; 2x 300GB 10K SAS HDD (OS)
Rack layouts shown: Initial Rack; Scale-out Racks

Single vendor support
Up to 2x better price performance for Spark workloads*
Delivered as a fully integrated cluster ready to run
OpenPOWER innovation with IBM S812LC servers
• Optimized configurations for Hadoop or Spark workloads
• Based on S812LC servers with up to 14x 6TB disk drives per server
• Optionally preloaded with IBM BigInsights and IBM Open Platform
• Simplify operations: easy to deploy and manage
• Adapt and scale to your changing analytics needs
IBM Data Engine for Hadoop and Spark
OpenPOWER innovation with IBM Open Platform with Apache Hadoop for a high
performance, storage dense and fully integrated cluster offering.
• All results are based on IBM Internal Testing of 3 SparkBench benchmarks consisting of SQL
RDD Relation, Logistic Regression, SVM
Announce: Feb 9, 2016
GA: Mar 18, 2016

IBM Data Engine for Hadoop and Spark (IDE-HS)
Cluster Performance
Designed for the Cognitive Era to Make Better Decisions even Faster
IBM Data Engine for Hadoop and Spark infrastructure delivers Spark workload scaling to minimize execution times and reduce batch windows:
• 2.1x more performance per dollar spent for Spark logistic-regression-based machine learning, used in model training by a wide variety of lines of business
• 1.4x more performance per dollar spent for Support Vector Machine (SVM), a machine learning algorithm used in product recommender systems
• 1.7x more performance per dollar spent for Spark SQL query processing, widely used in Big Data clusters
• All results are based on IBM Internal Testing of 3 SparkBench benchmarks consisting of SQL RDD Relation, Logistic Regression, SVM
• 6 Data Nodes and 1 Management Node. Each node is IBM Power System S812LC 10 cores / 80 threads, POWER8; 2.92GHz, 256 GB
memory, RedHat 7.2, Spark 1.5.1, OpenJDK 1.8
• 6 Data Nodes and 1 Management Node. Each node is x86 E5-2620V3 12 cores / 24 threads, E5-2620 V3; 2.4GHz, 256 GB memory,
RedHat 7.1, Spark 1.5.1, OpenJDK 1.8
• Pricing is based on web prices of HP DL380 and list prices of IBM Power S812LC
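"Performance per dollar" comparisons like the ones above combine two ratios: how much faster one system runs a job and how much more (or less) it costs. This is a minimal sketch of that calculation, assuming performance is measured as 1 / run time; the run times and prices below are hypothetical and are not the benchmark data or prices behind the IBM results.

```python
def perf_per_dollar_ratio(time_a_s: float, price_a: float,
                          time_b_s: float, price_b: float) -> float:
    """How much more performance per dollar system A delivers than system B.

    Performance is taken as 1 / run time (jobs per second), so
    (perf_a / price_a) / (perf_b / price_b) = (time_b * price_b) / (time_a * price_a).
    """
    perf_per_dollar_a = (1.0 / time_a_s) / price_a
    perf_per_dollar_b = (1.0 / time_b_s) / price_b
    return perf_per_dollar_a / perf_per_dollar_b


# Hypothetical inputs for illustration only: cluster A finishes a job in
# 60 s and costs $12,000; cluster B takes 100 s and costs $10,000.
ratio = perf_per_dollar_ratio(60.0, 12_000.0, 100.0, 10_000.0)
print(f"A delivers {ratio:.2f}x the performance per dollar of B")
```

A system can cost more and still win on this metric if its speedup outweighs the price difference; the result also depends on the pricing basis chosen (web prices vs. list prices, as the footnote notes).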

• Apache Spark is an open-source, in-memory distributed compute engine
– It speeds iterative analysis on large-scale data, running up to 100x faster than Hadoop MapReduce
– It enables more people to collaborate on accessing data, applying analytics and deploying deep intelligence into every application, including IoT, web, mobile, social, business process and more
– IBM/Spark commitment: 3,500 employees working on Spark
• Included in the IBM Open Platform (IOP), which runs on Linux on Power
• Power Systems: a key contributor to Spark
• Offers over 2x the performance per core for Spark workloads compared to x86 Haswell* (SQL, ML, Graph, Streaming)
Open Platform
with Apache Hadoop
Open innovation to put data to work across the enterprise
* Based on SparkBench on POWER8 P822L vs. x86 E5-2690 V3, each with 24 cores and 256 GB RAM
Spark on Power
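Iterative algorithms such as the logistic regression benchmark cited above re-read the entire dataset on every iteration, which is why an in-memory engine like Spark helps. This is a minimal plain-Python sketch of batch gradient-descent logistic regression that makes that access pattern explicit; it is not Spark MLlib's API, and the toy dataset, learning rate and epoch count are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence, Tuple

Example = Tuple[List[float], int]  # (feature vector, label in {0, 1})


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(data: Sequence[Example], epochs: int = 200,
                              lr: float = 0.5) -> Tuple[List[float], float]:
    """Batch gradient descent on the logistic (cross-entropy) loss.

    Every epoch makes a full pass over `data`; this repeated scan is the
    access pattern that benefits from keeping the dataset in memory.
    """
    n = len(data[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for x, y in data:  # full scan of the in-memory dataset
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(n):
                grad_w[j] += err * x[j]
            grad_b += err
        m = len(data)
        w = [wi - lr * g / m for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def predict(w: List[float], b: float, x: List[float]) -> int:
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0


# Toy, linearly separable dataset (assumption, for illustration).
data = [([-2.0, -1.0], 0), ([-1.0, -2.0], 0), ([1.0, 2.0], 1), ([2.0, 1.0], 1)]
w, b = train_logistic_regression(data)
print([predict(w, b, x) for x, _ in data])
```

With 200 epochs the sketch scans the dataset 200 times; on a disk-based engine each of those passes would mean re-reading the data from storage, whereas an in-memory engine can cache it once and reuse it across iterations.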
