3. What Computers Find Easier
Select Payment where Owner = "David Jones" and Type(Product) = "Laptop"

Assets table:    Owner = David Jones | Serial Number = 45322190-AK | Type = Laptop | Invoice # = INV10895
Payments table:  Invoice # = INV10895 | Vendor = MyBuy | Payment = $104.56

The computer traverses these structured tables, joining on Invoice #, to reach the Payment field.
Exact keyword matching:
"David Jones" = "David Jones"  (letter-by-letter comparison succeeds)
"Dave Jones" ≠ "David Jones"  (match fails without knowing "Dave" is a nickname for "David")
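A minimal sketch of the gap between the two comparisons: exact matching is trivial, while the nickname-aware variant needs extra background knowledge. The one-entry NICKNAMES table is hypothetical, purely for illustration:

```python
# Exact string comparison: what the computer does natively.
def exact_match(a: str, b: str) -> bool:
    return a == b

# Nickname-aware comparison needs knowledge the computer lacks by default.
# NICKNAMES is a hypothetical lookup table standing in for that knowledge.
NICKNAMES = {"dave": "david"}

def canonical(name: str) -> str:
    first, _, rest = name.lower().partition(" ")
    return f"{NICKNAMES.get(first, first)} {rest}".strip()

def fuzzy_match(a: str, b: str) -> bool:
    return canonical(a) == canonical(b)

print(exact_match("Dave Jones", "David Jones"))  # False
print(fuzzy_match("Dave Jones", "David Jones"))  # True
```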
4. What Computers Find Hard
Computer programs are natively explicit, fast and exacting in their calculation over numbers and symbols. But natural language is implicit, highly contextual, ambiguous and often imprecise.
• Where was X born?

Structured:
Person = A. Einstein | Birth Place = Ulm

Unstructured:
"One day, from among his city views of Ulm, Otto chose a water color to send to Albert Einstein as a remembrance of Einstein's birthplace."
• X ran this?

Structured:
Person = J. Welch | Organization = GE

Unstructured:
"If leadership is an art then surely Jack Welch has proved himself a master painter during his tenure at GE."
5. A Grand Challenge Opportunity
Capture the imagination
– The next Deep Blue
Engage the scientific community
– Envision new ways for computers to impact society & science
– Drive important and measurable scientific advances
Be Relevant to Important Problems
– Enable better, faster decision making over unstructured and structured content
– Business Intelligence, Knowledge Discovery and Management, Government, Compliance, Publishing, Legal, Healthcare, Business Integrity, Customer Relationship Management, Web Self-Service, Product Support, etc.
6. Real Language is Real Hard
• Chess
– A finite, mathematically well-defined search space
– Limited number of moves and states
– Grounded in explicit, unambiguous mathematical rules
• Human Language
– Ambiguous, contextual and implicit
– Grounded only in human cognition
– Seemingly infinite number of ways to express the same meaning
7. Automatic Open-Domain Question Answering
A Long-Standing Challenge in Artificial Intelligence to emulate human expertise
Given
– Rich Natural Language Questions
– Over a Broad Domain of Knowledge
Deliver
– Precise Answers: Determine what is being asked & give a precise response
– Accurate Confidences: Determine the likelihood the answer is correct
– Consumable Justifications: Explain why the answer is right
– Fast Response Time: Precision & confidence in <3 seconds
8. You may have heard of IBM’s Watson…
A. What is the computer system that played against human opponents on "Jeopardy"… and won.

Why Jeopardy?
The game of Jeopardy! makes great demands on its players – from the range of topical knowledge covered to the nuances in language employed in the clues. The question IBM had for itself was: "Is it possible to build a computer system that could process big data and come up with sensible answers in seconds – so well that it could compete with human opponents?"
9. Some Basic Jeopardy! Clues
• This fish was thought to be extinct millions of years ago until one was found off South Africa in 1938
– Category: ENDS IN "TH"
– Answer: coelacanth
(The type of thing being asked for is often indicated but can go from specific to very vague.)
• When hit by electrons, a phosphor gives off electromagnetic energy in this form
– Category: General Science
– Answer: light (or photons)
• Secy. Chase just submitted this to me for the third time – guess what, pal. This time I'm accepting it
– Category: Lincoln Blogs
– Answer: his resignation
10. Lexical Answer Type
• We define a LAT to be a word in the clue that
indicates the type of the answer, independent of
assigning semantics to that word. For example in
the following clue, the LAT is the string
“maneuver.”
– Category: Oooh….Chess
– Clue: Invented in the 1500s to speed up the game, this
maneuver involves two pieces of the same color.
– Answer: Castling
11. Lexical Answer Type
• About 12 percent of the clues do not indicate an
explicit lexical answer type but may refer to the
answer with pronouns like “it,” “these,” or “this” or
not refer to it at all. In these cases the type of answer must be inferred from the context.
Here’s an example:
– Category: Decorating
– Clue: Though it sounds “harsh,” it’s just embroidery, often
in a floral pattern, done with yarn on cotton cloth.
– Answer: crewel
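As a toy illustration of LAT detection (not Watson's actual approach, which relied on slot-grammar parses and learned confidences), one naive heuristic is to take the word that follows a demonstrative like "this" or "these" in the clue:

```python
import re

# Naive LAT heuristic: the noun right after "this"/"these" is often the
# lexical answer type. Watson's real detector was far more sophisticated.
def naive_lat(clue):
    m = re.search(r"\b(?:this|these)\s+([a-z]+)", clue, re.IGNORECASE)
    return m.group(1) if m else None

clue = ("Invented in the 1500s to speed up the game, this maneuver "
        "involves two pieces of the same color.")
print(naive_lat(clue))  # maneuver
```

As slide 11 notes, such a heuristic fails on the roughly 12 percent of clues with no explicit LAT, which is why the real system also inferred types from context.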
12. How we convert data into knowledge for Watson’s use
Three types of knowledge:
• Domain data (articles, books, documents)
– Converted to indices for search/passage lookup
– Redirects extracted for disambiguation
– Frame cuts generated with frequencies to determine likely context
– Pseudo-docs extracted for candidate answer generation
• Training and test question sets w/ answer keys
– Used to create the logistic regression model that Watson uses for merging scores
• NLP resources (vocabularies, taxonomies, ontologies)
– Named entity detection and relationship detection algorithms
– Custom slot grammar parsers, Prolog rules for semantic analysis
13. Machine learning
• One of the core components of the system
– Multiple models
– 14000+ training questions
• Every candidate answer gets hundreds of features/scores associated with it. These features/scores are passed through previously trained ML models for candidate answer scoring
• It's not just one model: there is a chain of models, each subsequent one utilizing scores produced by previously run models (see the sketch below)
• Machine learning is also used in other parts of the system, such as LAT confidence analysis
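A minimal sketch of the "chain of models" idea: each stage is a trained model whose output score is appended to the feature vector consumed by the next stage. Everything here (the synthetic data, the two-stage chain, logistic regression as the learner) is illustrative, not Watson's actual configuration:

```python
from sklearn.linear_model import LogisticRegression
import numpy as np

# Illustrative only: X1 stands in for hundreds of per-candidate evidence
# scores; y marks which candidates were correct in training questions.
rng = np.random.default_rng(0)
X1 = rng.normal(size=(1000, 20))
y = (X1[:, 0] + X1[:, 1] > 0).astype(int)  # synthetic labels

stage1 = LogisticRegression(max_iter=1000).fit(X1, y)

# Stage 2 sees the original features plus stage 1's probability score.
X2 = np.hstack([X1, stage1.predict_proba(X1)[:, 1:2]])
stage2 = LogisticRegression(max_iter=1000).fit(X2, y)

confidence = stage2.predict_proba(X2)[:, 1]  # final per-candidate confidence
```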
14. NLP
• Used in many places (Question Analysis, Evidence
Analysis, Content Pre-processing)
• Combines both rule-based and statistics-based approaches
• Full NLP stack (used in QA)
– Tokenization
– Named Entity Recognition
– Deep Parsing and Predicate Argument Structure creation
– Lexical Answer Type (LAT) and Focus detection
– Anaphora resolution
– Semantic Relationships extraction
• Various technologies and techniques are used (English Slot
Grammar parser, R2 NED, machine learning for LAT confidence
analysis, custom annotators written in Prolog and Java)
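Watson's stack was built on its own English Slot Grammar parser, Prolog rules, and custom annotators. Purely as an illustration of the same pipeline stages (tokenization, NER, dependency parsing as a rough stand-in for predicate-argument structure), here is a sketch using the open-source spaCy library, assuming the en_core_web_sm model is installed:

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # tokenizer, tagger, parser, NER in one pipeline
doc = nlp("It's the Peter Benchley novel about a killer giant squid "
          "that menaces the coast of Bermuda.")

# Tokenization + named entity recognition
print([t.text for t in doc])
print([(ent.text, ent.label_) for ent in doc.ents])  # e.g. ('Peter Benchley', 'PERSON')

# Dependency parse: rough stand-in for predicate-argument structure
for tok in doc:
    if tok.dep_ in ("nsubj", "dobj"):
        print(tok.text, tok.dep_, "->", tok.head.text)
```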
15. NLP Examples
• LAT and Focus
– It's the Peter Benchley novel about a killer giant
squid that menaces the coast of Bermuda
• Named Entity Recognition
– It's the {Person::Peter Benchley} novel about a
killer giant {Animal::squid} that menaces the
{Location::coast of Bermuda}
• Anaphora Resolution
– Columbus embarked on his first voyage to this
continent in 1492. In the next two decades he led
three more expeditions there.
16. NLP in evidence analysis and content pre-processing
• Why do NLP on evidence passages and ingested content?
• NLP in evidence analysis allows:
– LAT-based scoring
– Named-entity alignment-based scoring
• NLP in content pre-processing
– Extracting and accumulating "knowledge" frames from the content
– For instance, SVO frame cuts contain frequencies of Subject-Verb-Object occurrences in the content that Watson has ingested
– e.g. squid menaces coast 809
– These "knowledge" frames are then used to generate candidate answers
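A sketch of how SVO frame cuts might be accumulated from parsed text, again using spaCy's dependency labels as a stand-in for Watson's slot-grammar parses; corpus-wide frequencies (like the 809 above) then come from running this over all ingested content:

```python
from collections import Counter
import spacy

nlp = spacy.load("en_core_web_sm")

def svo_counts(texts):
    """Count (subject, verb, object) lemma triples across a corpus."""
    counts = Counter()
    for doc in nlp.pipe(texts):
        for tok in doc:
            if tok.pos_ == "VERB":
                subs = [c for c in tok.children if c.dep_ == "nsubj"]
                objs = [c for c in tok.children if c.dep_ == "dobj"]
                for s in subs:
                    for o in objs:
                        counts[(s.lemma_, tok.lemma_, o.lemma_)] += 1
    return counts

print(svo_counts(["A giant squid menaces the coast."]))
# Counter({('squid', 'menace', 'coast'): 1})
```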
17. Broad Domain
We do NOT attempt to anticipate all
questions and build databases.
We do NOT try to build a formal
model of the world
In a random sample of 20,000 questions we found 2,500 distinct types*. The most frequent occurs <3% of the time, and the distribution has a very long tail. For each of these types, 1000's of different things may be asked. Even going for the head of the tail will barely make a dent.

[Chart: frequency distribution of the most common lexical answer types – "he", "film", "group", "capital", "woman", "song", "singer", and so on down through "hat" and "bay" – with the y-axis running from 0.00% to 3.00%.]

*13% are non-distinct (e.g., it, this, these, or NA)
Our Focus is on reusable NLP technology for analyzing vast volumes of as-is text.
Structured sources (DBs and KBs) provide background knowledge for interpreting the text.
18. Automatic Learning for “Reading”
Volumes of Text → Syntactic Frames → Semantic Frames
• Inventors patent inventions (0.8)
• Officials Submit Resignations (0.7)
• People earn degrees at schools (0.9)
• Fluid is a liquid (0.6)
• Liquid is a fluid (0.5)
• Vessels Sink (0.7)
• People sink 8-balls (0.5) (in pool / 0.8)
19. Evaluating Possibilities and Their Evidence
In cell division, mitosis splits the nucleus & cytokinesis splits this liquid cushioning the nucleus.

Candidate answers: Organelle, Vacuole, Cytoplasm, Plasma, Mitochondria, Blood, …

Many candidate answers (CAs) are generated from many different searches. Each possibility is evaluated according to different dimensions of evidence. Just one piece of evidence is whether the CA is of the right type – in this case, a "liquid".
• Is("Cytoplasm", "liquid") = 0.2
• Is("organelle", "liquid") = 0.1
• Is("vacuole", "liquid") = 0.2
• Is("plasma", "liquid") = 0.7
Supporting passage for Cytoplasm: "Cytoplasm is a fluid surrounding the nucleus…"
– WordNet Is_a(Fluid, Liquid)? No – a strict taxonomy does not equate them.
– Learned Is_a(Fluid, Liquid)? Yes – mined from how people actually use the words.
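A sketch of the WordNet half of this type-coercion ("TyCor") check: does the candidate have the LAT among its hypernyms? It assumes nltk and its wordnet data are installed (nltk.download("wordnet")); Watson combined many such type-checking sources, including the learned one above:

```python
from nltk.corpus import wordnet as wn

def wordnet_is_a(candidate: str, lat: str) -> bool:
    """True if any noun sense of `candidate` has `lat` among its hypernyms."""
    lat_synsets = set(wn.synsets(lat, pos=wn.NOUN))
    for syn in wn.synsets(candidate, pos=wn.NOUN):
        hypernyms = set(syn.closure(lambda s: s.hypernyms()))
        if lat_synsets & hypernyms:
            return True
    return False

print(wordnet_is_a("plasma", "liquid"))     # WordNet's strict taxonomy decides
print(wordnet_is_a("cytoplasm", "liquid"))  # likely False, per the slide's point
```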
20. Different Types of Evidence: Keyword Evidence
Clue: In May 1898 Portugal celebrated the 400th anniversary of this explorer’s arrival in India.
Passage: In May, Gary arrived in India after he celebrated his anniversary in Portugal.

Keyword matching aligns: "In May", "celebrated", "anniversary", "Portugal", "arrival in / arrived in", "India".

Evidence suggests "Gary" is the answer, BUT the system must learn that keyword matching may be weak relative to other types of evidence.
21. Different Types of Evidence: Deeper Evidence
Clue: In May 1898 Portugal celebrated the 400th anniversary of this explorer’s arrival in India.
Passage: On the 27th of May 1498, Vasco da Gama landed in Kappad Beach.

Search far and wide, explore many hypotheses, find & judge evidence – with many inference algorithms:
– Temporal reasoning (date math): "400th anniversary" + "27th May 1498" ↔ "May 1898"
– Statistical paraphrasing: "landed in" ↔ "arrival in"
– Geospatial reasoning (Geo-KB): Kappad Beach ↔ India
– Vasco da Gama ↔ explorer

Stronger evidence can be much harder to find and score. The evidence is still not 100% certain.
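A toy version of the temporal-reasoning check above, showing the kind of date math such a scorer might perform (the helper name and its signature are illustrative):

```python
from datetime import date

landing = date(1498, 5, 27)   # "On the 27th of May 1498 ... landed in Kappad Beach"
clue_year, nth = 1898, 400    # "In May 1898 ... the 400th anniversary"

def anniversary_consistent(event: date, year: int, nth: int) -> bool:
    """Does `year` fall exactly `nth` years after `event`?"""
    return event.year + nth == year

print(anniversary_consistent(landing, clue_year, nth))  # True: 1498 + 400 == 1898
```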
23. The Difference Between Search & DeepQA
Search:
– Decision maker has a question
– Distills it to 2-3 keywords
– Search engine finds documents containing the keywords
– Delivers documents based on popularity
– Decision maker reads the documents and finds answers

DeepQA (like asking an expert):
– Decision maker asks a natural-language question
– Expert understands the question
– Produces possible answers & evidence
– Finds and analyzes evidence, computes confidence
– Delivers response, evidence & confidence
– Decision maker considers the answer & evidence
24. DeepQA: the technology & architecture behind Watson
The pipeline: Initial Question → Question & Topic Analysis → Question Decomposition → Hypothesis Generation (Primary Search → Candidate Answer Generation, drawing on Answer Sources) → Hypothesis & Evidence Scoring (Answer Scoring, Evidence Retrieval, Deep Evidence Scoring, drawing on Evidence Sources) → Synthesis → Final Confidence Merging & Ranking → Answer & Confidence.

Hypothesis generation and hypothesis & evidence scoring run as many parallel branches. Learned models help combine and weigh the evidence at the final merging & ranking stage.
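The stage names above map naturally onto a function-per-stage skeleton. This is only a structural sketch of the published architecture; every function body here is a trivial stub invented for illustration:

```python
# Structural sketch of the DeepQA pipeline; all stubs are illustrative.
def analyze_question(q): return {"text": q, "lat": None}     # Question & Topic Analysis
def decompose(a): return [a]                                 # most questions aren't subdivided
def primary_search(part): return ["passage-1", "passage-2"]  # over Answer Sources
def generate_candidates(part, hits): return [f"CA from {h}" for h in hits]
def retrieve_evidence(h): return ["evidence passage"]        # over Evidence Sources
def score_evidence(h, ev): return 0.5                        # stand-in for many scorers
def synthesize(scored): return scored                        # recombine sub-parts
def rank_and_merge(scored):                                  # highest confidence wins
    return max(scored, key=lambda x: x[1])

def deepqa(question):
    analysis = analyze_question(question)
    hypotheses = []
    for part in decompose(analysis):
        hits = primary_search(part)
        hypotheses += generate_candidates(part, hits)
    scored = [(h, score_evidence(h, retrieve_evidence(h))) for h in hypotheses]
    return rank_and_merge(synthesize(scored))

print(deepqa("This monetary unit's name comes from the word for 'round'."))
```

The next four slides walk through these stages with numbered callouts.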
25. DeepQA: the technology & architecture behind Watson
(1) Initial question formulated: “The name of this monetary unit comes from the word for "round"; earlier coins were often oval”
(2) Watson performs question & topic analysis, determining what is being asked.
(3) Question decomposition: it decides whether the question needs to be subdivided.
26. DeepQA: the technology & architecture behind Watson
(4) Watson then starts to generate hypotheses based on the decomposition and initial analysis – as many hypotheses as may be relevant to the initial question – via primary search and candidate answer generation…
(5) In creating the hypotheses it will use, Watson consults numerous answer sources for potential answers…
27. DeepQA: the technology & architecture behind Watson
(6) Watson then uses algorithms to “score” each potential answer and assign a confidence to that answer…
(7) Watson uses evidence sources to validate its hypotheses and help score the potential answers (answer scoring, evidence retrieval, deep evidence scoring).
(8) Synthesis: if the question was decomposed, Watson brings together hypotheses from the sub-parts.
28. DeepQA: the technology & architecture behind Watson
(9) Using learned models on the merged hypotheses, Watson can weigh evidence based on prior “experiences”.
(10) Once Watson has ranked its answers, it provides those answers, along with the confidence it has in each, as the final Answer & Confidence output.
29. DeepQA: the technology & architecture behind Watson
(The complete end-to-end pipeline of slide 24, shown again in full.)
30. Step 0 : Content Acquisition
• Content acquisition is a combination
of manual and automatic steps.
• The first step is to analyze example questions
from the problem space to produce a
description of the kinds of questions that must
be answered and a characterization of the
application domain.
• Analyzing example questions is primarily a
manual task, while domain analysis may be
informed by automatic or statistical
analyses, such as the LAT analysis.
31. Step 1 : Question Analysis
The system attempts to understand what the question is asking and performs the initial analyses that determine how the question will be processed by the rest of the system.
• Question Classification – e.g. puzzle/math
• Focus and Lexical Answer Type (LAT) – e.g. for “On this day”, LAT = date/day
• Relation Detection – e.g. sea(India, x, west)
• Decomposition – divide and conquer (see the sketch below)
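An illustrative container for what this step produces; the field names are assumptions for the sketch, not Watson's actual data model:

```python
from dataclasses import dataclass, field

# Hypothetical record of question-analysis output.
@dataclass
class QuestionAnalysis:
    text: str
    qclass: str                 # e.g. "factoid", "puzzle", "math"
    focus: str                  # the part of the clue that refers to the answer
    lat: str                    # lexical answer type
    relations: list = field(default_factory=list)
    subquestions: list = field(default_factory=list)

qa = QuestionAnalysis(
    text="On this day in 1898 Portugal celebrated ...",
    qclass="factoid",
    focus="this day",
    lat="day",
    relations=[("celebrate", "Portugal", "anniversary")],
)
print(qa.lat)  # day
```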
32. Step 2 : Hypothesis Generation
1. Primary search:
– Keyword-based search over the answer sources
– Top 250 results are considered for candidate answer generation
– Empirical statistics: 85% of the time the answer is within the top 250 results
2. CA generation: generates candidate answers (CAs) using the results of primary search
3. Soft filtering (see the sketch below):
– Lightweight (less resource-intensive) scoring algorithms are applied to the larger set of initial candidates to prune them down to a smaller set
– Reduction in the number of CAs to approx. 100
– Answers are not fully discarded; they may be reconsidered at the final stage
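A sketch of soft filtering: keep the top-K candidates by a cheap score, but park the rest rather than discarding them. The scoring function and the numbers are illustrative stand-ins:

```python
# Lightweight stand-in scorer: fraction of question words in the candidate.
def cheap_score(candidate: str, question: str) -> float:
    qwords = set(question.lower().split())
    cwords = set(candidate.lower().split())
    return len(qwords & cwords) / max(len(qwords), 1)

def soft_filter(candidates, question, keep=100):
    ranked = sorted(candidates, key=lambda c: cheap_score(c, question), reverse=True)
    return ranked[:keep], ranked[keep:]   # (survivors, parked for possible later use)

survivors, parked = soft_filter([f"candidate {i}" for i in range(250)], "some clue")
print(len(survivors), len(parked))  # 100 150
```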
33. Step 2 : Hypothesis Generation
4. Each CA plugged back into the question is considered a hypothesis, which the system has to prove correct with some threshold of confidence.
5. If the correct answer is not generated at this stage, the system has no hope of answering the question whatsoever.
– Noise tolerance: tolerate noise in the early stages of the pipeline and drive up precision downstream
– Favors recall over precision, with the expectation that the rest of the processing pipeline will tease out the correct answer, even if the set of candidates is quite large
34. Step 3 : Hypothesis & Evidence Scoring
• Candidate answers that pass the soft filtering threshold undergo a rigorous evaluation process that involves two steps:
1. Evidence retrieval:
– Gathers additional supporting evidence for each candidate answer, or hypothesis
– e.g. passage search: gathering passages by adding the CA to the primary search query
2. Scoring:
– Deep content analysis: includes many different components, or scorers, that consider different dimensions of the evidence
– Each produces a score that corresponds to how well the evidence supports a candidate answer for a given question (a minimal example follows)
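A minimal evidence scorer in the spirit of a passage term-match component: how many clue terms appear in the retrieved passage? Real DeepQA scorers were far richer (alignment, temporal, geospatial, type coercion, and so on); this is only the shape of the interface:

```python
def passage_term_match(clue: str, passage: str) -> float:
    """Fraction of the clue's longer terms that also appear in the passage."""
    clue_terms = {w for w in clue.lower().split() if len(w) > 3}
    passage_terms = set(passage.lower().split())
    return len(clue_terms & passage_terms) / max(len(clue_terms), 1)

clue = ("In May 1898 Portugal celebrated the 400th anniversary "
        "of this explorer's arrival in India.")
passage = "On the 27th of May 1498, Vasco da Gama landed in Kappad Beach."
print(round(passage_term_match(clue, passage), 2))  # low: few shared surface terms
```

This is exactly the weakness of keyword evidence from slide 20: a strongly supporting passage can score poorly on surface overlap, which is why deeper scorers are needed alongside it.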
35. Step 4 : Final Merging and Ranking
• Merging:
– Multiple candidate answers for a question
may be equivalent despite very different
surface forms.
– Using an ensemble of matching, normalization and co-reference resolution algorithms, Watson identifies equivalent and related hypotheses (e.g. "Abraham Lincoln" and "Honest Abe").
– Without merging, ranking algorithms would
be comparing multiple surface forms that
represent the same answer and trying to
discriminate among them.
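A sketch of the merging step: normalize surface forms to a canonical key and combine the variants' scores. The alias table is a hypothetical stand-in for Watson's ensemble of matching, normalization and co-reference algorithms:

```python
# Hypothetical alias knowledge mapping surface forms to a canonical answer.
ALIASES = {"honest abe": "abraham lincoln", "lincoln": "abraham lincoln"}

def canonicalize(answer: str) -> str:
    a = answer.strip().lower()
    return ALIASES.get(a, a)

def merge(scored_answers):
    merged = {}
    for answer, score in scored_answers:
        key = canonicalize(answer)
        merged[key] = max(merged.get(key, 0.0), score)  # keep best variant's score
    return merged

print(merge([("Abraham Lincoln", 0.6), ("Honest Abe", 0.4), ("U. S. Grant", 0.2)]))
# {'abraham lincoln': 0.6, 'u. s. grant': 0.2}
```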
36. Step 4 : Final Merging and Ranking
• Ranking and confidence estimation:
– After merging, the system must rank the hypotheses and estimate confidence based on their merged scores
– The ranking models are trained over a set of training questions with known answers
– Watson’s metalearner uses multiple trained models to handle different question classes: for instance, certain scores that may be crucial to identifying the correct answer for a factoid question may not be as useful on puzzle questions (see the routing sketch below)
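A sketch of per-question-class models: route each question to the model trained for its class. The classes, features and data here are synthetic placeholders:

```python
from sklearn.linear_model import LogisticRegression
import numpy as np

rng = np.random.default_rng(1)
models = {}
for qclass in ("factoid", "puzzle"):
    X = rng.normal(size=(500, 10))    # merged evidence scores (synthetic)
    y = (X[:, 0] > 0).astype(int)     # synthetic "correct answer" labels
    models[qclass] = LogisticRegression(max_iter=1000).fit(X, y)

def confidence(qclass: str, scores: np.ndarray) -> float:
    """Route to the class-specific model and return its probability estimate."""
    return float(models[qclass].predict_proba(scores.reshape(1, -1))[0, 1])

print(confidence("factoid", rng.normal(size=10)))
```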
37. The Final Blow! (ctd.)
“I for one welcome our new computer overlords” - Jennings
38. Watson – a Workload Optimized System
• 90 x IBM Power 750¹ servers
• 2,880 POWER7 cores
• POWER7 3.55 GHz chip
• 500 GB per sec on-chip bandwidth
• 10 Gb Ethernet network
• 15 terabytes of memory
• 20 terabytes of disk, clustered
• Can operate at 80 teraflops
• Runs IBM DeepQA software
• Scales out with, and searches vast amounts of unstructured information with, UIMA & Hadoop open source components
• Linux provides a scalable, open platform, optimized to exploit POWER7 performance
• 10 racks include servers, networking, shared disk system, cluster controllers

¹ Note that the Power 750 featuring POWER7 is a commercially available server that runs AIX, IBM i and Linux and has been in market since Feb 2010.
39. Watson: Precision, Confidence & Speed
• Deep Analytics – We achieved champion levels of precision and confidence over a huge variety of expression.
• Speed – By optimizing Watson’s computation for Jeopardy! on 2,880 POWER7 processing cores, we went from 2 hours per question on a single CPU to an average of just 3 seconds – fast enough to compete with the best.
• Results – In 55 real-time sparring games against former Tournament of Champions players last year, Watson put on a very competitive performance, winning 71%. In the final Exhibition Match against Ken Jennings and Brad Rutter, Watson won!
40. Potential Business Applications
Healthcare Analytics
• Analyzing: E-medical records, hospital reports
• For: Clinical analysis; treatment protocol optimization
• Benefits: Better management of chronic diseases; optimized drug formularies; improved patient outcomes

Social Media for Marketing
• Analyzing: Call center logs, emails, online media
• For: Buyer behavior, churn prediction
• Benefits: Improve customer satisfaction and retention, marketing campaigns, find new revenue opportunities

Crime Analytics
• Analyzing: Case files, police records, 911 calls…
• For: Rapid crime solving & crime trend analysis
• Benefits: Safer communities & optimized force deployment

Insurance Fraud
• Analyzing: Insurance claims
• For: Detecting fraudulent activity & patterns
• Benefits: Reduced losses, faster detection, more efficient claims processes

Automotive Quality Insight
• Analyzing: Tech notes, call logs, online media
• For: Warranty analysis, quality assurance
• Benefits: Reduce warranty costs, improve customer satisfaction, marketing campaigns

Customer Care
• Analyzing: Call center notes, SharePoint, multiple content repositories
• For: Churn prediction, product/brand quality
• Benefits: Improve consumer satisfaction, marketing campaigns, find new revenue opportunities or product/brand quality issues
41. References
• The AI Magazine
– Ferrucci, David, et al. "Building Watson: An Overview of the DeepQA Project." AI Magazine 31.3 (2010): 59-79.
• Watson Systems
– http://www-03.ibm.com/innovation/us/watson/
• Wiki page
– http://en.wikipedia.org/wiki/Watson_%28computer%2
• Building Watson: A Brief Overview of the DeepQA Project, by Joel Farrell, IBM
– http://www.medbiq.org/sites/default/files/presentations/2011/Farrell.ppt
42. References
• What is Watson, really?
– http://www-01.ibm.com/software/ebusiness/jstart/downloads/IOD2011.ppt
– Authors: Keyur Dalal (IBM), Vladimir Stemkovski (IBM) and Jeff Sumner (IBM)
• Jeopardy! IBM Watson Day 1 (Feb 14, 2011)
– http://www.youtube.com/watch?v=seNkjYyG3gI&feature=related
• Science Behind an Answer
– http://www-03.ibm.com/innovation/us/watson/what-is-watson/science-behind-an-answer.html
– Video: http://youtu.be/DywO4zksfXw
Speaker notes (slide 3): Consider first what computers are good at. Here is an equation – the natural log of 12,546,798, times pi, divided by 34,567.46. Anyone? Anyone know if it's greater or less than 1? Computer? <click> The answer is 0.00885. How about this? Select the payment where the owner is David Jones and the type of the product owned is a laptop. <click> The computer can easily traverse this information in tables, or "structured databases," going from row to column to row to find its way to the Payment field and get the answer. It does this sort of thing really, really well, storing just the information it needs to answer these database queries. How about matching keywords? It does that really well too. Here, to figure out that "David Jones" is the same as "David Jones," it compares each letter, knows they are the same, and concludes they are equal. <click> In the next example, based on that simple letter-matching algorithm, it does not think "Dave Jones" is the same as "David Jones." To make that very simple leap it would have to know that Dave is a nickname for David, and some likelihood that they are the same. As far as the computer is concerned, with no additional knowledge, David and Dave are as different as Mary and Mauri.
Speaker notes (slide 4): Computer programs are natively explicit and exacting in their calculations over numbers and symbols. But natural language – the words and phrases we humans use to communicate with one another – is implicit: the exact meaning is not completely and exactly indicated, but instead is highly dependent on the context – what has been said before, the topic, and how it is being discussed (factually, figuratively, fictionally, etc.). Moreover, natural language is often imprecise; it does not have to treat a subject with numerical precision. Humans naturally interact and operate all the time with different degrees of uncertainty and fuzzy associations between words and concepts, and we use huge amounts of background knowledge to reconcile and interpret what we read. Consider these examples. It is one thing to build a database table to exactly answer the question "Where was someone born?" The computer looks up the name in one column and is programmed to know that the other column contains the birthplace. STRUCTURED information, like this database table, is designed for computers to make simple comparisons, and it is exactly as accurate as the data entered into the database. Natural language, by contrast, is created and used by humans for humans. A reason we call natural language "unstructured" is that it lacks the exact structure and meaning that computer programs typically use to answer questions; understanding what is being represented is a whole other challenge for computer programs. Consider this sentence <read>. It implies that Albert Einstein was born in Ulm – but there is a whole lot the computer has to do to figure that out with any degree of certainty. It has to understand sentence structure, parts of speech, the possible meanings of words and phrases, and how they relate to the words and phrases in the question. What do a remembrance, a water color and an Otto have to do with where someone was born? Consider another question in the Jeopardy! style – "X ran this?" – and this potentially answer-bearing sentence. <Read the sentence.> Does this sentence answer the question for Jack Welch? What does "ran" have to do with leadership or painting? How would a computer confidently infer from this sentence that Jack Welch ran GE? It might be easier to deduce that he was at least a painter there.
Speaker notes (slide 5): Among many things, IBM Research is interested in pursuing exploratory research and Grand Challenges. Consider Deep Blue, the first computer to win against a grand master chess player. The goals for a Grand Challenge project are to engage and inspire the scientific community, to have broader impact on science and society, to push the limits of computer technology, and ultimately to drive innovation into business applications relevant to IBM customers – as we will see in the case of the Jeopardy! Challenge, to have impact on Business Intelligence, Knowledge Discovery and Knowledge Management, Compliance, Legal, Healthcare, Business Integrity, Customer Relationship Management, Web Self-Service, Product Support and National Intelligence.
Speaker notes (slide 7): Automatic open-domain question answering represents a very long-standing challenge in Computer Science, specifically in the areas of NLP, Information Retrieval and AI. <Go through slide.>
Speaker notes (slide 9): Let's look at some examples of what we call basic factoids. <Read the fish example.> Notice that the type of thing being asked for is often indicated, but can go from specific to very vague. If you are starting to think we can just build a database of fish, think again. <Read the form example.> Imagine building a database of everything that can be a form. Or consider this example. <Read the Lincoln example.> There are many questions that do not indicate the type at all.
Speaker notes (slide 17): We do NOT approach the Jeopardy! Challenge by trying to anticipate all questions and building databases of answers. In fact, in a random sample of 20,000 Jeopardy! clues we automatically identified the main subject or type being asked about. We found that in 13% of the sampled questions there was no clear indication at all of the type of answer, and the players must rely almost entirely on the context to figure out what sort of answer is required. The remaining 87% is what you see in this graph. It shows what we call a very long tail: there is no small-enough set of topics to focus on that covers enough ground. Even focusing on the most frequent few (the head of the tail, to the left) will cover less than 10% of the content. Thousands of topics, from hats to insects to writers to diseases to vegetables, are all equally fair game. And for these thousands of types, thousands of different questions may be asked, and then phrased in a huge variety of different ways. So our primary approach and research interest is not to collect and organize databases; rather, it is reusable Natural Language Processing (NLP) technology for automatically understanding naturally occurring, as-is human-language text. Pre-existing structured knowledge, in the form of DBs or KBs, is used to help bridge meaning and interpret natural-language texts. But because of the broad domain and the expressive language used in the questions and in the content, pre-built databases have very limited use for answering any significant number of questions. The focus, rather, is on natural-language understanding.
Speaker notes (slide 19): Note that the "text mining" inference is very different from the WordNet TyCor. Mining ordinary language, we find that people consider a FLUID a type of LIQUID, whereas a strict taxonomy like WordNet, correctly, does not. However, for this domain, Jeopardy!, ML techniques learn to trust the relations mined from natural language text, which allows the system to boost Cytoplasm's TyCor score.