SlideShare uma empresa Scribd logo
1 de 39
Mass Declassification What If? Jeff Jonas,  IBM Distinguished Engineer Chief Scientist, IBM Entity Analytics [email_address] September 23, 2010
The Ask ,[object Object],[object Object],[object Object],[object Object]
The Problem at Hand ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Background ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
In Today’s Session ,[object Object],[object Object],[object Object],[object Object],[object Object]
Context Accumulating Systems
From Pixels to Pictures to Insight  Observations Context Relevance Consumer (An analyst, a system,  the sensor itself, etc.) Contextualization
[object Object],[object Object]
Without Context [email_address]
Consequences ,[object Object],[object Object],[object Object],[object Object],[object Object]
Context Accumulation Trusted Supplier Job  Applicant Stolen  Identity  Known Terrorist [email_address]
Puzzle Metaphor Primer ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
How Context Accumulates ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
False Negatives Overstate The Universe Observations Unique Identities True Population
Counting Is Difficult Mark Smith 6/12/1978 443-43-0000 Mark R Smith (707) 433-0000 DL: 00001234 File 1 File 2
The Rise and Fall of a Population Observations Unique Identities True Population
Data Triangulation  Mark Smith 6/12/1978 443-43-0000 Mark R Smith (707) 433-0000 DL: 00001234 File 1 File 2 Mark Randy Smith 443-43-0000 DL: 00001234 New Record
Increasing Accuracy and Performance Observations Unique Identities True Population
“ Expert Counting” is Fundamental to Prediction ,[object Object],[object Object],[object Object],[object Object]
Mass Declassification Predictions
Mass Declassification Predictions ,[object Object],[object Object],[object Object]
Using What Data Points? ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Open Source Discovery/Scoring ,[object Object],[object Object],[object Object],[object Object],[object Object]
Context Accumulation FOIA March 2010 Open Source Reference Dirty  Word Classified –  Asserted Mufasa 7 Warhead
Context Accumulation + Statistics ,[object Object],[object Object],[object Object],[object Object],[object Object],Declassification dispositions … becoming a force multiplier. The more human dispositions, the more automated dispositions. Human Triage Auto Triage 5,000 20 10,000 4,000 100,000 65,000 1,000,000 17,000,000
Policy Questions ,[object Object],[object Object],[object Object],[object Object]
Strawman Architecture
Strawman Architecture 450M Docs Historical  Dispositions DirtyWords Etc. Feature  Extraction  & Classification Context  Accumulation Predictions(*) Workflow System (*) Recommendations: Equity of, Disposition, Priority Dispositions
Another Idea: Crowd Sourcing ,[object Object],[object Object]
Another Idea: Better Classification  ,[object Object],[object Object]
Challenges
Challenges ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Closing Thoughts
Closing Thoughts ,[object Object],[object Object],[object Object],[object Object]
Worst Case Scenario ,[object Object],[object Object]
Related Blog Posts ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Blogging At: www.JeffJonas.TypePad.com Information Management Privacy National Security  and Triathlons Questions?
Mass Declassification What If? Jeff Jonas,  IBM Distinguished Engineer Chief Scientist, IBM Entity Analytics [email_address] September 23, 2010

Mais conteúdo relacionado

Semelhante a Mass declassification sept 23 2010v2.1

Machine Learning ICS 273A
Machine Learning ICS 273AMachine Learning ICS 273A
Machine Learning ICS 273Abutest
 
Machine Learning ICS 273A
Machine Learning ICS 273AMachine Learning ICS 273A
Machine Learning ICS 273Abutest
 
Qualitative Legal Prediction - Prof. Daniel Katz
Qualitative Legal Prediction - Prof. Daniel KatzQualitative Legal Prediction - Prof. Daniel Katz
Qualitative Legal Prediction - Prof. Daniel Katzsmahboobani
 
Data Science in the Real World: Making a Difference
Data Science in the Real World: Making a Difference Data Science in the Real World: Making a Difference
Data Science in the Real World: Making a Difference Srinath Perera
 
Thinkful DC - Intro to Data Science
Thinkful DC - Intro to Data Science Thinkful DC - Intro to Data Science
Thinkful DC - Intro to Data Science TJ Stalcup
 
Integrating and publishing public safety data using semantic technologies
Integrating and publishing public safety data using semantic technologiesIntegrating and publishing public safety data using semantic technologies
Integrating and publishing public safety data using semantic technologiesAlvaro Graves
 
Architecting a Platform for Enterprise Use - Strata London 2018
Architecting a Platform for Enterprise Use - Strata London 2018Architecting a Platform for Enterprise Use - Strata London 2018
Architecting a Platform for Enterprise Use - Strata London 2018mark madsen
 
Thinkful - Intro to Data Science - Washington DC
Thinkful - Intro to Data Science - Washington DCThinkful - Intro to Data Science - Washington DC
Thinkful - Intro to Data Science - Washington DCTJ Stalcup
 
How new ai based analytics ignite a productivity revolution in e discovery-final
How new ai based analytics ignite a productivity revolution in e discovery-finalHow new ai based analytics ignite a productivity revolution in e discovery-final
How new ai based analytics ignite a productivity revolution in e discovery-finaljcscholtes
 
Say "Hi!" to Your New Boss
Say "Hi!" to Your New BossSay "Hi!" to Your New Boss
Say "Hi!" to Your New BossAndreas Dewes
 
Machine Learning, Data Mining, and
Machine Learning, Data Mining, and Machine Learning, Data Mining, and
Machine Learning, Data Mining, and butest
 
ACEDS - ZyLAB webinar - AI Based eDiscovery Analytics
ACEDS - ZyLAB webinar - AI Based eDiscovery AnalyticsACEDS - ZyLAB webinar - AI Based eDiscovery Analytics
ACEDS - ZyLAB webinar - AI Based eDiscovery AnalyticsAnnelore van der Lint
 
The Future of Advanced Analytics
The Future of Advanced AnalyticsThe Future of Advanced Analytics
The Future of Advanced AnalyticsHaystax Technology
 
Log Mining: Beyond Log Analysis
Log Mining: Beyond Log AnalysisLog Mining: Beyond Log Analysis
Log Mining: Beyond Log AnalysisAnton Chuvakin
 
Machine learning at b.e.s.t. summer university
Machine learning  at b.e.s.t. summer universityMachine learning  at b.e.s.t. summer university
Machine learning at b.e.s.t. summer universityLászló Kovács
 
Creating a Data-Driven Government: Big Data With Purpose
Creating a Data-Driven Government: Big Data With PurposeCreating a Data-Driven Government: Big Data With Purpose
Creating a Data-Driven Government: Big Data With PurposeTyrone Grandison
 
Data Visualizations in Cyber Security: Still Home of the WOPR?
Data Visualizations in Cyber Security: Still Home of the WOPR?Data Visualizations in Cyber Security: Still Home of the WOPR?
Data Visualizations in Cyber Security: Still Home of the WOPR?Matthew Park
 
Career_Jobs_in_Data_Science.pptx
Career_Jobs_in_Data_Science.pptxCareer_Jobs_in_Data_Science.pptx
Career_Jobs_in_Data_Science.pptxHarpreetSharma14
 
TCS Point of View Session - Analyze by Dr. Gautam Shroff, VP and Chief Scient...
TCS Point of View Session - Analyze by Dr. Gautam Shroff, VP and Chief Scient...TCS Point of View Session - Analyze by Dr. Gautam Shroff, VP and Chief Scient...
TCS Point of View Session - Analyze by Dr. Gautam Shroff, VP and Chief Scient...Tata Consultancy Services
 
From Rocket Science to Data Science
From Rocket Science to Data ScienceFrom Rocket Science to Data Science
From Rocket Science to Data ScienceSanghamitra Deb
 

Semelhante a Mass declassification sept 23 2010v2.1 (20)

Machine Learning ICS 273A
Machine Learning ICS 273AMachine Learning ICS 273A
Machine Learning ICS 273A
 
Machine Learning ICS 273A
Machine Learning ICS 273AMachine Learning ICS 273A
Machine Learning ICS 273A
 
Qualitative Legal Prediction - Prof. Daniel Katz
Qualitative Legal Prediction - Prof. Daniel KatzQualitative Legal Prediction - Prof. Daniel Katz
Qualitative Legal Prediction - Prof. Daniel Katz
 
Data Science in the Real World: Making a Difference
Data Science in the Real World: Making a Difference Data Science in the Real World: Making a Difference
Data Science in the Real World: Making a Difference
 
Thinkful DC - Intro to Data Science
Thinkful DC - Intro to Data Science Thinkful DC - Intro to Data Science
Thinkful DC - Intro to Data Science
 
Integrating and publishing public safety data using semantic technologies
Integrating and publishing public safety data using semantic technologiesIntegrating and publishing public safety data using semantic technologies
Integrating and publishing public safety data using semantic technologies
 
Architecting a Platform for Enterprise Use - Strata London 2018
Architecting a Platform for Enterprise Use - Strata London 2018Architecting a Platform for Enterprise Use - Strata London 2018
Architecting a Platform for Enterprise Use - Strata London 2018
 
Thinkful - Intro to Data Science - Washington DC
Thinkful - Intro to Data Science - Washington DCThinkful - Intro to Data Science - Washington DC
Thinkful - Intro to Data Science - Washington DC
 
How new ai based analytics ignite a productivity revolution in e discovery-final
How new ai based analytics ignite a productivity revolution in e discovery-finalHow new ai based analytics ignite a productivity revolution in e discovery-final
How new ai based analytics ignite a productivity revolution in e discovery-final
 
Say "Hi!" to Your New Boss
Say "Hi!" to Your New BossSay "Hi!" to Your New Boss
Say "Hi!" to Your New Boss
 
Machine Learning, Data Mining, and
Machine Learning, Data Mining, and Machine Learning, Data Mining, and
Machine Learning, Data Mining, and
 
ACEDS - ZyLAB webinar - AI Based eDiscovery Analytics
ACEDS - ZyLAB webinar - AI Based eDiscovery AnalyticsACEDS - ZyLAB webinar - AI Based eDiscovery Analytics
ACEDS - ZyLAB webinar - AI Based eDiscovery Analytics
 
The Future of Advanced Analytics
The Future of Advanced AnalyticsThe Future of Advanced Analytics
The Future of Advanced Analytics
 
Log Mining: Beyond Log Analysis
Log Mining: Beyond Log AnalysisLog Mining: Beyond Log Analysis
Log Mining: Beyond Log Analysis
 
Machine learning at b.e.s.t. summer university
Machine learning  at b.e.s.t. summer universityMachine learning  at b.e.s.t. summer university
Machine learning at b.e.s.t. summer university
 
Creating a Data-Driven Government: Big Data With Purpose
Creating a Data-Driven Government: Big Data With PurposeCreating a Data-Driven Government: Big Data With Purpose
Creating a Data-Driven Government: Big Data With Purpose
 
Data Visualizations in Cyber Security: Still Home of the WOPR?
Data Visualizations in Cyber Security: Still Home of the WOPR?Data Visualizations in Cyber Security: Still Home of the WOPR?
Data Visualizations in Cyber Security: Still Home of the WOPR?
 
Career_Jobs_in_Data_Science.pptx
Career_Jobs_in_Data_Science.pptxCareer_Jobs_in_Data_Science.pptx
Career_Jobs_in_Data_Science.pptx
 
TCS Point of View Session - Analyze by Dr. Gautam Shroff, VP and Chief Scient...
TCS Point of View Session - Analyze by Dr. Gautam Shroff, VP and Chief Scient...TCS Point of View Session - Analyze by Dr. Gautam Shroff, VP and Chief Scient...
TCS Point of View Session - Analyze by Dr. Gautam Shroff, VP and Chief Scient...
 
From Rocket Science to Data Science
From Rocket Science to Data ScienceFrom Rocket Science to Data Science
From Rocket Science to Data Science
 

Último

WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 

Último (20)

WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 

Mass declassification sept 23 2010v2.1

Notas do Editor

  1. Here is a look at the DeepQA architecture. This is like looking inside the brain of the Watson system from about 30,000 feet high. Remember, natural language is ambiguous, polysemous, tacit and its meaning is often highly contextual. Bottom line -- the computer needs to consider many possible meanings, attempting to find the inference paths that are most confidently supported by the data. The primary computational principle supported by the DeepQA architecture is to assume and maintain multiple interpretations of the question, to generate many plausible answers or hypotheses and to collect and process many different evidence paths that might support or refute those hypotheses. Each component in the system adds assumptions about what the question means or what the content means or what the answer might be or why it might be correct. DeepQA is implemented as an extensible architecture and was designed from the outset to support interoperability across independently developed analytics. For this reason it was implemented using UIMA, a framework and OASIS standard for interoperable text and multi-modal analysis contributed by IBM to the open-source community and now an Apache Project (http://uima.apache.org) Over 100 different algorithms, implemented as UIMA components, were developed, advanced and integrated into this architecture to build Watson . In the first step, Question and Category analysis , parsing algorithms decompose the question into its grammatical or syntactic components. Other algorithms here will identify and tag specific semantic entities like names, places or dates. In particular the type of thing being asked for, if is indicated at all, will be identified. We call this the LAT or Lexical Answer Type, like this “FISH”, this “CHARACTER” or “COUNTRY”. In Query Decomposition, different assumptions are made about if and how the question might be decomposed into sub questions. The original and each identified sub part follow parallel paths through the system. In Hypothesis Generation, DeepQA does a variety of very broad searches for each of several interpretations of the question. These searches are performed over a combination of unstructured data, natural language documents, and structured data, available knowledge bases. The goal of this step is to generate possible answers to the question and/or its sub parts. At this point there is not a lot of confidence in these possible answers since little intelligence has been applied to understanding the content that might relate to the question. The focus is on generating a broad set of hypotheses, – or for this application what we call “Candidate Answers”. To implement this step for Watson we used multiple open-source text and KB search components. DeepQA, acknowledges that resources are ultimately limited. And some parameterized judgment about which candidate answers are worth pursuing further must be made given constrains on time and available hardware. Based on a trained threshold for optimizing the tradeoff between accuracy and latency, DeepQA uses soft filtering -- it uses different light-weight algorithms to judge which candidates are worth gathering evidence for and which should get less attention and continue through the computation as-is. In contrast, if this were a hard-filter those candidates falling below the filter would be eliminated from consideration entirely at this point. In Hypothesis & Evidence Scoring the candidate answers are scored independently of any additional evidence by deeper analysis algorithms. This may for example include Typing Algorithms. These are algorithms that produce a score indicating how likely it is that a candidate answer is an instance of the Lexical Answer Type determined in the first step – for example Country, Agent, Character, City, Slogan, Book etc. Many of these algorithms may fire using different resources and techniques to come up with a score. What is the likelihood that “Washington” for example, refers to a “General” or a “Capital” or a “State” or a “Mountain” or a “Father” or a “Founder”? Evidence , in this case, more documents, passages and more structured facts, are collected for the many candidate answers. Each of these pieces of evidence are subjected to many independently developed algorithms that deeply analyze the evidentiary passages, for example, and score the likelihood that the passage supports or refutes the correctness of the candidate answer. In the Synthesis step, if the question had been decomposed into sub-parts, one or more synthesis algorithms will fire, with varying levels of certainty, They will apply methods for inferring a coherent final answer from the constituent elements derived from the questions sub-parts. Finally, arriving at the last step, Final Merging and Ranking, are many possible answers, each paired with many pieces of evidence and each of these scored by many algorithms to produce hundreds of feature scores. All giving some evidence for the correctness of each candidate answer. Trained models are applied to weigh the relative importance of these feature scores. These models are trained with ML methods to predict, based on past performance, how best to combine all this scores to produce final, single confidence numbers for each candidate answer and to produce the final ranking of all candidates. The answer with the strongest confidence would be Watson’s final answer. And Watson would try to buzz-in provided that top answer’s confidence was above a certain threshold. ----------------------- The DeepQA system defers commitments and carries possibilities through the entire process while searching for increasing broader contextual evidence and more credible inferences to support the most likely candidate answers. All the algorithms used to interpret questions, generate candidate answers, score answers, collection evidence and score evidence are loosely coupled but work holistically by virtue of DeepQA’s pervasive machine learning infrastructure. No one component could realize its impact on end-to-end performance without being integrated and trained with the other components AND they are all evolving simultaneously. In fact what had 10% impact on some metric one day, 1 month later might only contribute 2% to overall performance due to evolving component algorithms and interactions. This is why the system as it develops is regularly trained, evaluated and retrained. DeepQA is a complex system architecture designed to incrementally extend both in data and algorithms to deal with the challenges of natural language processing applications and to adapt to new domains of knowledge. The Jeopardy! Challenge has greatly inspired its design and implementation for the Watson system. -David A. Ferrucci