8. A value-based health system must appropriately balance resources expended in keeping people healthy. Value-based Health Care System: 20% of people generate 80% of costs. [Chart: health care spending plotted against health status, from Healthy / Low Risk through At Risk, High Risk and Early Clinical Symptoms to Active Disease] Source: IBM Global Business Services and IBM Institute for Business Value
9.
10. "Smarter Healthcare" = Improve operational effectiveness + Achieve better quality and outcomes + Collaborate for prevention and wellness. A smarter health system collaborates across health and care settings, activating individuals in their own health and making the best use of resources to treat conditions, keep people healthy and deliver greater individual value. Instrumented. Interconnected. Intelligent.
11. Health systems will need to emphasize different competencies to thrive in a changing healthcare environment. Competencies: Empower Patients (empower members to assume accountability and make more informed health and financial choices); Collaborate with Providers (help providers become successful in a value-based reimbursement environment); Innovate (collaboratively innovate products and services, operational processes, and business models); Optimize Operational Efficiencies (continue driving costs down in order to maximize margins in a highly regulated industry); Enable through Information Technology (flexible applications, BI, on-demand information, effective operations/management & governance).
14. The Jeopardy! Challenge: a compelling and notable way to drive and measure the technology of automatic Question Answering along 5 key dimensions: Broad/Open Domain, Complex Language, High Precision, Accurate Confidence, High Speed. Sample clues: $200: If you're standing, it's the direction you should look to check out the wainscoting. $600: In cell division, mitosis splits the nucleus & cytokinesis splits this liquid cushioning the nucleus. $1000: The first person mentioned by name in 'The Man in the Iron Mask' is this hero of a previous book by the same author. $2000: Of the 4 countries in the world that the U.S. does not have diplomatic relations with, the one that's farthest north.
15. Keyword Evidence / Keyword Matching. Question: In May 1898 Portugal celebrated the 400th anniversary of this explorer's arrival in India. Passage: In May, Gary arrived in India after he celebrated his anniversary in Portugal. Matched keywords: In May, celebrated, anniversary, Portugal, arrived in, India. This evidence suggests "Gary" is the answer, BUT the system must learn that keyword matching may be weak relative to other types of evidence.
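The weakness of keyword evidence shown on this slide can be sketched in a few lines. Everything here is illustrative and my own, not Watson's: the stop-word list, the scoring function, and the correct-answer passage (assembled from the Vasco da Gama / Kappad Beach facts discussed later in the speaker notes).

```python
# Naive keyword-overlap scoring: the misleading "Gary" passage outscores
# the passage containing the right answer, because it shares more words.

def keyword_overlap_score(question: str, passage: str) -> float:
    """Fraction of (non-stop-word) question words also in the passage."""
    stop = {"the", "of", "this", "in", "a", "after", "he", "his"}
    q = {w.strip(".,").lower() for w in question.split()} - stop
    p = {w.strip(".,").lower() for w in passage.split()} - stop
    return len(q & p) / len(q)

question = ("In May 1898 Portugal celebrated the 400th anniversary "
            "of this explorer's arrival in India.")
weak = ("In May, Gary arrived in India after he celebrated "
        "his anniversary in Portugal.")            # supports the WRONG answer
strong = "In May 1498, Vasco da Gama landed in Kappad Beach."  # RIGHT answer

print(keyword_overlap_score(question, weak))    # high overlap, wrong answer
print(keyword_overlap_score(question, strong))  # low overlap, right answer
```

A system trusting this score alone would prefer "Gary", which is exactly why DeepQA must learn to weigh keyword matching against other kinds of evidence.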
16.
17. The Missing Link. Clue: On hearing of the discovery of George Mallory's body, he told reporters he still thinks he was first. Intermediate associations: TV remote controls, buttons, shirts, telephones; Mt. Everest; "he was first." Answer: Edmund Hillary.
18. DeepQA: The Technology Behind Watson. A massively parallel, probabilistic, evidence-based architecture. Generates and scores many hypotheses using a combination of 1000's of Natural Language Processing, Information Retrieval, Machine Learning and Reasoning algorithms. These gather, evaluate, weigh and balance different types of evidence to deliver the answer with the best support it can find. Scale: 100's of sources; multiple interpretations; 100's of possible answers; 1000's of pieces of evidence; 100,000's of scores from many deep analysis algorithms. Pipeline: Question & Topic Analysis; Question Decomposition; Hypothesis Generation (Primary Search over Answer Sources, Candidate Answer Generation); Hypothesis and Evidence Scoring (Evidence Retrieval from Evidence Sources, Deep Evidence Scoring, answer scoring models that balance & combine evidence); Synthesis; Final Confidence Merging & Ranking, where learned models help combine and weigh the evidence to produce the Answer & Confidence.
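The flow on this slide can be condensed into a toy sketch. This is my illustration, not IBM's code: the sources, scorers and weights are invented, and a fixed weighted sum stands in for the learned merging model.

```python
# Toy DeepQA flow: generate many candidate answers, score each candidate
# with many evidence algorithms, then merge the scores into a ranked list.

def deepqa_pipeline(question, sources, scorers, weights):
    # Hypothesis generation: broad searches propose candidate answers.
    candidates = {ans for search in sources for ans in search(question)}
    # Evidence scoring: every scoring algorithm rates every candidate.
    features = {c: [score(question, c) for score in scorers]
                for c in candidates}
    # Final merging & ranking: combine feature scores into a confidence
    # (a trained model would learn these weights from past performance).
    return sorted(features,
                  key=lambda c: sum(w * f
                                    for w, f in zip(weights, features[c])),
                  reverse=True)

# Illustrative sources and scorers only.
sources = [lambda q: {"Vasco da Gama", "Gary"}]
scorers = [lambda q, c: 0.9 if c == "Vasco da Gama" else 0.2,  # type match
           lambda q, c: 0.4 if c == "Vasco da Gama" else 0.8]  # keywords
ranked = deepqa_pipeline("explorer question", sources, scorers, [0.7, 0.3])
```

The point is architectural: candidate generation, evidence scoring and merging are separate, loosely coupled stages, so new algorithms can be added to any stage independently.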
19. DeepQA: Incremental Progress in Answering Precision on the Jeopardy! Challenge, 6/2007-11/2010. Milestones: Baseline 12/06; v0.1 12/07; v0.2 05/08; v0.3 08/08; v0.4 12/08; v0.5 05/09; v0.6 10/09; v0.7 04/10; v0.8 11/10. IBM Watson playing in the Winners Cloud.
20.
21.
22.
23. Why is Watson technology ideal for Healthcare?
- Interprets and understands natural language questions: understands ambiguous and imprecise questions using sophisticated natural language algorithms.
- Analyzes large volumes of unstructured data: synthesizes a broad domain of unstructured data from a variety of selected public, licensed and private sources (physician notes, medical journals, pathology results, clinical trials, Wikipedia, etc.).
- Quantifies degrees of confidence in potential answers: generates hypotheses and ranks degrees of confidence in a range of potential answers based on evidence.
- Supports iterative dialogue to refine results: internal iterative and interactive question answering to refine and improve results (family history, physical exam, current medications, etc.).
- Adapts and learns to improve results over time: learns from additional evidence, additional questions and mistakes to improve accuracy over time (new clinical recommendations, new drugs, approved uses of drugs, etc.).
Example: What condition has red eye, pain, inflammation, blurred vision, floating spots and sensitivity to light? Uveitis 91%, Iritis 48%, Keratitis 29%. Source: IBM Research, MI, SCIP, BCG analysis
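The ranked-confidence example on this slide can be illustrated with a toy sketch. The knowledge base and scoring rule below are entirely hypothetical (the finding lists are invented for illustration, and the simple overlap fraction is not how Watson computes its 91%/48%/29% figures).

```python
# Rank candidate diagnoses by how many of the patient's presented
# findings each condition is (in this toy KB) known to produce.

FINDINGS = {"red eye", "pain", "inflammation", "blurred vision",
            "floating spots", "light sensitivity"}

# Hypothetical knowledge base: findings associated with each condition.
KB = {
    "Uveitis":   {"red eye", "pain", "inflammation", "blurred vision",
                  "floating spots", "light sensitivity"},
    "Iritis":    {"red eye", "pain", "inflammation", "light sensitivity"},
    "Keratitis": {"red eye", "pain", "blurred vision"},
}

def confidence(condition: str) -> float:
    """Fraction of the patient's findings explained by the condition."""
    return len(KB[condition] & FINDINGS) / len(FINDINGS)

ranked = sorted(KB, key=confidence, reverse=True)  # Uveitis first
```

The key behavior being illustrated is not the diagnosis itself but the output shape: a ranked list of hypotheses, each carrying an evidence-based confidence, rather than a single unqualified answer.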
24. A Range of Watson-enabled Healthcare Solutions. Users: Patient; Caregiver (Nurse, Physician Assistant); Clinician. Solutions: Specialty Diagnosis & Treatment Options; Longitudinal Patient Electronic Health Information; Specialty Research; Genomic-based Analysis; Coding Analysis & Automation; Caregiver Education; Consumer Portal; Patient Workup; Differential Diagnosis; Treatment Options; Patient Inquiry; On-going Treatment; Treatment Protocol Analysis; Treatment Authorization; Population Analysis & Care Mgmt; Second Opinion; Care Consideration Automation.
25. Key Elements of the Clinical Diagnostic Reasoning Process Bowen J. N Engl J Med 2006;355:2217-2225
26. Graber, et al. Diagnostic Error in Internal Medicine, Arch Int Med 2005; 165:1493-1499
33. Convergence of market and technical forces creates an opportunity to leverage Watson for Healthcare (8/2/2011). Market forces: demand for healthcare reform, spiraling costs, medical information overload, quality and safety concerns, aging population. Technical forces: advances in analytics, advances in mobile computing, the Watson / DeepQA platform. Combined with a partner ecosystem (pharma, medical research centers, information providers, private and public insurers, medical devices, providers, public health), these enable next generation solutions for healthcare: Watson for Healthcare.
34. Watson: How It Works. Runtime pipeline (input CASes in, answers & confidences out as output CASes; trained and tested with questions carrying medically vetted answers via a question and answer key; exposed through the WaaS API):
- Question Analysis / Context Analysis: a variety of NLP algorithms analyze the question and its context to figure out what is being asked (named entities, relations, LAT detection, question class).
- Query Builder: builds an abstract query from the question analysis.
- Primary Search: retrieves content related to the question using index search on documents, passages and structured repositories.
- Search Result Processing / Candidate Answer Generation: from the retrieved content, extracts the words or phrases that could be possible answers.
- Context-Independent Answer Scoring: initial scoring of candidate answers independent of supporting passages.
- Candidate Answer Filtering: removes candidates that should not proceed to the remaining phases.
- Supporting Evidence Retrieval: for each candidate answer, retrieves more content that relates that answer to the question.
- Context-Dependent Answer Scoring: many algorithms attempt to determine the degree to which the retrieved evidence supports each candidate answer.
- Final Merger: considers all the scored evidence to produce a final ranked list of answers with confidences.
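The CAS-passing pattern behind this pipeline can be sketched in miniature. This is a hedged illustration of the UIMA-style annotator-chain idea, not Watson's implementation; the stage names and the dict standing in for a CAS are my own simplifications.

```python
# Each pipeline stage reads and enriches a shared analysis structure
# (a "CAS" in UIMA terms), keeping the stages loosely coupled.

def question_analysis(cas: dict) -> dict:
    # Toy LAT detection: look for an answer-type word in the question.
    cas["lat"] = "condition" if "condition" in cas["question"] else "entity"
    return cas

def query_builder(cas: dict) -> dict:
    # Toy abstract query: keep the longer, more contentful words.
    cas["query"] = [w for w in cas["question"].split() if len(w) > 3]
    return cas

PIPELINE = [question_analysis, query_builder]

def run(question: str) -> dict:
    cas = {"question": question}
    for stage in PIPELINE:
        cas = stage(cas)
    return cas
```

Because every stage consumes and produces the same structure, new annotators can be inserted, swapped or retrained without rewiring the rest of the pipeline, which is the interoperability point the slide is making.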
Editor's Notes
Question in form of a statement, answer in form of a question
RSDC 2009 12/09/11 20:19
Here we see the same question on the right <read it again>. To identify and gain confidence in better evidence, the system must parse the question, determining its grammatical structure and identifying the main predicates, like celebrated and arrived, along with their main arguments (that is, their subjects and objects, etc.): for example, who is doing the celebrating and who is doing the arriving, and, for each of these actions, where and when they are happening. This further requires the system to attempt to distinguish places, dates and people from each other and from other words and phrases in the question. On the right side, we see a passage containing the RIGHT answer but with only one key word in common: "MAY". <read the green passage> Given just that one common and very popular term, the system must look at a huge amount of unrelated material to even get a chance to consider this passage, and then must employ and weigh the right algorithms to match the question with an accurate confidence. For example, in this case <click> temporal reasoning algorithms can relate a 400th anniversary in 1898 to 1498; statistical paraphrasing algorithms can help the computer learn from reading lots of text that "landed in" can imply "arrived in"; and finally, with geospatial reasoning over geographical databases, the system may learn that Kappad Beach is in India, and that if you arrive at Kappad Beach you have therefore arrived in India. And still, all of this will admit numerous errors, since few of these computations will produce 100% certainty in mapping from words to concepts to other words. Just as an example, what if the passage said "considered landing in" rather than "landed in", or what if the question said "arrival in what he thought to be India"? Question Answering technology tries to understand what the user is really asking for and to deliver precise and correct responses.
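The temporal-reasoning step mentioned above reduces to simple arithmetic, which a toy sketch makes concrete (my illustration, not IBM's algorithm): an Nth anniversary celebrated in year Y implies the original event happened in year Y - N.

```python
import re

def implied_event_year(text: str, celebration_year: int):
    """If the text mentions an Nth anniversary, return the implied year."""
    m = re.search(r"(\d+)(?:st|nd|rd|th) anniversary", text)
    return celebration_year - int(m.group(1)) if m else None

clue = ("In May 1898 Portugal celebrated the 400th anniversary "
        "of this explorer's arrival in India.")
year = implied_event_year(clue, 1898)  # 1498, Vasco da Gama's landing
```

This inferred year (1498) is what lets the system connect the question to passages that never mention 1898 at all.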
But natural language is hard: the author's intended meaning can be expressed in so many different ways. To achieve high levels of precision and confidence you must consider much more information and analyze it more deeply. We needed a radically different approach that could rapidly admit and integrate many algorithms, considering lots of different bits of evidence from different perspectives, and that could learn how to combine and weigh these different sorts of evidence, ultimately determining how strongly or weakly they support or refute possible answers.
<click> Watson, the computer system we developed to play Jeopardy!, is based on the DeepQA software architecture. Here is a look at the DeepQA architecture. This is like looking inside the brain of the Watson system from about 30,000 feet up. Remember, the intended meaning of natural language is ambiguous, tacit and highly contextual. The computer needs to consider many possible meanings, attempting to find the evidence and inference paths that are most confidently supported by the data. So, the primary computational principle supported by the DeepQA architecture is to assume and pursue multiple interpretations of the question, to generate many plausible answers or hypotheses, and to collect and evaluate many different competing evidence paths that might support or refute those hypotheses. Each component in the system adds assumptions about what the question might mean, or what the content means, or what the answer might be and why it might be correct. DeepQA is implemented as an extensible architecture and was designed at the outset to support interoperability. <UIMA mention> For this reason it was implemented using UIMA, a framework and OASIS standard for interoperable text and multi-modal analysis contributed by IBM to the open-source community. Over 100 different algorithms, implemented as UIMA components, were integrated into this architecture to build Watson. In the first step, Question and Category Analysis, parsing algorithms decompose the question into its grammatical components. Other algorithms here identify and tag specific semantic entities like names, places or dates. In particular, the type of thing being asked for, if it is indicated at all, will be identified. We call this the LAT, or Lexical Answer Type, like this "FISH", this "CHARACTER" or "COUNTRY". In Query Decomposition, different assumptions are made about whether and how the question might be decomposed into sub-questions.
The original question and each identified sub-part follow parallel paths through the system. In Hypothesis Generation, DeepQA does a variety of very broad searches for each of several interpretations of the question. Note that Watson, to compete on Jeopardy!, is not connected to the internet. These searches are performed over a combination of unstructured data (natural language documents) and structured data (available databases and knowledge bases) fed to Watson during training. The goal of this step is to generate possible answers to the question and/or its sub-parts. At this point there is very little confidence in these possible answers, since little intelligence has been applied to understanding the content that might relate to the question. The focus at this point is on generating a broad set of hypotheses, or, for this application, what we call "Candidate Answers". To implement this step for Watson we integrated and advanced multiple open-source text and KB search components. After candidate generation, DeepQA also performs Soft Filtering, where it makes parameterized judgments about which and how many candidate answers are most likely worth investing more computation in, given specific constraints on time and available hardware. Based on a trained threshold for optimizing the tradeoff between accuracy and speed, Soft Filtering uses different lightweight algorithms to judge which candidates are worth gathering evidence for and which should get less attention and continue through the computation as-is. In contrast, if this were a hard filter, those candidates falling below the threshold would be eliminated from consideration entirely at this point. In Hypothesis & Evidence Scoring, the candidate answers are first scored independently of any additional evidence by deeper analysis algorithms. This may, for example, include typing algorithms.
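The soft-filtering idea just described can be sketched as follows. This is illustrative, not the actual implementation: the candidate names, scores and threshold are made up, and a real system would use trained lightweight scorers.

```python
# A cheap score decides which candidates get expensive evidence
# gathering; the rest continue through the pipeline with only their
# initial score, rather than being discarded as a hard filter would.

def soft_filter(candidates, cheap_score, threshold):
    gather_evidence, pass_through = [], []
    for c in candidates:
        if cheap_score(c) >= threshold:
            gather_evidence.append(c)   # worth deeper analysis
        else:
            pass_through.append(c)      # kept, but with initial score only
    return gather_evidence, pass_through

# Toy usage: made-up initial search-rank scores.
ranks = {"Vasco da Gama": 0.82, "Gary": 0.35, "Columbus": 0.60}
rich, kept = soft_filter(ranks, ranks.get, threshold=0.5)
```

The design point is the accuracy/speed tradeoff: low-scoring candidates still reach final ranking, so a cheap mis-score cannot eliminate the right answer outright, it can only deny it extra evidence.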
These are algorithms that produce a score indicating how likely it is that a candidate answer is an instance of the Lexical Answer Type determined in the first step: for example Country, Agent, Character, City, Slogan, Book, etc. Many of these algorithms may fire, using different resources and techniques to come up with a score. What is the likelihood that "Washington", for example, refers to a "General" or a "Capital" or a "State" or a "Mountain" or a "Father" or a "Founder"? For each candidate answer, many pieces of additional evidence are searched for. Each of these pieces of evidence is subjected to more algorithms that deeply analyze the evidentiary passages and score the likelihood that the passage supports or refutes the correctness of the candidate answer. These algorithms may consider variations in grammatical structure, word usage, and meaning. In the Synthesis step, if the question had been decomposed into sub-parts, one or more synthesis algorithms will fire. They apply methods for inferring a coherent final answer from the constituent elements derived from the question's sub-parts. Finally, arriving at the last step, Final Merging and Ranking, are many possible answers, each paired with many pieces of evidence, and each of these scored by many algorithms to produce hundreds of feature scores, all giving some evidence for the correctness of each candidate answer. Trained models are applied to weigh the relative importance of these feature scores. These models are trained with machine learning methods to predict, based on past performance, how best to combine all these scores to produce final, single confidence numbers for each candidate answer and to produce the final ranking of all candidates. The answer with the strongest confidence would be Watson's final answer, and Watson would try to buzz in provided that top answer's confidence was above a certain threshold.
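The merging step just described can be sketched with a logistic model. The weights and feature scores below are invented for illustration; in the real system the weights are learned from past performance rather than hand-set.

```python
import math

def confidence(feature_scores, weights, bias=0.0):
    """Combine per-candidate feature scores into a single confidence."""
    z = bias + sum(w * f for w, f in zip(weights, feature_scores))
    return 1.0 / (1.0 + math.exp(-z))   # squash to a (0, 1) confidence

weights = [2.0, 1.5, 0.5]   # e.g. type match, passage support, popularity
candidates = {
    "Vasco da Gama": [0.9, 0.8, 0.6],
    "Gary":          [0.1, 0.9, 0.1],  # strong keyword passage, weak type
}
ranked = sorted(candidates,
                key=lambda c: confidence(candidates[c], weights),
                reverse=True)
top_conf = confidence(candidates[ranked[0]], weights)
# Buzz in only if the top confidence clears a trained threshold.
would_buzz = top_conf > 0.5
```

Note how "Gary" keeps a high passage-support score but loses on type match: this is exactly the learned down-weighting of weak keyword evidence promised earlier in the deck.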
---- The DeepQA system defers commitments and carries possibilities through the entire process while searching for increasingly broad contextual evidence and more credible inferences to support the most likely candidate answers. All the algorithms used to interpret questions, generate candidate answers, score answers, collect evidence and score evidence are loosely coupled but work holistically by virtue of DeepQA's pervasive machine learning infrastructure. No one component could realize its impact on end-to-end performance without being integrated and trained with the other components, and they are all evolving simultaneously. In fact, what had a 10% impact on some metric one day might, one month later, contribute only 2% to overall performance due to evolving component algorithms and interactions. This is why the system, as it develops, is regularly trained and retrained. DeepQA is a complex system architecture designed to deal extensibly with the challenges of natural language processing applications and to adapt to new domains of knowledge. The Jeopardy! Challenge greatly inspired its design and implementation for the Watson system.
What we did for Jeopardy! applies to healthcare too. This is one aspect; there are others as well. Makes a small point. Emphasize multiple aspects of evidence?
Key points: Timing is as good as it has ever been to introduce new decision support solutions for healthcare. Our strategy is holistic, involving many varied partners. What role is of interest? Providers include doctors, nurses, hospitals, clinics.