The Large Data Demonstration Project aims to create a timely and workable national health data network design through a test project. It seeks to concurrently address governance issues and demonstrate improvements in care. The project intends to validate the temporal and cost efficiencies of such a network system. Overall, the demonstration project explores building the foundation for a national Learning Health System to improve American healthcare through increased data sharing and analysis.
4. THE PURPOSE OF THE LD DEMO IS TO CREATE A TIMELY, WORKABLE LHS NETWORK
• Test an architecture for an LHS national data network design
• Concurrently address governance and other issues
• Demonstrate improvements in care, to show patients why the ultimate system
matters to them and to mobilize them
• Validate temporal and cost efficiencies of the network system
4/21/2012 LD DEMONSTRATION PROJECT 4
5. Credits
This presentation is based upon the work and inspiration of:
• Charles P. Friedman, PhD, Joseph H. Kanter Family Foundation Learning Health Summit, May 2012, and
CDISC International Interchange Keynote: A Learning Health System, November 7, 2011
• BTRIS, James J. Cimino, Chief, Laboratory for Informatics Development, NIH
• IBM Watson: From Jeopardy to Healthcare, David Gondek, PhD, Technical Lead, Watson Healthcare
Adaptation, AMIA, October 2011
• FasterCures, Health IT: Think Research, 2005; Still Thinking Research, 2011 (Adam Clark…)
• Valuing Health Care, Ewing Marion Kauffman Foundation, Task Force Report, April 2012
• The Strategic Health IT Advanced Research Projects (SHARP) Program, Office of the National
Coordinator for Health Information Technology
• Dr. Eliot Siegel, University of Maryland School of Medicine; Dr. Watson – A Promising Student in Pursuit
of Smarter Medicine, NLM Briefing, 2011; Educating “Dr. Watson” To Usher in a New Era of Intelligent,
Vigilant, and Personalized Medicine, IBM Toronto 2011, IBM 100th Anniversary Keynote
• A Consensus Action Agenda for Achieving the National Health Information Infrastructure, J Am Med
Inform Assoc. 2004 Jul-Aug; 11(4): 332-338
• The Reports of the Institute of Medicine on Learning Health
• The 40 years of work of Dr. Donald Lindberg and Betsy Humphreys, and their “fellows” and staff at the
NLM, who created and maintain the UMLS, without which we could not move forward
AND SO MANY OTHERS
6. A LEARNING HEALTH SYSTEM
A Learning Health System (LHS) is an "ultra large scale" system that can
serve the entire nation to promote individual and population health, and that can
mine and analyze electronic medical records to:
• track patient treatment over time across institutions;
• compare treatments (comparative effectiveness research, CER);
• facilitate decision support systems;
• choose best outcomes for individual patients (Personalized Medicine);
• help identify potential research subjects (by characteristic);
• spot and track public health emergencies; and
• better monitor drug safety (post-market surveillance of new drugs).
“… one in which progress in science, informatics, and care culture align to
generate new knowledge as an ongoing, natural by-product of the care
experience, and seamlessly refine and deliver best practices for continuous
improvement in health and health care.” (Institute of Medicine) [Redesigning the
Clinical Effectiveness Research Paradigm: Innovation and Practice-Based Approaches -
Workshop]
8. THE NEED FOR A LEARNING HEALTH SYSTEM
The U.S. spends 1.5-1.7x what other industrialized countries spend
The U.S. is 37th in outcomes
$2.5 trillion spent annually
o Of which CMS spends $900 bil.
o $100 bil. to VA, DOD, Indian Health Services, and Federal Employee Health Benefit Plans
o $1.0 trillion of federal dollars spent on health goods and services
$700 bil. was not necessary
A very big HIT: America’s health industry is preparing for a $30 billion splurge on information technology,
The Economist, Nov 22nd 2010
Government Launches Research Initiative To Harness 'Big Data', March 29, 2012,
http://www.ihealthbeat.org/articles/2012/3/29/government-launches-research-initiative-to-harness-big-data.aspx#ixzz1smgkQjA6
Potential payoffs
o The McKinsey Global Institute estimates that mobilizing health care information could yield more than
$300 billion a year in additional value, or almost $1,000 a year for every person in the United States.
o Of these sums, at least two-thirds would take the form of reduced national spending on health care.
o If even a fraction of that unlocked value could be returned to providers and patients, they would have
strong incentives to join the data revolution. (Kauffman Task Force Report, page 21)
9. BENEFITS OF A
LEARNING HEALTH SYSTEM
• The U.S. Healthcare System generates immeasurable amounts
of Information
• Information Systems Lead to Transparency
• Businesses use information to identify waste and identify
best practice
• We have used Claims information to do some analysis of
treatments
• When clinical information is digitized in the form of an
electronic health record (EHR) it becomes more valuable
because it can be compared, searched, and queried in
ways that can benefit the patient, other patients with the
same disease or disorder, and the research enterprise,
which is aiming to develop better diagnostics and
therapies. (Adam Clark..)
10. OBSTACLES TO
LEARNING HEALTH SYSTEMS
• Data is not owned by the patient
o Patient has access rights and not ownership
o Creator of the medical record is the owner
o Owners have assigned those rights to HMOs and to their EHR system providers
o Data is dispersed
o Data is not in standardized format
• Economic Disincentives to Productivity and Cost Reduction
o Insurers – spend less, make less
o Patients – under the employer system, until now there was no burden on patients if they
consumed more, but the employee share of cost is now rising
o If patients help reduce cost, they get nothing for it
• Clinical data and Medical Records data are still fragmented
• Lack of Cooperation or Incentives to Cooperate
Senator DASCHLE:
“I think that one of the things that we have faced all through our health sector is
too much siloing and stove piping and not enough coordination. And it seems to
me that one of the things we need to do at all levels, state level, the federal level
and certainly in the private sector is to encourage more coordination and
information sharing.” Bipartisan Policy Center - Forum on Health IT Jan. 27, 2012.
11. DR. CHARLES FRIEDMAN
“I was invited to a meeting about building a learning healthcare
system for cancer, and was asked to speak about how the ONC's
activities are going to create a learning healthcare system. So as more
and more data, maybe even information, is available in EHR systems,
what are we doing to make that data useful for research? After a few
hours of working on my speech it hit me— we aren't; ONC is going to
fall short of that goal. So I changed my way of thinking. Everybody is
focused right now on getting eligible professionals and hospitals to the
state of meaningful use and many can't fathom dealing with other uses
of the data on top of that, at least not yet. But my answer is we can't
afford not to do this. We are so sub-optimizing and failing to take full
advantage of our investment.”
DR. CHARLES FRIEDMAN, OFFICE OF THE NATIONAL COORDINATOR FOR
HEALTH INFORMATION TECHNOLOGY, HHS (JUNE 2010)
12. HEALTH SECTOR DEMANDS FOR
INFORMATICS RELATING TO A
LEARNING HEALTH SYSTEM
[Figure. Source: PwC Analysis, PwC Health Research Institute]
13. 2005 OBSERVATIONS OF FASTERCURES
• In 2005, FasterCures urged health systems to “think research” when developing
or implementing EHR systems so as not to foreclose a golden opportunity to
connect clinical data with research needs.
• At the time, FasterCures saw the opportunity for EHRs to not only provide a link
between genes and disease, but also to:
• Monitor the health of populations and detect emerging health problems.
• Identify populations at risk of disease, or those who might benefit most from
therapies.
• Assess the usefulness of diagnostic tests and screening programs.
• Form hypotheses about disease initiation and progression.
• Conduct post-marketing surveillance studies of new drugs to identify adverse
events, improve prescribing practices, or make labeling more accurate and
complete.
• Identify potential study participants for clinical research.
14. 2005 FASTERCURES
RECOMMENDATIONS
• Aggregating and integrating practice databases for data
mining.
• Developing more sophisticated abstraction and
encryption systems to protect privacy.
• Developing database connection tools.
• Creating translational systems.
• Formulating online informed consent procedures.
• Evolving data mining and pattern recognition systems.
• Developing interactive patient query programs.
• Creating patient databases/warehouses/registries.
• Creating directories of clinical databases.
15. 2011 FINDINGS – STILL THINKING
• Vendors of new EHR systems are not building research capacity into the
architecture.
• The clinical research community is not actively involved in or does not have
incentives to push for research-friendly EHR systems.
• Standards and universal exchange systems still challenge the actual transfer and
translation of research-relevant data.
• Existing EHR systems are not being leveraged to screen, match, and enroll
patients in clinical trials.
• The patient community is not fully engaged in or aware of the need to share
their clinical data to advance research.
• Clinical trial screening and matching should be included as a measure for
“Meaningful Use” of electronic health record systems.
• The National Institutes of Health (NIH) should articulate a strategy that will align
its programs with the recommendations of the Office of the National Coordinator
(ONC) Federal Health IT Strategic Plan.
17. RECOMMENDATIONS OF THE KAUFFMAN
FOUNDATION TASK FORCE (2012)
“Harnessing information: how systematically gathering and
sharing data can unlock knowledge that produces
systematically better choices. The key here is to incentivize
a new corps of data entrepreneurs to collect and analyze
existing medical data ...”
Valuing Healthcare: Improving productivity and Quality, Kauffman
Foundation Task Force on Cost Effective Health Care Information,
April 19, 2012
18. RECOMMENDATIONS (Continued)
• Allow patients and research subjects in studies to give their consent for
their health data to be included in large research databases.
• The government should permit patients the right to let whomever they
choose access their medical records efficiently and easily. The
Department of Health and Human Services could provide regulatory
assurance that there will be no punitive action against experimental
pilot projects to pool health data. If HHS does not believe it has this
authority, it should request it from Congress.
• The thousands of nonprofit organizations actively involved in studying
diseases should partner to build a national health database. Employers
should include as part of health benefits packages information on how
employees can contribute their health data.
• The National Institutes of Health could more strictly enforce existing
rules and otherwise require that federally funded data be shared, and
that all grants require data-sharing plans. Follow-on NIH funding could
be conditioned on data making it to the public domain and being
re-used.
19. U.S. TECHNOLOGY RESOURCES
• In a century of staggeringly rapid improvements in medical
knowledge and technology throughout the West and Asia, the
United States towers over others.
• The United States funds (publicly and privately) more than
$60 billion per year in medical research.
• Thanks to the work fostered by ONC and the SHARP project, we have
demonstrated the ability to move data within a number of Health
Information Exchange networks (infrastructure), for example
Maryland CRISP, CCC, and others.
• While we build a greater Electronic Medical Records
capability by incentivizing practitioners and hospitals that do
not have electronic patient medical records systems, we have
shining examples of such systems in place.
See Digital Infrastructure for the Learning Health System: The Foundation for
Continuous Improvement in Health and Health Care, IOM, 2011
20. GREAT EXAMPLES OF SYSTEMS IN 2011
MOSTLY IN SILOS
• The Strategic Health IT Advanced Research Program (SHARP)
• SHARP supports research projects on breakthrough advances in health IT that foster adoption, including security, patient support,
healthcare applications, and network design, and secondary use of EHR data.
• The Mayo SHARP project.
• THE HEALTH MAINTENANCE ORGANIZATION RESEARCH NETWORK (HMORN) –Federation Model
• THE eMERGE (ELECTRONIC MEDICAL RECORDS AND GENOMICS) NETWORK
• i2b2, based out of the Partners HealthCare System in Boston
• Researchers at the University of Utah are testing capabilities of i2b2 as an open source tool for bench-to-bedside research
conducted outside the Partners HealthCare network.
• Geisinger Health System, Kaiser Permanente, Columbia Presbyterian, and the Mayo Clinic are the grandparents of health IT and
its use in clinical and research practice.
• THE PARTNERSHIP TO ADVANCE CLINICAL ELECTRONIC RESEARCH (PACeR) (New York)
• DR. SUSAN LOVE RESEARCH FOUNDATION - Army of Women
• RPDR AT PARTNERS HEALTHCARE is a centralized clinical data registry that gathers data from various hospital legacy systems
and stores it in one place.
• NIH'S BIOMEDICAL TRANSLATIONAL RESEARCH INFORMATION SYSTEM (BTRIS) BTRIS is a repository developed from a
complex network of information systems supporting clinical care and research data collection from NIH-sponsored clinical trials
conducted by the NIH Clinical Center and the agency's intramural research program.
• Moffitt, Sloan Kettering, Anderson ….
• Kaiser Permanente, Mayo Clinic, Geisinger Health System, Intermountain Health, and Group Health Cooperative form a
new consortium to share patient e-health records on-demand and serve as a national model for data interoperability.
• AND THERE ARE MORE EXAMPLES….
21. PROGRESS IN EUROPE
Will U.S. be left behind?
• The GPRD – Soon to be the CPRD
o From 5 million to 55 million patients in a database
with a new $100 mil. investment by the NHS (UK)
• The EHR4CR project to unify 7 European countries,
funded by the EU Commission and the International
Pharmaceutical Association with an initial $20.0 mil.
22. SO WHY NOT NOW?
“We have to take the opportunity that comes in front of us. There are
now 20-30 million people whose care is delivered by an HMO that
already has EMR – so could we build an infrastructure to help look
at the health delivery process to see what works, and have all these
people available to answer these questions with quick
turnaround?”
Francis S. Collins, MD, PhD, NIH, Forum Research America
National Forum, March 14, 2011
“So why hasn’t all this been done? There is no shortage of raw
information in the health care system. But it is locked in medical
offices and hospitals across the country, and in the files of
pharmaceutical companies who guard the results of their failed
clinical trials.” Kauffman – Page 19
23. MOVING FORWARD FIRST STEPS
• Despite centuries of clinical research, data are still
fragmented
• Sophisticated terminology is the key to reuse of
disparate data
• Policy issues are bigger than technical issues
• Take on technical issues as a first step to
demonstrate capability of solving individual patient
problems
24. OBJECTIVES OF THE LARGE DATA
DEMONSTRATION PROJECT
• The demonstration project (pilot) can show that the benefits of data sharing can
begin to flow before the whole health care system is networked, without
sweeping reforms and without waiting for full implementation of EHR systems ten years away.
• It will show benefits to patients and thereby incentivize consumers to push for
adoption of eHR systems by all practitioners from the bottom up and not from
the top down.
• Use existing resources of medical information
o Existing Electronic Medical Records
• Reduce and circumvent institutional obstacles.
• Share Data and Cooperate in a Model to Demonstrate Feasibility:
• “I think that one of the things that we have faced all through our health sector
is too much siloing and stove piping and not enough coordination. And it seems
to me that one of the things we need to do at all levels, state level, the federal
level and certainly in the private sector is to encourage more coordination and
information sharing.” Senator Tom Daschle - Bipartisan Policy Center - Forum
on Health IT Jan. 27, 2012.
• Use the best available technology.
• Piece together the work of the last ten years in Health IT to build a solution.
25. A SOLUTION TO THE
DATA HOARDING PROBLEM
Centralized DATA Warehouse
Or
Distributed in Silos
Answer:
Both – The Hybrid: DATA Remains in the Silos
The Indexes (Inverted Files) Are Centralized
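The hybrid arrangement on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's design: the silo names, record IDs, sample text, and the whitespace tokenizer are all hypothetical, and a real system would index coded concepts rather than raw words.

```python
from collections import defaultdict

def build_silo_index(records):
    """Build an inverted file for one silo: term -> set of local record IDs."""
    index = defaultdict(set)
    for record_id, text in records.items():
        for term in text.lower().split():
            index[term].add(record_id)
    return index

def compile_central_index(silo_indexes):
    """Merge silo inverted files into a central index.

    The central index stores only pointers (silo name, record ID);
    the record text itself never leaves the silo.
    """
    central = defaultdict(set)
    for silo_name, index in silo_indexes.items():
        for term, record_ids in index.items():
            for record_id in record_ids:
                central[term].add((silo_name, record_id))
    return central

# Hypothetical silos and records, for illustration only.
silos = {
    "hospital_a": {"r1": "patient reports fever and thirst",
                   "r2": "follow-up visit for diabetes"},
    "clinic_b": {"r7": "fever resolved after treatment"},
}
silo_indexes = {name: build_silo_index(recs) for name, recs in silos.items()}
central_index = compile_central_index(silo_indexes)

# Which silos and records mention "fever"?
print(sorted(central_index["fever"]))
```

A query against `central_index` returns only where matching records live; fetching the records themselves remains a separate step governed by each silo.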
26. BASIC ATTRIBUTES OF THE LARGE DATA
DEMONSTRATION PROJECT
• Common Front End
• Established Gatekeeper Requirements (Proprietary)
• Security
• Privacy
• Common Indexing in the Silos
• Use Unstructured Information Management Architecture
and Semantic Search
• Compile Indexes (Inverted Files) Into a Central Index
• Can extract individual data sets from Silos for analysis into
a Cloud
• Can do cross-patient queries for additional analysis
• Can view and visualize longitudinal medical records
27. BASIC ATTRIBUTES OF THE LARGE DATA
DEMONSTRATION PROJECT
• BTRIS MODEL OF NIH
o Biomedical data- Clinical Study Medical Records
o Research data collected using clinical information systems
o Clinical data collected using clinical data integration information systems
o Research data from research information systems
o Reuses of data to support translational research
• Attributes of BTRIS for Our Model
o Common Front End – User access and user interface
o Terminology based queries
o User requirements
o Established Gatekeeper Requirements (Proprietary)
• Access Policies
o Policy Working Group
o Security
o Privacy
28. BASIC ATTRIBUTES OF THE LARGE DATA
DEMONSTRATION PROJECT (Continued)
• Aggregate disparate data sets – Silos
• Prioritize data sources based upon compliance with relational database
standards
• Common Re-Indexing of Silos
• Use NLM UMLS and standard source terminologies (SNOMED-CT, LOINC,
RxNorm) as implemented by IBM Watson Systems
• Apply the Watson NLP to a mirror of the data in the Silos
• Data Remains in Silos along with the New Index (Inverted File)
• Inverted Files are Compiled and located at a Central Index Hub
• All Queries of the complete DATA-Set are directed to the Central Index
Hub
29. PROJECT DEFINITIONS
• Data – Original health or medical records either flat file
(unstructured text) or a structured data record
including fields of unstructured text
• Index – A relational database inverted file of all terms
and phrases, with pointers to the data source for each
term or phrase
• Silo – Individual institution’s or practitioner’s data
warehouse
• User – Person that queries the system, or their “bot”
• User Profile – Includes the names of the systems whose
raw data the user may access under the user’s data use
agreement with each Silo
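One way to make these definitions concrete is as a handful of record types. The sketch below is an assumption-laden illustration: the class names, field names, and the `may_read` check are hypothetical and are not taken from the project.

```python
from dataclasses import dataclass, field

@dataclass
class DataRecord:
    """Data: an original health record, structured fields plus free text."""
    record_id: str
    fields: dict
    free_text: str = ""

@dataclass
class Silo:
    """Silo: one institution's or practitioner's data warehouse."""
    name: str
    records: dict = field(default_factory=dict)  # record_id -> DataRecord

@dataclass(frozen=True)
class IndexEntry:
    """Index: one row of the inverted file, a term pointing at its source."""
    term: str
    silo_name: str
    record_id: str

@dataclass
class UserProfile:
    """User Profile: the silos whose raw data the user may access."""
    user_name: str
    accessible_silos: set = field(default_factory=set)

    def may_read(self, entry: IndexEntry) -> bool:
        # Access follows the user's data use agreement with each silo.
        return entry.silo_name in self.accessible_silos

# Hypothetical user and index entries.
profile = UserProfile("researcher_1", {"hospital_a"})
entries = [IndexEntry("fever", "hospital_a", "r1"),
           IndexEntry("fever", "clinic_b", "r7")]
readable = [e.record_id for e in entries if profile.may_read(e)]
print(readable)
```

Keeping the access check on the profile means the index can report that a record exists in a silo while raw data is released only where a data use agreement is in place.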
30. ASSETS FOR THE DEMO PROJECT
• We have good systems out there to harness for a demonstration
project using available technical capabilities
• A sister of the World’s “tenth” fastest Computer (IBM)
• Low cost data storage capabilities
• Data Warehouse Models
• Standardized Indexing Capabilities thanks to the NLM – UMLS and
standard source terminologies (SNOMED-CT, LOINC, RxNorm)
• A high speed National Internet 2 ???
• Lots of data that meets HL7 and other standards and can be shared
31. THINGS WE MAY NEED
• For Purposes of the Demonstration Project and to Share Data
We Need Some Governance Requirements
• We can live within HIPAA through a Federation Model and a
de-identification process
• To achieve some level of uniformity we propose to re-index
all data in the data silos through a single common indexing
platform based upon UMLS and standard source
terminologies (SNOMED-CT, LOINC, RxNorm), which will
uniformly code data (relational databases and the text embedded
therein), that is, semantic standardization.
• In the project we will develop a future business model to
fund and/or compensate cooperating institutions for their
data investment and overhead, and a financial
model to sustain a national LHS.
32. ONE POSSIBLE MODEL: BTRIS
BTRIS Collects Data From All Over NIH
33. Clinical Data at NIH is Collected Into BTRIS
For Use by NIH Authorized Researchers
34. WHAT IS IN BTRIS?
36. BTRIS HAS
[Figure: BTRIS components, with an Ontology spanning all layers]
• Data Acquisition Processes: Coding, Indexing, De-Identifying, Permission Setting
• Data Repository
• Data Retrieval Functions: Authorization, Subject-Oriented, Cross-Subject, Re-Identification, NLP
• Data Analysis Tools: Subject Recruitment, Hypothesis Generation, Hypothesis Testing
37. BTRIS*: The NIH Biomedical Translational
Research Information System as Model
• Clinical data repository to collect data from ancillary systems
• Has BTRIS Standards
• Has a Front End with A Gatekeeper System
• Has all Governance Requirements
• Some Aspects of Columbia Presbyterian Systems
o Reorganized for use as Clinical Data Warehouse
o Back end for clinical information systems (CIS, WebCIS, PatCIS, PalmCIS,
QingCIS, MendonÇIS…)
• Built by an NLM Ontology Fellow who participated in the development of the
UMLS
• Coded with the Research Entities Dictionary
• Is available for replication
*The clinical research data repository of the US National Institutes of Health.
Cimino JJ, Ayres EJ. Stud Health Technol Inform. 2010;160(Pt 2):1299-303.
http://people.dbmi.columbia.edu/cimino/Publications/2010%20-%20Medinfo%20-%20The%20Clinical%20Research%20Data%20Repository%20of%20the%20US%20National%20Institutes%20of%20Health.pdf
38. Advantage of Using BTRIS Model
1. A model compilation and integration of clinical and
research data from multiple disparate sources
2. Understands the authorization issues related to
reuse of patient clinical data
3. Understands the terminology issues related to the
reuse of coded clinical and research data
4. Familiar with the approach being taken at the
National Institutes of Health to collect, integrate, and
code clinical and research data into a single
repository, for authorized reuse in biomedical
research.
39. BTRIS ATTRIBUTES
• Multiple Data sources
• Data model integration
• Research Entities Dictionary
• Access policies
• User requirements
• User access and user Interface
• Terminology-based queries
40. Re-using Data in De-Identified Form
• Aggregate and standardize disparate and isolated data sets
• Automate and streamline processes that are traditionally manual and
cumbersome
• Prioritize data sources and functionality based on needs of user community
• Pose hypothetical research questions
• Apply Analytical Tools And Create Reports
• Find unexpected correlations
• Determine potential subject profiles and sample sizes for Clinical Studies
• Find potential collaborators
• Need to extract individual data for analysis
• Need cross-patient queries for additional analysis
• Data may require transformation:
o De-identification and Re-identification
o Indexing
o Aggregation by time
o Abstraction by classification
o Conversion to relevant concepts
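Two of the transformations listed above, de-identification with the option of re-identification and aggregation by time, can be sketched as follows. This is only an illustration under stated assumptions: the field names, the key, and the sample values are hypothetical, a keyed hash is just one possible pseudonym scheme, and real de-identification involves considerably more than dropping a name and an identifier.

```python
import hashlib
import hmac
from collections import defaultdict
from datetime import date

SECRET_KEY = b"demo-key-held-by-the-silo"  # hypothetical; never hard-code a real key

def pseudonym(patient_id: str) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256), shortened for display."""
    return hmac.new(SECRET_KEY, patient_id.encode(), hashlib.sha256).hexdigest()[:12]

def de_identify(record: dict, crosswalk: dict) -> dict:
    """Return a copy without direct identifiers.

    The crosswalk (pseudonym -> original ID) stays with the data owner,
    so an authorized party can re-identify when permitted.
    """
    pid = pseudonym(record["patient_id"])
    crosswalk[pid] = record["patient_id"]
    return {"pid": pid, "date": record["date"], "glucose": record["glucose"]}

def aggregate_by_month(records):
    """Aggregation by time: mean value per (pseudonym, year, month)."""
    buckets = defaultdict(list)
    for r in records:
        buckets[(r["pid"], r["date"].year, r["date"].month)].append(r["glucose"])
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}

# Hypothetical raw records.
raw = [
    {"patient_id": "MRN-001", "name": "Jane Doe", "date": date(2012, 3, 1), "glucose": 110},
    {"patient_id": "MRN-001", "name": "Jane Doe", "date": date(2012, 3, 20), "glucose": 130},
    {"patient_id": "MRN-001", "name": "Jane Doe", "date": date(2012, 4, 2), "glucose": 100},
]
crosswalk = {}
clean = [de_identify(r, crosswalk) for r in raw]
monthly = aggregate_by_month(clean)
print(monthly)
```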
41. THE PROPOSED LD DEMO
This is a research & development project. It will have a
structure similar to OMOP. It will have four components:
1. Technology – Feasibility
2. Governance
3. Business Models
4. Public Policy Changes
42. 1. THE PROPOSED LD DEMO MODEL
Technology – Feasibility
• All Data Remains in Silos
• All Data in Silos are indexed by a single common set of
coding systems based upon the UMLS and standard
source terminologies (SNOMED-CT, LOINC, RxNorm)
• At the Silos the new indexes reside on dedicated
hardware
• A duplicate of the “indexes” at the Silos is transmitted to
the Central Index Repository
• The Data indexes at the Silos are updated 24/7 as new
Data arrives
• The Central Index Repository is simultaneously updated
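The update path described on this slide, where each silo index is updated as data arrives and the central copy follows, might look like the sketch below. The class and method names are hypothetical, and the direct method call stands in for whatever network transport a real deployment would use.

```python
from collections import defaultdict

class CentralIndex:
    """Central Index Repository: holds pointers only, never record content."""
    def __init__(self):
        self.postings = defaultdict(set)  # code -> {(silo_name, record_id)}

    def receive(self, silo_name, code, record_id):
        self.postings[code].add((silo_name, record_id))

class SiloIndex:
    """Index kept at the silo, alongside the data it describes."""
    def __init__(self, name, central):
        self.name = name
        self.central = central
        self.postings = defaultdict(set)  # code -> {record_id}

    def add_record(self, record_id, codes):
        """Index a newly arrived record locally, then mirror each
        posting to the central repository in the same step."""
        for code in codes:
            self.postings[code].add(record_id)
            self.central.receive(self.name, code, record_id)

# Hypothetical silo and record; the codes are UMLS concept identifiers
# for Fever and Thirst.
central = CentralIndex()
silo = SiloIndex("hmo_c", central)
silo.add_record("r42", ["C0015967", "C0039971"])
print(central.postings["C0015967"])
```

Mirroring at write time keeps the two indexes in step; a batch or queued transfer would be an equally plausible design and is not ruled out by the slide.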
43. PATIENT DATA SOURCES INDEXED IN SILOS, UPDATED
24/7; DUPLICATE INDEX COMPILED AT NETWORK
CENTER, UPDATED 24/7
[Figure: data sources connected to the Health DATA Network Index Center]
• Pharmaceutical Firms Clinical Research & Post Market Data
• Integrated Delivery System
• Community & Specialty Practice
• Health Maintenance Organization
• State & Federal Medical Institutions that Provide Patient Care
• Pharmacy
• Lab Tests
44. QUERYING THE SYSTEM THROUGH
THE DATA NETWORK CENTER
45. OPERATION OF THE SYSTEM
• Authorized User Queries the System at the Health Data
Network Center Point of Entry
• The Central Index is polled for all records related to the
question
• A report is generated listing the Silos in which relevant
data is located
• All relevant deidentified data is extracted to a “cloud”
where records can be examined to determine relevancy
• In the cloud the data can be analyzed and processed
• Reports can be generated
• The Query is logged with its results: a list of Silos,
relevant records, and reports generated
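The steps above can be traced end to end in a short sketch. Everything concrete here is an assumption made for illustration: the index contents, silo data, user names, and the trivial "report" are hypothetical, and de-identification is reduced to copying only a non-identifying field.

```python
from datetime import datetime, timezone

# Hypothetical central index: term -> list of (silo, record ID) pointers.
CENTRAL_INDEX = {
    "fever": [("hospital_a", "r1"), ("clinic_b", "r7")],
    "thirst": [("hospital_a", "r1")],
}

# Hypothetical silo contents; in the proposal these stay at each institution.
SILO_DATA = {
    "hospital_a": {"r1": {"name": "Jane Doe", "note": "fever and thirst"}},
    "clinic_b": {"r7": {"name": "John Roe", "note": "fever resolved"}},
}

AUTHORIZED_USERS = {"researcher_1"}
QUERY_LOG = []

def run_query(user, term):
    # 1. Only authorized users may query at the point of entry.
    if user not in AUTHORIZED_USERS:
        raise PermissionError(f"{user} is not an authorized user")
    # 2. Poll the central index for pointers to relevant records.
    pointers = CENTRAL_INDEX.get(term, [])
    # 3. Report which silos hold relevant data.
    silo_report = sorted({silo for silo, _ in pointers})
    # 4. Extract de-identified copies to the "cloud" workspace.
    cloud = [
        {"silo": silo, "record_id": rid, "note": SILO_DATA[silo][rid]["note"]}
        for silo, rid in pointers
    ]
    # 5. Analyze in the cloud and generate a report (here, just a count).
    report = {"term": term, "records_found": len(cloud)}
    # 6. Log the query with its results.
    QUERY_LOG.append({
        "user": user,
        "term": term,
        "time": datetime.now(timezone.utc).isoformat(),
        "silos": silo_report,
        "records": [(c["silo"], c["record_id"]) for c in cloud],
        "report": report,
    })
    return silo_report, cloud, report

silos_found, cloud_records, summary = run_query("researcher_1", "fever")
print(silos_found, summary)
```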
46. USING IBM SYSTEMS FOR INDEXING
AND QUERYING
• Query: Authorized User Queries the System at the Health Data Network Center
Point of Entry Through a Thesaurus Based Query
• Using Natural Language to ask the question the system will
o Analyze the Question
o Create the list of words and phrases based upon the UMLS and standard
source terminologies (SNOMED-CT, LOINC, RxNorm) to pose the question to the
system
• The Central Index is polled for all records related to the question
o A report is generated listing the Silos in which relevant data is located
o The system gathers the relevant records
o All relevant deidentified data is extracted to a “cloud” where records can be
examined to determine relevancy
o Records of a single patient may be organized longitudinally
• In the cloud the data can be analyzed and processed
• Reports can be generated
• The Query is logged with its results: a list of Silos, relevant records, and reports
generated
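The thesaurus-based step, turning a natural-language question into a list of coded terms and alternative phrasings, can be illustrated with a toy lookup. This is not how the IBM tools work internally; the four-entry thesaurus is a hand-made stand-in for the UMLS, and `expand_question` is a hypothetical helper.

```python
import re

# Tiny hand-made thesaurus: phrase -> UMLS concept identifier.
THESAURUS = {
    "fever": "C0015967",
    "dry mouth": "C0043352",
    "xerostomia": "C0043352",
    "thirst": "C0039971",
}

def expand_question(question):
    """Find thesaurus phrases in the question and return, for each
    concept found, every phrase that expresses that concept."""
    text = question.lower()
    found = {
        code
        for phrase, code in THESAURUS.items()
        if re.search(r"\b" + re.escape(phrase) + r"\b", text)
    }
    return {
        code: sorted(p for p, c in THESAURUS.items() if c == code)
        for code in found
    }

terms = expand_question("Which patients had dry mouth and fever?")
print(terms)
```

Because the query is expanded to concept codes, a record indexed under "xerostomia" would match a question that only says "dry mouth".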
47. WHY WATSON TOOLS
• Semantic standardization using LOINC, SNOMED,
RxNorm, etc.
• Watson tools to code questions and data will facilitate
selection of data – patient medical records – for
comparison
• Watson tools generate alternative words, phrases, and
codes for the query and for indexing data (UIMA),
enabling semantic analysis
• High speed parallel processor systems & software
50. HOW WATSON TOOLS WORK FOR
ANALYZING DATA IN MEDICINE
51. YOU HAVE TO CONNECT THE DOTS …
but the dots are not cooperating (different
expressions, meaning highly dependent on context)
52. NEJM Medical Concept Annotations
[Figure: annotated NEJM text with concept categories Medications, Symptoms, Diseases, and
Modifiers (OPQRST). Annotation using MetaMap and DeepQA annotators.]
53. MAPPING FROM LANGUAGE TO MEDICAL CONCEPTS
[Figure: phrases from clinical text mapped to UMLS concepts]
Mapping to Canonical Forms
• fever: [C0015967] Fever
• dry mouth: [C0043352] Xerostomia
• thirst: [C0039971] Thirst
Disambiguation
• fast heart rate: [C0039231] Tachycardia (condition) or [C2029900] Fast heart rate (symptom)
• diabetes: [C0011849] Diabetes Mellitus, [C0011860] Diabetes Mellitus, Non Insulin-Dependent,
or [C0011847] Diabetes
Representational Complexity
• monomorphic wide-complex tachycardia: ???
• decreased saliva
Converting from measurements requires background knowledge of ranges, patient demographics, etc.
• blood pressure was 140/100 mm Hg: [C0020538] Hypertensive disease
• Heart rate of 240 bpm: [C0039231] Tachycardia, [C2029900] Fast heart rate (symptom)
• Systolic blood pressure 84 mm Hg
• [C0232105] Normal blood pressure
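The mappings on this slide can be mimicked with a small lookup table plus two measurement rules. This is a sketch, not MetaMap or DeepQA: the concept identifiers are the ones shown on the slide, but the function names are hypothetical and the numeric cut-offs (heart rate above 100 bpm, blood pressure at or above 140/90 mm Hg) are illustrative assumptions that ignore the patient demographics the slide says real conversion depends on.

```python
import re

# Phrase -> candidate concept identifiers, taken from the slide.
CANONICAL = {
    "fever": ["C0015967"],
    "dry mouth": ["C0043352"],
    "thirst": ["C0039971"],
    "fast heart rate": ["C0039231", "C2029900"],          # condition vs. symptom
    "diabetes": ["C0011849", "C0011860", "C0011847"],     # needs disambiguation
}

def map_phrase(phrase):
    """Return every candidate concept for a phrase (empty list if unknown)."""
    return CANONICAL.get(phrase.strip().lower(), [])

def map_heart_rate(text, tachycardia_above=100):
    """Map 'Heart rate of N bpm' to tachycardia concepts when N exceeds
    an assumed fixed cut-off."""
    m = re.search(r"heart rate of (\d+)\s*bpm", text, re.IGNORECASE)
    if m and int(m.group(1)) > tachycardia_above:
        return ["C0039231", "C2029900"]
    return []

def map_blood_pressure(text, systolic_limit=140, diastolic_limit=90):
    """Map 'S/D mm Hg' to Hypertensive disease when either value reaches
    an assumed fixed limit."""
    m = re.search(r"(\d+)\s*/\s*(\d+)\s*mm\s*hg", text, re.IGNORECASE)
    if m and (int(m.group(1)) >= systolic_limit or int(m.group(2)) >= diastolic_limit):
        return ["C0020538"]
    return []

print(map_phrase("Diabetes"))
print(map_heart_rate("Heart rate of 240 bpm"))
print(map_blood_pressure("blood pressure was 140/100 mm Hg"))
```

A phrase that returns more than one identifier, such as "diabetes", is exactly the disambiguation case the slide highlights; choosing among the candidates needs context that a lookup table does not have.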
56. EFFECTIVENESS RESEARCH VALUE
PROPOSITION
Challenges
• Medical Record Information is in Silos
o Patient data in medical records is extensive, and extracting
information from it is difficult and time consuming
• Standardization
o Silos’ data is not uniformly structured (some is, some is not)
• Governance Issues
o Privacy
o Data access rules of Silos
• Business Model
o Compensation for data use
o Sustainability of system
Watson’s Value In Retrieving Relevant Records For
Analysis
• Common indexing of source data (medical record)
• Structuring and reasoning over natural language content to
form the query
• Generating relevant records for analysis
o Affording drill-down into each dimension to explore
evidence
57. THREE OTHER AREAS TO BE
EXPLORED IN THE STUDY
2. Governance
3. Business Models
4. Public Policy Changes
58. 2. GOVERNANCE
• Federation of some type
• Participation of Stakeholders
• Data Providers Data Use Agreements
• Access to System
• Publication of Reports (with data)
• De-Identification
• Patient Consent for Research
• Use of identifiable information for clinical study recruiting
• Standards
• Linking in other health-related data (birth, death,
prescription, and other databases) to add more content
59. 3. BUSINESS MODELS
• Compensation of Participants (Data Partners)
o Payment for structuring data
o Payment for use of data
• Sustainability of the Overall System
o The National System Must Be Sustainable
o PCORI Model for funding
• Incentives to overcome data hoarding
60. 4. PUBLIC POLICY CHANGES
• Laws and Regulations
o Patient Rights In Medical Records
o Sharing of Publicly Funded Data
o Portable Consent
• Budget Priorities and Reallocation
o Invest $1.0 bil. of the $30.0 bil. now in LHS
61. DELIVERABLES
FIRST YEAR
• Budget Prioritization Recommendations for LHS
• Operational Model System In Place
• Business Case Options
• New Incentives
• Review and Recommendations for change in legislation
and regulation (State & Federal)
62. NEXT STEPS
• Adopt LHS Principles
• Use Non-Profit Framework for LD Project
• Convene Healthcare Funders & Stakeholders
• Prepare and Approve Major Grant Application Draft
• Set Up Organization Similar to OMOP with Stakeholder Groups
• Sign Up Supporters
• Sign Up Data Participants
• Sign Up Other Resource Contributors
• Apply for Major Grant
• Implement Grant Application
• Convene Stakeholder Groups to Develop Governance Procedures
and Business Model for a Self Sustained System