2. Outline
• Background
• Information Governance
• Data Linkage
• Modelling Social Care
• Predicting Impactability
• Service Evaluation
3. Care Home Admissions
• Undesirable
• Costly
• Recorded in routine data
• Potentially avoidable
4. Upstream Interventions
• There is robust evidence that certain
preventative interventions are
effective at avoiding or delaying care
home admission
• But they are only cost-effective if
they are offered to people truly at
high risk
5. Predictive Factors
• Many factors are known to be predictive of
care home admission
• Several face-to-face tools have been built
using these factors
6. Factors statistically predictive of
institutionalisation
Predictive Factor (Institutionalization) Number of Studies
Age
Dementia / Cognitive impairment
ADL restriction
Number of family members
Use of day services
Incontinence
Co-morbidity
Sickness
Severe Disability
Malignancy
Consulting doctors at general hospitals
Temporary nursing home assistance
Housing conditions
Marital status
Walking ability
Night delirium
Mental disorientation
Age of primary caregiver
Living alone
Number of sub-caregivers
Number of rooms in house
Home ownership
Use of home help
Self-perceived health
7. Predictive Model
PAST (inputs)
Health Needs
• Diagnoses
• Prescriptions
• Record of health contacts
Social Care Needs
• Client group
• Disabilities
• Record of care history
FUTURE (predicted)
Health Service Use
• GP visits
• Community care
• Hospital care
Social Care Use
• Residential care
• Intensive home care
• Direct payments
8. Predictions based on
routine data
• Less labour intensive so they can stratify the
population systematically and repeatedly
• Avoid “non-response bias”
• Can identify people with lower, emerging, risk
9. Potential Drawbacks
• Important issues of confidentiality and consent to consider
• Linking data sources at individual level across health and
social care is problematic where there is no NHS number in
social care
• The tools are never 100% accurate
• Data may be missing from routine databases on certain
groups
11. Data protection
Before predictive modelling can work, we need to
reconcile the following:
1. Predictive modelling is believed to be very valuable in
improving patient care
2. But at the same time we need to protect patient
confidentiality and process data appropriately
12. Is it possible to obtain
consent from
individuals prior to
predictive modelling?
Not feasible given numbers of patients involved
and:
“it has become clear that it is not appropriate to
seek patient consent as not everyone whose
data is analysed will be offered the new service.”
Source: Patient Information Advisory Group
13. Legal safeguards for
health data
1. The principles of common law on informed
consent and patient confidentiality
2. The Data Protection Act 1998, which requires
appropriate data handling
3. The Human Rights Act 1998, which is concerned
with the invasion of privacy
4. Also, the Caldicott principles in the NHS
14. Personal data
According to DPA 1998:
Personal data means data which relate to a living
individual who can be identified –
(a) from those data, or
(b) from those data and other information which
is in the possession of, or is likely to come into
the possession of, the data controller
Personal data relating to a person’s “physical or
mental health or condition” is sensitive personal
data.
15. DPA 1998 requirements
for processing of
sensitive personal data
At least one of the following:
1. Processing with explicit consent of the data subject
2. Processing necessary to protect the vital interests of the data
subject or another person, where it is not possible to get
consent
3. Processing necessary for the purpose of, or in connection
with, legal proceedings (including prospective legal
proceedings), etc.
4. The processing is necessary for medical purposes and is
undertaken by a health professional or a person owing a duty
of confidentiality equivalent to that owed by a health
professional
Medical purposes is defined in the Act to include preventative
medicine, medical diagnosis, medical research, the provision of care
and treatment, and the management of healthcare services.
16. Alternatives (1): s60 powers
Section 60 of the Health and Social Care Act 2001 (later s251 of the
National Health Service Act 2006):
Introduced to allow the regulated use of information by organisations
wishing to obtain patient identifiable data [a similar concept to sensitive
personal data], for medical purposes, where it was impracticable to obtain
informed consent
Applies in England and Wales
Disclosure of information on the basis of an Order made under s60 cannot
be legitimately accused of involving breaches of confidence (source:
Information Commissioner)
PIAG (later ECC) set up to advise the Secretary of State on the use of
powers provided by s60
17. [Diagram: pseudonymised data linkage]
• Inpatient, Outpatient, A&E and GP records each carry Name, Address, DOB and the identifier 131178; these are replaced by the pseudonym J7KA42
• Encrypted, linked data: the four sources are joined on J7KA42
• A risk score is attached to the pseudonym (J7KA42, 76.4)
• Decrypted data with risk score attached (131178, 76.4)
19. Is pseudonymised data
“personal data”?
According to DPA 1998:
Personal data means data which relate to a living
individual who can be identified –
(a) from those data, or
(b) from those data and other information which
is in the possession of, or is likely to come into
the possession of, the data controller
Personal data relating to a person’s “physical or
mental health or condition” is sensitive personal
data.
20. Pseudonymisation and
the data protection act
“Retraceably pseudonymised data may be considered as
information on individuals which are indirectly identifiable
… In that case, although data protection rules apply, the
risks at stake for the individuals with regard to the
processing of such indirectly identifiable information will
most often be low, so that the application of these rules
will justifiably be more flexible than if information on
directly identifiable individuals were processed.”
Source: Article 29 Working Party. Opinion 4/2007 on the concept of personal
data, adopted on 20th June 2007
21. Solution agreed …
The process used to undertake the analysis will include an encryption
programme
The programme will be run by people not directly involved in providing care
and treatment, but these people will not access the identifiable data held
within the data file
The output files will be sent encrypted to the practice or other clinicians
already providing care and treatment to the patients concerned
The decryption keys will be held by the PCT and will be sent separately to
the health professionals involved
“It is a clear principle of the Patient Advisory Group that the first point of
contact with patients should be made through a clinical team known to the
patient, such as their GP practice.”
Source: PIAG (2008)
23. Data collected
• From five sites (~ PCT/LA areas in England)
• Total nine organisations: 4 PCTs, 4 LAs, 1 Care trust
• 1.8M population (range 100,000-700,000)
Dataset                    Years (up to)    No. records    No. people
GP register                5                  7,861,000     1,951,000
GP consultations           5+               110,971,000       589,000
Inpatient (SUS)            5                  3,268,000       999,000
Outpatient (SUS)           5                 12,815,000     1,532,000
A&E (SUS)                  5                  2,127,000       925,000
Social care clients        3+                    81,000        81,000
Social care assessments    3+                   194,000        72,000
Social care services       3+                   326,000        79,000
Community                  1,316,000 (single figure in source; column unclear)
24. Data linkage - approach
First instance: NHS number (encrypted) from LA
In absence of NHS number:
– Central ‘batch tracing’ ??
– Shared PCT/LA databases ??
Ultimately:
– construction of ‘alternative IDs’ in the form FSGDDMMYYYY, built from Forename, Surname, Male / Female and DOB
97% of individuals in one site (population ~400,000) were
found to have a unique ‘alternative ID’.
Remaining 3%: attempt match by postcode
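A minimal sketch of how an ‘alternative ID’ of this shape could be built and then encrypted before leaving the source organisation. It assumes that F and S are the initials of forename and surname and that G is a one-letter sex code; the slide gives only the pattern FSGDDMMYYYY, so those readings are assumptions. The keyed hash and the secret key are hypothetical stand-ins for whatever encryption the sites actually used.

```python
import hashlib
import hmac
from datetime import date


def alternative_id(forename, surname, sex, dob):
    """Build an FSGDDMMYYYY-style ID.

    Assumption: F and S are initials and G is the first letter of the sex field.
    """
    f = forename.strip()[:1].upper()
    s = surname.strip()[:1].upper()
    g = sex.strip()[:1].upper()
    return f"{f}{s}{g}{dob.strftime('%d%m%Y')}"


def pseudonymise(value, secret_key):
    """Keyed one-way hash, used here as a stand-in for the encryption step."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical person and key, for illustration only.
alt = alternative_id("Jane", "Smith", "Female", date(1930, 5, 7))
token = pseudonymise(alt, b"hypothetical-shared-secret")
print(alt, token[:12])
```

Because the same inputs and key always give the same token, two organisations holding the same key can match records on the token without exchanging names or dates of birth.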
25. Data linkage - Summary
Identifiers used:
• NHS number where available (encrypted)
• ‘Alternative ID’ (+ postcode) where not (both encrypted); the alternative ID takes the form FSGDDMMYYYY, built from Forename, Surname, Male / Female and DOB
Linkage method:
• Site A: NHS number provided for all social care clients. Match takes place through encrypted NHS number.
• Site B: NHS number provided for 89% of social care clients. Match via encrypted NHS number.
• Site C: NHS numbers given for 86% of clients. Match occurs by NHS number in the first instance, and then through the ‘alternative ID’.
• Sites D&E: No NHS numbers provided for social care clients. Match takes place via ‘alternative ID’.
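The cascade used at Site C (NHS number first, then ‘alternative ID’) can be sketched as below. The record layout and the key names `nhs_hash`, `alt_hash`, `gp_id` and `client_id` are hypothetical, and the sketch only accepts an alternative ID when it is unique in the GP register; the postcode step for non-unique IDs is left out.

```python
from collections import Counter


def link_clients(clients, gp_register):
    """Match social care clients to GP register entries.

    Tries the encrypted NHS number first, then a unique encrypted alternative ID.
    """
    by_nhs = {g["nhs_hash"]: g["gp_id"] for g in gp_register if g.get("nhs_hash")}
    alt_counts = Counter(g["alt_hash"] for g in gp_register)
    by_alt = {
        g["alt_hash"]: g["gp_id"]
        for g in gp_register
        if alt_counts[g["alt_hash"]] == 1  # ambiguous alternative IDs are skipped
    }
    links = {}
    for c in clients:
        nhs = c.get("nhs_hash")
        if nhs and nhs in by_nhs:
            links[c["client_id"]] = (by_nhs[nhs], "nhs_number")
        elif c.get("alt_hash") in by_alt:
            links[c["client_id"]] = (by_alt[c["alt_hash"]], "alternative_id")
        else:
            links[c["client_id"]] = (None, "unmatched")
    return links


# Made-up records: G3 and G4 share an alternative ID, so it cannot be used.
gp_register = [
    {"gp_id": "G1", "nhs_hash": "n1", "alt_hash": "a1"},
    {"gp_id": "G2", "nhs_hash": "n2", "alt_hash": "a2"},
    {"gp_id": "G3", "nhs_hash": "n3", "alt_hash": "a3"},
    {"gp_id": "G4", "nhs_hash": "n4", "alt_hash": "a3"},
]
clients = [
    {"client_id": "C1", "nhs_hash": "n1", "alt_hash": "a1"},
    {"client_id": "C2", "nhs_hash": None, "alt_hash": "a2"},
    {"client_id": "C3", "nhs_hash": None, "alt_hash": "a3"},
]
links = link_clients(clients, gp_register)
match_rate = sum(1 for gp_id, _ in links.values() if gp_id) / len(links)
print(links, match_rate)
```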
26. Data linkage – how good?
Groups of people in social care data – how many are we able to
match to GP register list (of ages 55+)?
Varies, but better for those with greater service use

                               N over 55    N matched to GP register    % match
SITE A (100% NHS no)
  People assessed                 36,166                      30,508        84%
  Service received                24,036                      19,250        80%
  ‘Significant new’ service        2,106                       2,034        97%
SITE D (100% ‘alt id’)
  People assessed                 18,327                      11,512        63%
  Service received                 7,593                       5,772        76%
  ‘Significant new’ service          273                         252        92%
27. Data linkage
Social & Hospital care overlap
Population of over 55s registered
in one PCT
90% of those with a social
care contact have also had
secondary care contact(s)
in three years
35. Using the Model
[Diagram: Inpatient, Outpatient, A&E and GP data, linked by pseudonym (e.g. A89KP5, 833TY6, I9QA44, 85H3D, 6445JX, 233UMB, RF02UH), laid out across Last Year, This Year and Next Year]
36. Modelling results
Predicting for over 75s
– admission to care home / intensive home care
– marked increase in social care costs (+£5,000)
            Number predicted    Of these, how many    PPV          No. people in area who do    Sensitivity
            by model            are correct?                       experience the 'event'
Site A      267                 105                   39%          2,204                        5%
Site B      180                 85                    47%          497                          17%
Site C      47                  21                    45%          220                          10%
Site D      ~20-40 *            -                     ~70-30% *    256                          ~8-16% *
Site E      119                 67                    56%          604                          11%
Pooled      557                 201                   36%          3,366                        6%
(all sites)
* stable model not found
37. Changing the Dependent Variable
Predicting for over 75s
– admission to care home / intensive home care
– some increase in social care costs
                  Predict No    Predict No    Predict Yes   Predict Yes
                  Actual No     Actual Yes    Actual No     Actual Yes
                  (TRUE NEG)    (FALSE NEG)   (FALSE POS)   (TRUE POS)    PPV    Sensitivity   Specificity
Pooled Model £5K  152,183       3,165         356           201           36%    6%            99.8%
Pooled £3K        151,245       3,660         564           436           44%    11%           99.6%
Pooled £1K        149,278       4,677         876           1,074         55%    19%           99.4%
Pooled £1 !       143,598       8,154         1,559         2,594         62%    24%           98.9%
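For readers who want to check the table, the three summary measures follow directly from the four cells of each row. The sketch below recomputes them from the counts on this slide; the function and variable names are illustrative only.

```python
def screening_metrics(tn, fn, fp, tp):
    """Return PPV, sensitivity and specificity from confusion-matrix counts."""
    return {
        "ppv": tp / (tp + fp),          # of those flagged, share who had the event
        "sensitivity": tp / (tp + fn),  # of those with the event, share flagged
        "specificity": tn / (tn + fp),  # of those without the event, share not flagged
    }


# (TRUE NEG, FALSE NEG, FALSE POS, TRUE POS) for each cost threshold on the slide.
pooled = {
    "£5K": (152183, 3165, 356, 201),
    "£3K": (151245, 3660, 564, 436),
    "£1K": (149278, 4677, 876, 1074),
    "£1": (143598, 8154, 1559, 2594),
}

results = {label: screening_metrics(*cells) for label, cells in pooled.items()}
for label, m in results.items():
    print(
        f"{label}: PPV {m['ppv']:.0%}, "
        f"sensitivity {m['sensitivity']:.0%}, "
        f"specificity {m['specificity']:.1%}"
    )
```

Lowering the cost threshold makes the event more common, which is why PPV and sensitivity both rise while specificity falls slightly.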
38. Important model variables?
Variable                                                                      Beta coefficient    Probability
Intercept                                                                     -4.96               <.0001
Age & Sex
  Age band 8 (90+) (relative to 75-79)                                        1                   <.0001
  Age band 7 (85-89) (relative to 75-79)                                      0.87                <.0001
  Age band 6 (80-84) (relative to 75-79)                                      0.47                <.0001
  Sex = female                                                                0.36                <.0001
Social care Prior Use
  Any medium intensity home care year in past year                            2.35                <.0001
  Social Care data flag for health problem                                    2.14                <.0001
  Any social care assessments recorded in past year                           1.43                <.0001
  Any low intensity home care year in past year                               1.14                <.0001
  Any day care in period 2-1 years prior                                      1.09                <.0001
  Any social care assessments recorded in period two – one years prior        0.59                <.0001
  Any meals supplied in period (2-1) year prior                               0.33                0.02
  No. of social care assessments in last year                                 -0.14               0.03
  Any medium intensity home care year in period 2-1 year prior                -1.22               <.0001
Health Care
  OP visit in past two years: specialty Old Age Psychiatry                    0.4                 0.01
  Any inpatient diagnosis: COPD (previous 2 years)                            0.39                0
  Any inpatient diagnosis: diabetes (previous 2 years)                        0.39                0
  No of emergency admissions in past 90 days                                  0.29                <.0001
  Any A&E visit arriving by ambulance in the past year                        0.25                <.0001
  Ratio of inpatient episodes to admissions in past year                      0.16                <.0001
  Number different OP specialties seen in prior two years                     0.07                <.0001
Note the importance of prior social care variables
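To give a feel for the size of these coefficients, the sketch below turns a few of them into a predicted probability. It assumes the model is a logistic regression, which the slide does not state explicitly, and it uses only a handful of the binary variables, so the figures are illustrative rather than a reproduction of the model. The dictionary keys are hypothetical short names.

```python
import math

# Coefficients copied from the slide; key names are hypothetical.
COEFFICIENTS = {
    "intercept": -4.96,
    "age_90_plus": 1.00,
    "age_85_89": 0.87,
    "age_80_84": 0.47,
    "female": 0.36,
    "medium_home_care_past_year": 2.35,
    "social_care_assessment_past_year": 1.43,
}


def predicted_probability(flags):
    """Logistic transform of the intercept plus the coefficients of the set flags."""
    score = COEFFICIENTS["intercept"] + sum(COEFFICIENTS[name] for name in flags)
    return 1.0 / (1.0 + math.exp(-score))


# Reference person: male, aged 75-79, no recorded social care use.
baseline = predicted_probability([])
woman_90 = predicted_probability(["age_90_plus", "female"])
woman_90_home_care = predicted_probability(
    ["age_90_plus", "female", "medium_home_care_past_year"]
)
print(f"{baseline:.3f} {woman_90:.3f} {woman_90_home_care:.3f}")
```

Under these assumptions a single prior social care flag moves the predicted risk far more than age and sex together, which is the point the slide's note makes.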
39. Impact of adding new datasets
                                 Predict No    Predict No    Predict Yes   Predict Yes
                                 Actual No     Actual Yes    Actual No     Actual Yes
                                 (TRUE NEG)    (FALSE NEG)   (FALSE POS)   (TRUE POS)    PPV      Sensitivity   Specificity
Site D - £1K best                22,538        556           49            46            48.4%    7.6%          99.8%
+ IP and GP diagnostic vars      22,538        558           49            44            47.3%    7.3%          99.8%
+ GP vars                        22,539        561           48            41            46.1%    6.8%          99.8%
+ Community care                 22,534        557           53            45            45.9%    7.5%          99.8%
+ Deprivation vars               22,539        562           48            40            45.5%    6.6%          99.8%
45-49. Trend
Model predicts: Cost
Details: Model predicts which patients will become high-cost over next 6 or 12 months
Examples:
• Low-cost patient this year will become high-cost next year

Model predicts: Event
Details: Model predicts which patients will have an event that can be avoided
Examples:
• Patient will be hospitalized
• Patient will have diabetic ketoacidosis

Model predicts: Actionability
Details: Model predicts which patients have features that can readily be changed
Examples:
• Patient has angina but is not taking aspirin
• Patient does not have pancreatic cancer
• (Ambulatory Care Sensitive)

Model predicts: Readiness to engage
Details: Model predicts which patients are most likely to engage in upstream care
Examples:
• Patient does not abuse alcohol
• Patient has no mental illness
• Patient previously compliant

Model predicts: Receptivity
Details: Model predicts what mode and form of intervention will be most successful for each patient
Examples:
• Patient prefers email rather than telephone
• Patient prefers male voice rather than female
• Readiness to change
51. The problem of regression to the mean
in service evaluation
[Chart: average number of emergency bed days (y-axis, 0 to 50) by year relative to the ‘Intense year’ (x-axis, -5 to +4)]
53. [Diagram: Participating sites, Information Centre (IC), DH and Nuffield Trust]
• Participating sites: sites collate patient lists
• Information Centre: IC collates and adds (if required) NHS numbers using batch tracing; IC derives extra identifiers
• DH: owner of pseudonymisation password
• Nuffield Trust
KEY to the data items shown:
• Patient identifiers (e.g. NHS number)
• Trial information (e.g. start and end date)
• Non-patient identifiable keys (e.g. HES ID, pseudonymised NHS number)
54. Overcoming regression to the mean
using a control group (1)
[Chart: number of emergency hospital admissions per head per month (y-axis, 0.0 to 0.3) by month relative to start of intervention (x-axis, -12 to 12); series: Intervention]
55. Overcoming regression to the mean
using a control group (2)
[Chart: number of emergency hospital admissions per head per month (y-axis, 0.0 to 0.3) by month relative to start of intervention (x-axis, -12 to 12); series: Control and Intervention]
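The effect these charts illustrate can be reproduced with a small simulation. The sketch below is stylised: the distributions, the selection threshold and the size of the true effect are all made-up assumptions, and the comparison group is formed by random assignment for simplicity, whereas the slides do not say how the control group was chosen. People are selected because of unusually high use in the ‘before’ period, so their use falls afterwards even without any intervention.

```python
import random

TRUE_EFFECT = 1.0  # assumed reduction in use caused by the intervention


def simulate(n_people=40000, threshold=12.0, seed=1):
    """Return (treated, before, after) for people selected on high prior use."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_people):
        underlying = rng.gauss(5.0, 2.0)            # stable personal level of use
        before = underlying + rng.gauss(0.0, 4.0)   # noisy use before selection
        if before <= threshold:
            continue                                # only high users are selected
        treated = rng.random() < 0.5
        after = underlying + rng.gauss(0.0, 4.0)    # fresh noise after selection
        if treated:
            after -= TRUE_EFFECT
        rows.append((treated, before, after))
    return rows


def mean(values):
    values = list(values)
    return sum(values) / len(values)


rows = simulate()
treated = [r for r in rows if r[0]]
control = [r for r in rows if not r[0]]

# Before/after comparison in the treated group mixes the true effect
# with regression to the mean.
naive_effect = mean(b for _, b, _ in treated) - mean(a for _, _, a in treated)
# Comparing against controls selected the same way removes that artefact.
controlled_effect = mean(a for _, _, a in control) - mean(a for _, _, a in treated)
print(f"naive: {naive_effect:.2f}, controlled: {controlled_effect:.2f}")
```

The before/after figure is several times larger than the assumed true effect, while the comparison with controls lands close to it.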