This document discusses methods for compiling death data from multiple sources at Kaiser Permanente sites. It describes variables like SOURCE and CONFIDENCE used to track the origin and reliability of death record information. Sites obtain data from sources like membership records, hospital discharges, death certificates, and the Social Security Administration. They then develop various processes to select the most reliable data to populate virtual data warehouses. The document recommends standardizing how confidence levels are assigned to improve consistency in evaluating the likelihood of death, accuracy of death dates, and correctness of linked cause of death information.
Weaving Death Data Sources for Accuracy
1. David Eastman, KP CHR Southeast, Atlanta, GA
Don Bachman, MS, KP CHR, Portland, OR
Daniel Ng, BSE, MBA, KP DOR, Oakland, CA
Wei Tao, MS, KP DOR, Oakland, CA
2. Topics
Background
Survey of death data sources
The SOURCE variable
Methods of weaving death data together
The CONFIDENCE variable
Inter-source agreement analysis at KPGA
VDW death data QA program preliminary findings (sprinkled throughout)
3. Background
VDW death files contain:
Dates of death
Qualifiers - data source, confidence, date imputation flag
Causes of death
Typically VDW sites have access to multiple sources of death data
How data are woven together varies considerably
4. Death Data Sources
HMO Membership
Clarity Patient table
Common membership
Hospital Discharges
State Death Certificates
Social Security Administration
National Death Index
Tumor Data
Clarity “Death Notes”
5. HMO Death Data: Pros and Cons
Pros
No probabilistic matching; unlikely to be the wrong person
Gold standard at some HMOs
Cons
No cause of death information
No inactive (prior) member deaths; death after disenrollment will probably be missed
At some HMOs, family/employer must notify HMO; less rigorously reported & dates may be inaccurate.
At other sites, hospital, home health and hospice care are well integrated in the EMR and provide very reliable death dates.
At some sites, this method is more prone to false negatives than Gov’t data
6. Gov’t Death Data: Pros and Cons
Pros
HMO enrollment status at time of death is irrelevant; death after disenrollment more likely to be captured if it is part of the matching algorithm
Some gov’t sources contain cause of death information
Cons
Probabilistic matching on names/dates/SSN/etc.; wrong person may get matched. Some sites cannot match on SSN, which makes the method less reliable.
Some sites do the matching themselves, some only get matches from the gov’t
May be more far reaching than HMO data, but may not include deaths outside of HMO’s state(s)
At some sites, this method is more prone to false positives than HMO data
7. The SOURCE Variable
Spec definition: Source of death data?
Spec values:
S = State Death files
N = National Death Index
T = Tumor data
Others are locally defined
Based on preliminary QA results from 7 sites:
5 sites use the State Death files (S)
1 site uses National Death Index data (N)
2 sites use the Tumor data (T)
7 sites include “other” local codes
8. Methods of Weaving Death Data Together
Descriptions of methods used at:
KPGA
KPNC
KPNW
9. KPGA Method - Step 1
Merge all possible death data into a research data warehouse table
10. KPGA Method - Step 2
Select the “best quality” data to populate the VDW
HMO sources favored (vs. Gov’t sources)
Confidence variable: source agreement & postmortem activity
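The two KPGA steps can be sketched roughly as follows. This is a minimal illustration, not KPGA's actual rules: the local source codes "M" and "H", the priority order, the 30-day tolerance, and the mapping of source agreement and postmortem activity onto E/F/P are all assumptions made for the example. Only the codes S, N and T come from the VDW spec.

```python
from dataclasses import dataclass
from datetime import date

# Assumed priority order: HMO sources first, then government/registry sources.
# "M" (membership) and "H" (hospital discharge) are hypothetical local codes.
SOURCE_PRIORITY = {"M": 0, "H": 1, "S": 2, "N": 3, "T": 4}


@dataclass
class DeathRecord:
    mrn: str
    source: str
    death_date: date


def select_best(records, last_activity=None, tolerance_days=30):
    """Pick one record for a person, favoring HMO sources, and rate confidence.

    records: all merged death records for a single person (step 1 output).
    last_activity: date of the person's latest encounter or enrollment, if known.
    """
    # Step 2a: favor the source with the best (lowest) assumed priority.
    best = min(records, key=lambda r: SOURCE_PRIORITY.get(r.source, 99))

    # Step 2b: does any other source report a date close to the chosen one?
    others = [r for r in records if r.source != best.source]
    agree = any(
        abs((r.death_date - best.death_date).days) <= tolerance_days
        for r in others
    )

    # Step 2c: activity well after the death date casts doubt on the record.
    postmortem = (
        last_activity is not None
        and (last_activity - best.death_date).days > tolerance_days
    )

    if postmortem:
        confidence = "P"
    elif agree:
        confidence = "E"
    else:
        confidence = "F"
    return best, confidence
```

With one state record and one membership record dated two days apart, the sketch selects the membership record and rates it "E"; the same pair with activity several months after the death date drops to "P".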
11. KPNC Method
1. Input Pre-Processing
Combine member records containing demographic variables, contact dates, and membership dates
2. QualityStage matching
Probabilistic matching of KPNC members to CA state and SSA death records
3. Initial Filtering
Filter large number of match output records down to manageable size
Resulting files (KPNC-CA and KPNC-SSA matches) have multiple matches per MRN
4. Ranking & Selection
Select the single, best match per MRN based on weighted comparison of match linkweights, demographic vars, and contact and membership dates
5. Assign Final Variables
Select best Death date
Assign scores for overall confidence and confidence of CA and SSA matches
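The ranking and selection step (step 4) might look something like the sketch below. The field names, the weights, and the assumption that each component is already scaled to the range 0 to 1 are hypothetical; the slides do not give KPNC's actual weighting.

```python
# Hypothetical weights for combining the three kinds of evidence named above.
WEIGHTS = {
    "linkweight": 0.6,             # normalized probabilistic match weight
    "demographic_agreement": 0.3,  # agreement on name, birth date, sex, etc.
    "date_plausibility": 0.1,      # contact/membership dates consistent with death
}


def score(match):
    """Weighted sum of the evidence components for one candidate match."""
    return sum(weight * match[field] for field, weight in WEIGHTS.items())


def best_match_per_mrn(matches):
    """Keep the single highest-scoring candidate match for each MRN."""
    best = {}
    for m in matches:
        current = best.get(m["mrn"])
        if current is None or score(m) > score(current):
            best[m["mrn"]] = m
    return best
```

A dictionary keyed by MRN keeps the selection to a single pass over the filtered match file; ties go to the candidate seen first.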
12. KPNW Method - Part 1
Internal KP data: only use reliable sources
1. Patient table from Clarity. Most reliable & best source of death dates based on internal validation and subsequent CESR QA.
2. Common Membership including a specific death table (older sources don’t include death dates, but do correctly identify dead patients)
3. KPNW tumor registry
4. Probabilistic match of KP members to OR and WA state data by CHR staff (unlike many other sites).
OR & WA state don’t do the matching and won’t share SSNs.
CHR staff match members from the past 2 years to the state data. Only current source of cause of death. 18-36 month lag.
13. KPNW Method - Part 2
KPNW has been creating death files for several years
Death files only include those who we believe have truly died
Death dates from KP internal data appear very reliable based on CESR QA
Death dates from the Tumor Registry and state data are also excellent but not as good as internal KP data
Death more than 2 years after disenrollment will probably be missed with current system
Would benefit from switching to a common HMORN confidence variable algorithm
14. The CONFIDENCE Variable
Spec definition: “How you rate the accuracy of the observation based on source, match, # of reporting sources, discrepancies, etc.”
Spec values: E=Excellent, F=Fair, P=Poor
Based on preliminary QA results from 7 sites, by site:
% E ranges from 20% to 100%
% F ranges from 0% to 55%
% P ranges from 0% to 50%
% E + % F ranges from 50% to 100%
The CONFIDENCE variable is inconsistently implemented!
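Per-site percentages like those above could be tallied with something as simple as the following sketch. The function name and the input format (a list of CONFIDENCE codes for one site) are assumptions for illustration, not part of the QA program.

```python
from collections import Counter


def confidence_distribution(values):
    """Return the percentage of E, F and P codes in one site's death file."""
    counts = Counter(values)
    total = len(values)
    if total == 0:
        # An empty file has no distribution; report zeros rather than divide by zero.
        return {code: 0.0 for code in ("E", "F", "P")}
    return {code: 100.0 * counts.get(code, 0) / total for code in ("E", "F", "P")}
```

Running this for each site and comparing the results side by side is one way to see how differently the same three codes are being used.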
15. The CONFIDENCE Variable
What does the confidence variable measure?
Likelihood of death?
Accuracy of the death date?
Likelihood that the cause of death information is linked to the correct person?
16. Inter-source Agreement Analysis at KPGA
Where do data come from?
Corroborated deaths
Inter-source death date agreement
Postmortem activity
Confidence distribution
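As a rough illustration of the corroboration and date-agreement questions, the sketch below counts deaths reported by two or more sources and, among those, how many have dates that agree within a tolerance. The input layout (MRN mapped to a dictionary of source code and death date) and the function name are assumptions; the KPGA analysis itself is not described in enough detail here to reproduce.

```python
from datetime import date


def agreement_summary(deaths, tolerance_days=0):
    """Summarize corroboration and death date agreement across sources.

    deaths: {mrn: {source_code: death_date}} with one date per source.
    tolerance_days: maximum spread between dates still counted as agreement.
    """
    # A death is corroborated when at least two sources report it.
    corroborated = {mrn: d for mrn, d in deaths.items() if len(d) >= 2}

    # Dates agree when the earliest and latest reported dates are close enough.
    agreeing = sum(
        1
        for d in corroborated.values()
        if (max(d.values()) - min(d.values())).days <= tolerance_days
    )
    return {
        "total": len(deaths),
        "corroborated": len(corroborated),
        "dates_agree": agreeing,
    }
```

Running the summary at several tolerances (exact, within a week, within a month) separates small recording differences from genuine disagreements between sources.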
22. Recommendations
Create new confidence variables
Confidence that the patient is really dead
Confidence in the death date
Confidence in the linkage to external source data
KPNC has implemented these as local variables
Develop a common algorithm to determine the values of these confidence variables to give them a common meaning.
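One possible shape for the three recommended variables is sketched below. The class and field names are hypothetical, and reusing the existing E/F/P codes for each variable is an assumption; the slides do not specify how KPNC's local variables are coded.

```python
from dataclasses import dataclass

# Reuses the existing spec values: E=Excellent, F=Fair, P=Poor (an assumption).
VALID_RATINGS = {"E", "F", "P"}


@dataclass(frozen=True)
class DeathConfidence:
    death_occurred: str    # confidence that the patient is really dead
    death_date: str        # confidence in the death date
    external_linkage: str  # confidence in the linkage to external source data

    def __post_init__(self):
        # Reject any code outside the agreed value set so sites cannot drift.
        for name in ("death_occurred", "death_date", "external_linkage"):
            value = getattr(self, name)
            if value not in VALID_RATINGS:
                raise ValueError(f"{name} must be one of E, F, P; got {value!r}")
```

Keeping the three ratings separate lets a record be, for example, "E" for the fact of death but only "F" for the date, which a single CONFIDENCE value cannot express.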