This document discusses three different methods for creating a complex ADaM (Analysis Data Model) dataset from SDTM (Study Data Tabulation Model) datasets. The first method involves transforming SDTM datasets directly into ADaM datasets. The second method uses intermediate permanent datasets between the SDTM and final ADaM datasets. The third method uses intermediate ADaM datasets between the SDTM and final ADaM datasets. An example complex ADaM dataset measuring average daily drinking rate is provided to illustrate each of the three methods. Key components and algorithms for deriving parameters in the example are also described.
3. Agenda
• Introduction of ADaM dataset
• Three methods for a complex ADaM dataset
• Example
• Benefits of each method
• Limitation of each method
• Consideration
• Conclusion
• Questions & Answers
11/27/2013
Cytel Inc.
4. Introduction of ADaM
• ADaM (Analysis Data Model) is the CDISC standard for analysis datasets.
• Purpose
• Analysis ready (statistical analysis can be performed with minimal programming)
• Traceability
• Type
• ADSL (Subject Level Analysis Dataset)
• BDS (Basic Data Structure)
− Special BDS (upcoming)
• ADTTE (Time to Event Analysis Dataset)
• ADAE (Adverse Event Analysis Dataset, upcoming)
5. A complex ADaM dataset
• Can require several algorithms
• Can require several data manipulation steps
• Can be derived from more than one SDTM dataset
• Can be difficult to trace back
• Can be difficult to validate
6. Three Methods to create a complex
ADaM dataset
1. SDTM datasets to ADaM datasets
2. SDTM datasets through the intermediate
permanent datasets to final ADaM datasets
3. SDTM datasets through the intermediate
ADaM datasets to final ADaM datasets
8. Example 1
• A comparison of the average daily drinking rate in the treatment period between placebo and study drug.
• At the baseline period: the average daily drinking rate during the 21 days from the hospitalization date
• At the treatment period: the average daily drinking rate during the 42 days from the first study dose
• Baseline rate imputation is applied to the following:
− Subjects who discontinued early
− Any missing assessment
9. Key components in the example
• SDTM – SU (Substance Use)
• Final ADaM – ADDR (Drinking Rate Analysis
Dataset)
• Parameter – ADDRATE (Average Daily Drinking
Rate)
10. Algorithm of parameter of ADDRATE
• Rb (Baseline rate) = sum of all doses / number of days with drinking data available in the baseline period
• Ra (Actual treatment rate) = sum of all doses / number of days with drinking data available in the treatment period
• Rt (Imputed treatment rate) = (Ra * DAYS + Rb * (42 - DAYS)) / 42, where DAYS is the number of days with drinking data available in the treatment period
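The three rates above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the SAS code used for the study; the function names are hypothetical, and the totals (83.6 over 19 baseline days, 101.2 over 39 treatment days) are taken from the worked example for subject 001-01-001 later in the deck.

```python
TREATMENT_DAYS = 42  # planned length of the treatment period, as stated on the slides


def daily_rate(total_dose: float, days: int) -> float:
    """Average daily rate = sum of all doses / days with drinking data."""
    if days <= 0:
        raise ValueError("at least one day of drinking data is required")
    return total_dose / days


def imputed_treatment_rate(ra: float, rb: float, days: int) -> float:
    """Rt = (Ra * DAYS + Rb * (42 - DAYS)) / 42."""
    if not 0 <= days <= TREATMENT_DAYS:
        raise ValueError("days must be between 0 and 42")
    # Days without data in the 42-day window are filled with the baseline rate.
    return (ra * days + rb * (TREATMENT_DAYS - days)) / TREATMENT_DAYS


# Subject 001-01-001 (values from the example slides).
rb = daily_rate(83.6, 19)    # baseline rate
ra = daily_rate(101.2, 39)   # actual treatment rate
rt = imputed_treatment_rate(ra, rb, 39)
print(round(rb, 2), round(rt, 2))
```

When all 42 days are available, the baseline term drops out and Rt equals Ra, which is the case for subject 001-01-002 in the example.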
11. Three Methods for example
• 1st method: SDTM(SU) → ADaM(ADDR)
• 2nd method: SDTM(SU) → intermediate permanent datasets SDTM+(_SU) and ADaM+(_ADDR) → ADaM(ADDR)
• 3rd method: SDTM(SU) → ADaM(ADSU) → ADaM(ADDR)
13. Analysis Dataset Metadata for ADDR
• Dataset Name: ADDR
• Dataset Description: Drinking Rate Analysis Data
• Dataset Location: addr.xpt
• Dataset Structure: one record per subject per parameter per analysis visit
• Key Variables of Dataset: USUBJID, PARAMCD, AVISITN
• Class of Dataset: BDS
• Documentation: c-addr.txt
14. Analysis Variable Metadata including Analysis Parameter value level Metadata for ADDR (1)
Dataset Name | Parameter Identifier | Variable Name | Variable Label | Variable Type | Display Format | Codelist / Controlled Terms | Source / Derivation
ADDR | *ALL* | USUBJID | Unique Subject Identifier | text | $20 | | ADSL.USUBJID
ADDR | *ALL* | SITEID | Site ID | text | $20 | | ADSL.SITEID
ADDR | *ALL* | SEX | Sex | text | $20 | M, F | ADSL.SEX
ADDR | *ALL* | FASFL | Full Analysis Set Population Flag | text | $1 | Y, N | ADSL.FASFL
ADDR | *ALL* | TRTPN | Planned Treatment (N) | integer | 1.0 | 1 = Placebo, 2 = Study Drug | ADSL.TRTPN
ADDR | *ALL* | TRTP | Planned Treatment | text | $20 | Placebo, Study Drug | ADSL.TRTP
ADDR | PARAMCD | PARAMCD | Parameter Code | text | $8 | ADDRATE |
ADDR | *ALL* | PARAM | Parameter | text | $50 | Average Daily Drinking Rate |
15. Analysis Variable Metadata including Analysis Parameter value level Metadata for ADDR (2)
Dataset Name | Parameter Identifier | Variable Name | Variable Label | Variable Type | Display Format | Codelist / Controlled Terms | Source / Derivation
ADDR | *ALL* | PARAMTYP | Parameter Type | text | $20 | DERIVED |
ADDR | *ALL* | AVISITN | Analysis Visit (N) | integer | 3.0 | 1=Baseline, 2=Treatment Period |
ADDR | *ALL* | AVISIT | Analysis Visit | text | $20 | Baseline, Treatment Period | 'Baseline' when SU.VISIT='Screening'; 'Treatment Period' when SU.VISIT in ('VISIT 1', 'VISIT 2', 'VISIT 3')
ADDR | *ALL* | AVAL | Analysis Value | float | 8.2 | | Average Daily Drinking Rate within analysis visit. At Treatment Period, if a patient discontinues early or has missing records, impute with baseline rate
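The analysis visit derivation in the metadata above amounts to a lookup from SU.VISIT to AVISIT and AVISITN. A minimal Python sketch, assuming the visit names are exactly those listed in the metadata (the function name and the handling of unmapped visits are assumptions of this sketch):

```python
# SU.VISIT -> (AVISIT, AVISITN), following the Source / Derivation column.
VISIT_TO_AVISIT = {
    "Screening": ("Baseline", 1),
    "VISIT 1": ("Treatment Period", 2),
    "VISIT 2": ("Treatment Period", 2),
    "VISIT 3": ("Treatment Period", 2),
}


def derive_avisit(su_visit: str):
    """Return (AVISIT, AVISITN); unmapped visits yield (None, None)."""
    return VISIT_TO_AVISIT.get(su_visit, (None, None))


print(derive_avisit("Screening"))
print(derive_avisit("VISIT 2"))
```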
16. Analysis Variable Metadata including Analysis Parameter value level Metadata for ADDR (3)
Dataset Name | Parameter Identifier | Variable Name | Variable Label | Variable Type | Display Format | Codelist / Controlled Terms | Source / Derivation
ADDR | *ALL* | ABLFL | Baseline Record Flag | text | $1 | Y | 'Y' at AVISIT = "Baseline"
ADDR | *ALL* | BASE | Baseline Value | float | 8.2 | | AVAL of AVISIT="Baseline"
ADDR | *ALL* | CHG | Change from Baseline | float | 8.2 | | AVAL - BASE
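The ABLFL, BASE and CHG derivations can be illustrated with a short Python sketch. It assumes one baseline record per subject and parameter, copies BASE onto every record of the subject, and leaves CHG empty on the baseline record; those choices, the function name, and the dictionary layout are assumptions of this sketch rather than rules stated on the slides.

```python
def add_baseline_variables(records):
    """Return copies of the records with ABLFL, BASE and CHG added."""
    # Baseline AVAL per (USUBJID, PARAMCD): AVAL of AVISIT = "Baseline".
    base = {
        (r["USUBJID"], r["PARAMCD"]): r["AVAL"]
        for r in records
        if r["AVISIT"] == "Baseline"
    }
    out = []
    for r in records:
        row = dict(r)
        is_baseline = r["AVISIT"] == "Baseline"
        row["ABLFL"] = "Y" if is_baseline else ""
        row["BASE"] = base.get((r["USUBJID"], r["PARAMCD"]))
        if is_baseline or row["BASE"] is None:
            row["CHG"] = None  # no change from baseline on the baseline record
        else:
            row["CHG"] = round(r["AVAL"] - row["BASE"], 2)  # CHG = AVAL - BASE
        out.append(row)
    return out


addr = add_baseline_variables([
    {"USUBJID": "001-01-001", "PARAMCD": "ADDRATE",
     "AVISIT": "Baseline", "AVAL": 4.40},
    {"USUBJID": "001-01-001", "PARAMCD": "ADDRATE",
     "AVISIT": "Treatment Period", "AVAL": 2.72},
])
print(addr[1]["BASE"], addr[1]["CHG"])
```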
17. 1st method : SDTM to ADaM
SDTM(SU) → ADaM(ADDR)
18. Final ADaM dataset of ADDR
USUBJID | FASFL | TRTP | PARAMCD | PARAM | AVISIT | ABLFL | AVAL | BASE | CHG
001-01-001 | Y | Study Drug | ADDRATE | Average Daily Drinking Rate | Baseline | Y | 4.40 | |
001-01-001 | Y | Study Drug | ADDRATE | Average Daily Drinking Rate | Treatment Period | | 2.72 | 4.40 | -1.68
001-01-002 | Y | Placebo | ADDRATE | Average Daily Drinking Rate | Baseline | Y | 4.26 | |
001-01-002 | Y | Placebo | ADDRATE | Average Daily Drinking Rate | Treatment Period | | 3.10 | 4.26 | -1.16
Key points to note:
• Row 2: There are 3 missing assessments during the treatment period for subject 01-001, so the baseline rate imputation method was applied as follows:
(2.60 * 39 + 4.40 * (42 - 39)) / 42 = 2.72
• Row 4: There are no missing assessments during the treatment period for subject 01-002
19. 2nd method : SDTM to intermediate
permanent datasets to ADaM
SDTM(SU) → intermediate permanent datasets SDTM+(_SU) and ADaM+(_ADDR) → ADaM(ADDR)
21. Intermediate permanent datasets of
SDTM plus _SU (2)
• _HOSEQ is the sequence number of non-missing drinking assessments counted from the hospitalization date (2011-02-08)
• _SDSEQ is the sequence number of non-missing drinking assessments counted from the first dose date (2011-03-01)
• When SUSTAT = 'NOT DONE', _HOSEQ and _SDSEQ are not incremented.
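The sequence rule above can be sketched as a running counter that only advances on assessed days. In this Python sketch, SUSTDTC is assumed to hold the assessment date as an ISO 8601 string, records before the reference date get no sequence value, and a 'NOT DONE' record carries the previous value; these details and the function name are assumptions, since the slides only state that the counter is not incremented.

```python
def add_sequence(records, start, name):
    """Add a running count of assessed days on or after `start`."""
    seq = 0
    out = []
    # ISO 8601 date strings sort chronologically, so no date parsing is needed.
    for r in sorted(records, key=lambda rec: rec["SUSTDTC"]):
        row = dict(r)
        if r["SUSTDTC"] < start:
            row[name] = None      # before the reference date: no sequence
        else:
            if r.get("SUSTAT") != "NOT DONE":
                seq += 1          # only assessed days advance the counter
            row[name] = seq       # 'NOT DONE' keeps the previous value
        out.append(row)
    return out


su = [
    {"SUSTDTC": "2011-02-08", "SUSTAT": "", "SUDOSE": 4.0},
    {"SUSTDTC": "2011-02-09", "SUSTAT": "NOT DONE", "SUDOSE": None},
    {"SUSTDTC": "2011-02-10", "SUSTAT": "", "SUDOSE": 5.0},
]
with_hoseq = add_sequence(su, "2011-02-08", "_HOSEQ")
with_both = add_sequence(with_hoseq, "2011-02-10", "_SDSEQ")
print([(r["_HOSEQ"], r["_SDSEQ"]) for r in with_both])
```

The output list has the same number of records as the input, in line with the deck's rule that a plus dataset keeps the record count of its source.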
22. Intermediate permanent dataset – ADaM
plus _ADDR (1)
USUBJID | TRTP | PARAM | AVISIT | ABLFL | AVAL | BASE | CHG | _TOTAL | _DAYS | _AVAL
001-01-001 | Study Drug | Average Daily Drinking Rate | Baseline | Y | 4.40 | | | 83.6 | 19 | 4.40
001-01-001 | Study Drug | Average Daily Drinking Rate | Treatment Period | | 2.72 | 4.40 | -1.68 | 101.2 | 39 | 2.60
001-01-002 | Placebo | Average Daily Drinking Rate | Baseline | Y | 4.26 | | | 89.4 | 21 | 4.26
001-01-002 | Placebo | Average Daily Drinking Rate | Treatment Period | | 3.10 | 4.26 | -1.16 | 130.2 | 42 | 3.10
Plus variables
• _TOTAL (Sum of doses per visit) = sum(SUDOSE)
• _DAYS (Number of non-missing drinking days per visit) = count(missing SUSTAT), or last._HOSEQ or last._SDSEQ within AVISIT
• _AVAL (Actual treatment rate) = _TOTAL / _DAYS
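A hedged Python sketch of the three plus variables follows. It groups daily SU records by subject and analysis visit and skips 'NOT DONE' records, which is one reading of "count(missing SUSTAT)"; the record layout and function name are illustrative assumptions.

```python
from collections import defaultdict


def summarize_plus_variables(su_records):
    """Compute _TOTAL, _DAYS and _AVAL per (USUBJID, AVISIT)."""
    totals = defaultdict(float)
    days = defaultdict(int)
    for r in su_records:
        if r.get("SUSTAT") == "NOT DONE":
            continue  # unassessed days contribute neither dose nor day count
        key = (r["USUBJID"], r["AVISIT"])
        totals[key] += r["SUDOSE"]
        days[key] += 1
    return {
        key: {
            "_TOTAL": totals[key],
            "_DAYS": days[key],
            "_AVAL": totals[key] / days[key],
        }
        for key in totals
    }


summary = summarize_plus_variables([
    {"USUBJID": "001-01-001", "AVISIT": "Baseline", "SUSTAT": "", "SUDOSE": 4.0},
    {"USUBJID": "001-01-001", "AVISIT": "Baseline", "SUSTAT": "", "SUDOSE": 5.0},
    {"USUBJID": "001-01-001", "AVISIT": "Baseline", "SUSTAT": "NOT DONE", "SUDOSE": None},
])
print(summary[("001-01-001", "Baseline")])
```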
23. Intermediate permanent dataset – ADaM
plus _ADDR (3)
USUBJID | TRTP | PARAM | AVISIT | ABLFL | AVAL | BASE | CHG | _TOTAL | _DAYS | _AVAL
001-01-001 | Study Drug | Average Daily Drinking Rate | Baseline | Y | 4.40 | | | 83.6 | 19 | 4.40
001-01-001 | Study Drug | Average Daily Drinking Rate | Treatment Period | | 2.72 | 4.40 | -1.68 | 101.2 | 39 | 2.60
001-01-002 | Placebo | Average Daily Drinking Rate | Baseline | Y | 4.26 | | | 89.4 | 21 | 4.26
001-01-002 | Placebo | Average Daily Drinking Rate | Treatment Period | | 3.10 | 4.26 | -1.16 | 130.2 | 42 | 3.10
Key points to note:
• Row 2 and 4: at the treatment period, the AVAL algorithm is
(_AVAL * _DAYS + BASE * (42 - _DAYS)) / 42
• Row 2: (2.60 * 39 + 4.40 * (42 - 39)) / 42 = 2.72
• Row 4: (3.10 * 42 + 4.26 * (42 - 42)) / 42 = 3.10
24. 3rd method: SDTM to intermediate ADaM
to ADaM
SDTM(SU) → ADaM(ADSU) → ADaM(ADDR)
26. Intermediate ADaM dataset of ADSU (2)
• 'NOT DONE' data from SU were not included in ADSU
• At the baseline visit, we only include 19 records for 01-001. We used DTYPE='AVERAGE' to hold the average of the assessed doses at ASEQ = 20.
• At the treatment period visit, we only include 39 records. We used DTYPE='AVERAGE' to hold the average of the assessed doses at ASEQ = 63.
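The idea of storing the average as an extra derived record can be sketched as follows. This Python sketch numbers ASEQ consecutively and appends one DTYPE='AVERAGE' record after the assessed records of each visit; the exact sequence numbering in the real ADSU may differ, and the function name and input layout are assumptions.

```python
def build_adsu(doses_by_visit):
    """Build ADSU-like records from (AVISIT, [assessed doses]) pairs.

    'NOT DONE' assessments are assumed to be dropped before this step.
    """
    adsu, aseq = [], 0
    for avisit, doses in doses_by_visit:
        for dose in doses:
            aseq += 1
            adsu.append({"ASEQ": aseq, "AVISIT": avisit,
                         "AVAL": dose, "DTYPE": ""})
        if doses:
            aseq += 1
            # Derived record holding the average of the assessed doses.
            adsu.append({"ASEQ": aseq, "AVISIT": avisit,
                         "AVAL": sum(doses) / len(doses), "DTYPE": "AVERAGE"})
    return adsu


# Illustrative doses only: 19 assessed baseline days, then 2 treatment days.
adsu = build_adsu([("Baseline", [4.4] * 19), ("Treatment Period", [2.0, 3.0])])
averages = [r for r in adsu if r["DTYPE"] == "AVERAGE"]

# An ADDR record can then point back to its source record in ADSU.
addr_row = {"AVISIT": "Baseline", "AVAL": averages[0]["AVAL"],
            "SRCDOM": "ADSU", "SRCSEQ": averages[0]["ASEQ"]}
print(addr_row["SRCSEQ"])
```

With 19 assessed baseline records, the average record lands at ASEQ 20, matching the baseline case described on the slide.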
27. Final ADaM dataset of ADDR
USUBJID | TRTP | PARAM | AVISIT | ABLFL | AVAL | BASE | CHG | SRCDOM | SRCSEQ
001-01-001 | Study Drug | Average Daily Drinking Rate | Baseline | Y | 4.40 | | | ADSU | 20
001-01-001 | Study Drug | Average Daily Drinking Rate | Treatment Period | | 2.72 | 4.40 | -1.68 | ADSU | 63
001-01-002 | Placebo | Average Daily Drinking Rate | Baseline | Y | 4.26 | | | ADSU | 22
001-01-002 | Placebo | Average Daily Drinking Rate | Treatment Period | | 3.10 | 4.26 | -1.16 | ADSU | 65
Key points to note:
• All records come from ADSU, identified by SRCDOM and SRCSEQ.
• Great data point traceability.
28. Example 2 : Intermediate Time to Event
permanent ADaM plus dataset
USUBJID | TRTP | PARAM | AVAL | STARTDT | ADT | CNSR | EVNTDESC | _DSDECOD | _DSDTC | _SVXSTDTC | _AEXDT
001-01-001 | Study Drug 1 | Death | 157 | 2011-01-04 | 2011-06-10 | 1 | COMPLETED THE STUDY | COMPLETED THE STUDY | 2011-06-10 | 2011-06-10 | 2011-05-04
001-01-002 | Study Drug 2 | Death | 116 | 2011-02-01 | 2011-05-28 | 1 | LOST TO FOLLOW-UP | LOST TO FOLLOW-UP | 2011-05-28 | 2011-05-28 | 2011-05-01
001-01-003 | Study Drug 2 | Death | 88 | 2011-02-05 | 2011-05-04 | 0 | DEATH | DEATH | 2011-05-04 | 2011-05-04 | 2011-05-04
001-01-004 | Study Drug 1 | Death | 102 | 2011-03-20 | 2011-06-30 | 1 | ONGOING | | | 2011-06-30 | 2011-06-04
001-01-005 | Study Drug 1 | Death | 101 | 2011-03-26 | 2011-07-05 | 1 | ONGOING | | | 2011-07-01 | 2011-07-05
AVAL = ADT – STARTDT
Plus variables
• _DSDECOD = DS.DSDECOD when DS.DSCAT = “DISPOSITION EVENT”
• _DSDTC = DS.DSDTC when DS.DSCAT = “DISPOSITION EVENT”
• _SVXSTDTC = Last Study Visit date
• _AEXDT = Last AE date
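The AVAL rule for this time-to-event dataset is a plain date difference. A small Python sketch, assuming STARTDT and ADT are available as ISO 8601 date strings (the function name is hypothetical), reproduces the AVAL column of the table above:

```python
from datetime import date


def tte_aval(startdt: str, adt: str) -> int:
    """AVAL = ADT - STARTDT, in days."""
    return (date.fromisoformat(adt) - date.fromisoformat(startdt)).days


# (STARTDT, ADT) pairs for subjects 001-01-001 to 001-01-005.
pairs = [
    ("2011-01-04", "2011-06-10"),
    ("2011-02-01", "2011-05-28"),
    ("2011-02-05", "2011-05-04"),
    ("2011-03-20", "2011-06-30"),
    ("2011-03-26", "2011-07-05"),
]
print([tte_aval(start, end) for start, end in pairs])
```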
29. 1st Method : SDTM to ADaM
The benefits are
• Simple process
The limitations are
• A lack of data point traceability (traceability will be provided only through Define.xml)
• Difficult to troubleshoot if the development SAS programmer and the validation SAS programmer do not agree on issues in the final ADaM dataset.
30. 2nd Method : SDTM thru intermediate
permanent datasets to final ADaM
The benefits are
• Easy to follow each step and to validate
• Flexibility of the data structure of
intermediate datasets (A programmer does
not need to follow CDISC standards in the
intermediate permanent datasets)
The limitations are
• A lack of data point traceability, especially for
the reviewers.
31. Business rules for plus datasets
• Plus datasets
• Same SAS program: plus datasets are created by the same SAS program that develops the final ADaM dataset. We do not have separate dataset programs for the intermediate permanent datasets.
• Same number of records: we keep the same number of records between SDTM datasets and SDTM plus datasets, and between ADaM datasets and ADaM plus datasets.
• Naming convention: the prefix '_' followed by the name of the original SDTM or final ADaM dataset (for example, _SU and _ADDR)
• Plus variables
• Temporary variables are named with the prefix '_'.
• No standard for plus variables: we assign labels, but do not follow any CDISC standards.
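Two of these rules, the equal record count and the '_' prefix on plus variables, lend themselves to an automated check. The following Python sketch is one possible way to express them; the function name, the list-of-dictionaries layout, and the message texts are assumptions for illustration only.

```python
def check_plus_dataset(original, plus, original_vars):
    """Return a list of rule violations (empty when the plus dataset conforms)."""
    problems = []
    # Rule: a plus dataset keeps the same number of records as its source.
    if len(original) != len(plus):
        problems.append("record count differs")
    # Rule: every variable added to the plus dataset starts with '_'.
    added = {name for row in plus for name in row} - set(original_vars)
    for name in sorted(added):
        if not name.startswith("_"):
            problems.append(f"plus variable {name} lacks the '_' prefix")
    return problems


su = [{"USUBJID": "001-01-001", "SUDOSE": 4.0}]
su_plus = [{"USUBJID": "001-01-001", "SUDOSE": 4.0, "_HOSEQ": 1}]
print(check_plus_dataset(su, su_plus, ["USUBJID", "SUDOSE"]))
```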
32. 3rd method : SDTM thru ADaM to
final ADaM
The benefits are
• Easy to follow each step
• Great data point traceability
The limitations are
• Need to create and validate all ADaM datasets, including the intermediate ADaM datasets
• Less flexibility, because the intermediate datasets must follow the ADaM structure
34. Conclusion
• Three methods for a complex ADaM dataset
1. SDTM datasets to ADaM datasets
2. SDTM datasets through the intermediate permanent datasets to final ADaM datasets
3. SDTM datasets through the intermediate ADaM datasets to final ADaM datasets
• More options for creating a complex ADaM dataset
• The analysis will dictate which method to use