MakroCare is a global functional service provider specializing in biostatistics and SAS programming. The company's state-of-the-art facility in Hyderabad, India is staffed by highly qualified SAS programmers dedicated to biopharmaceutical development projects.
Multiple Comparisons and SAS Arrays in Clinical Trials
Newsletter February 2011
Multiple Comparisons in Clinical Trials

Most doubts about the results of randomized clinical trials (RCTs) arise either from inadequate sample size or from problems of multiplicity. Problems of multiplicity arise from testing multiple hypotheses or from testing a hypothesis at multiple points in time. Common problems of this type include multiple analyses of accumulating data at different time points (such as frequent interim analyses), analyses of multiple endpoints, multiple subgroup analyses of subjects, multiple treatment group contrasts, and interpreting the results of multiple clinical trials, especially in meta-analysis. Clinical trials often require a number of outcomes to be calculated and a number of hypotheses to be tested. Such testing involves comparing treatments on multiple outcome measures with univariate statistical methods. Studies with multiple outcome measures occur frequently in medical research. Some researchers recommend adjusting the p-values when clinical trials use multiple outcome measures, so as to prevent findings from falsely claiming "statistical significance". Other researchers disagree with this strategy, arguing that it is not always appropriate and may lead to misleading conclusions from the study.

Multiple tests mean that the traditional 0.05 significance level is no longer necessarily valid for each test, so the overall error rate needs to be controlled. In a study that includes multiple treatment groups and/or multiple endpoints, a multiple hypothesis testing procedure is therefore used to control the type I error.

The most commonly used multiple testing procedures are:

1. Bonferroni and Sidak procedures:
The Bonferroni procedure is probably the most commonly used, because it is highly flexible, very simple to compute, and can be used with any type of statistical test. We divide the level of significance by the total number of comparisons. For example, if in a clinical trial we compare two treatments within five subsets of subjects, the treatments will be significantly different at the 0.05 level if there is a P value less than 0.01 (α* = 0.05/5) within any of the subsets. In the Sidak procedure, which is a modification of the Bonferroni procedure, each test is carried out at level α* = 1 - (1 - α)^(1/K), where K is the number of comparisons. These methods are recommended when the comparison tests are independent.

2. Dunnett's test:
This is a classical and frequently used test for many randomized laboratory experiments, and even for clinical trials, where multiple treatment means are compared to that of the control; the multiple treatments are often multiple doses of the same treatment.

3. Scheffe, Tukey, and Tukey–Kramer procedures:
These procedures give great flexibility for many randomized biological experiments where variations between experimental units are not of major concern and the experimenter is not able to specify particular comparisons or contrasts in advance.

Scheffe's procedure provides a way of looking into all possible linear contrasts of K treatment means while adjusting for type I error rate inflation. Similarly, the Tukey and Tukey–Kramer procedures provide tests and confidence intervals for all possible pairwise comparisons of the treatment means. However, as the number K increases, these methods become quite conservative in declaring significance.

Are multiple comparisons really needed?

Here are three situations where corrections for multiple comparisons are not needed.

1. Account for multiple comparisons when interpreting the results rather than in the calculations
Testing multiple hypotheses at once creates a dilemma. If we do not make any corrections for multiple comparisons, it becomes very easy to find "significant" results by chance: it is too easy to make a type I error. But if we do correct for multiple comparisons, we lose power to detect real differences: it is too easy to make a type II error. The only way out of this dilemma is to focus the analyses and thus avoid making multiple comparisons. For example, if the treatments are ordered, then do not compare each mean with every other mean
(multiple comparisons); instead, perform a single test for trend to check whether the outcomes are linearly related. Another situation arises when positive and negative control groups are included alongside the experimental groups: do not include them in the ANOVA or in the multiple comparisons. Some statisticians recommend not correcting the type I error for multiple comparisons while analyzing data. Instead, report all individual P values and confidence intervals, and make it clear that no mathematical correction was made for multiple comparisons. When interpreting these results, we then need to account for multiple comparisons informally.

2. Corrections or adjustments may not be needed if we make only a few planned comparisons
Some statisticians recommend not doing any formal correction or adjustment for multiple comparisons when the study focuses only on a few scientifically sensible comparisons, rather than on every possible comparison. The term "planned comparison" describes this situation. (A planned comparison requires that we focus on a few scientifically sensible comparisons; we cannot decide which comparisons to do after looking at the data. The choice must be based on the scientific questions we are asking, and must be made when we design the experiment.)

3. Correction or adjustment for multiple comparisons is not needed when the comparisons are complementary
The example for this situation is taken from the study reported by Ridker and colleagues. They asked whether lowering LDL cholesterol would prevent heart disease in subjects who did not have high LDL concentrations and did not have a prior history of heart disease (but did have an abnormal blood test suggesting the presence of some inflammatory disease). The study included almost 18,000 subjects. Half of the subjects received a statin drug to lower LDL cholesterol and half received placebo. The investigators' primary goal was to compare the number of "end points" that occurred in the two groups, including deaths from a heart attack or stroke, nonfatal heart attacks or strokes, and hospitalization for chest pain. These events happened about half as often in subjects treated with the drug as in subjects taking placebo. The drug worked. The investigators also analyzed each of the endpoints. Those taking the drug had fewer deaths, fewer heart attacks, fewer strokes, and fewer hospitalizations for chest pain than those taking placebo. The data from different demographic groups were then analyzed separately. Separate subgroup analyses were done for men and women, old and young, smokers and nonsmokers, subjects with hypertension and without, and subjects with a family history of heart disease and those without. In each of 25 subgroups, subjects receiving the statin drug experienced fewer primary endpoints than those taking placebo, and all these effects were statistically significant. The investigators made no correction for multiple comparisons for all these separate analyses of outcomes and subgroups. No adjustments or corrections were needed, because the results are so consistent. Each of the multiple comparisons asks the same basic question in a different way, and all comparisons pointed to the same conclusion: subjects taking the drug had less cardiovascular disease than those taking placebo.

Treatment comparisons in randomized clinical trials usually involve many endpoints, so that conventional significance testing can seriously inflate the overall type I error rate. One option is to select a single endpoint for formal statistical inference, but this is not always feasible. Another approach is to apply the Bonferroni correction (i.e. multiply each p-value by the total number of comparisons). Excessive use of multiple significance tests in clinical trials can greatly increase the probability of false positive findings. The problem is made more difficult by the fact that endpoints are usually correlated and that studies often have a mixture of data types, e.g. quantitative, binary and survival data. Perhaps the most common method in the medical literature is to analyze each endpoint separately, presenting multiple p-values and an overall subjective conclusion. At best, this provides an open display of data, enabling readers to draw their own (possibly different) conclusions. Whenever multiple comparisons are made, we need to adjust the type I error rate accordingly, except in the few situations mentioned in this paper.
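
The Bonferroni and Sidak adjustments described in this article can be sketched in a few lines of SAS. This is only an illustration, not the authors' code: the first step reuses the article's own numbers (overall α = 0.05, K = 5 subsets), while the data set PVALS and its five p-values are hypothetical values invented for the example. The unadjusted error rate in the first step assumes the five tests are independent.

```sas
/* Per-test significance levels for K comparisons, using the article's
   example values: overall alpha = 0.05 and K = 5 subsets. */
data _null_;
    alpha = 0.05;
    k     = 5;
    alpha_bonferroni = alpha / k;                 /* 0.05 / 5 = 0.01 */
    alpha_sidak      = 1 - (1 - alpha)**(1 / k);  /* about 0.0102    */
    /* Chance of at least one false positive if all K independent
       tests are run at alpha with no correction (about 0.226). */
    fwer_unadjusted  = 1 - (1 - alpha)**k;
    put alpha_bonferroni= alpha_sidak= fwer_unadjusted=;
run;

/* Hypothetical raw p-values, one per subset (not from any real study).
   PROC MULTTEST looks for a variable named RAW_P by default. */
data pvals;
    input subset $ raw_p;
    datalines;
S1 0.004
S2 0.020
S3 0.030
S4 0.250
S5 0.600
;
run;

/* Adjust the p-values themselves instead of the threshold. */
proc multtest inpvalues=pvals bonferroni sidak;
run;
```

Adjusting the threshold (first step) and adjusting the p-values (second step) lead to the same decisions; the second form is often easier to report because each adjusted p-value can be compared directly with 0.05.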
SAS Arrays in Clinical Trials
Introduction:
The Statistical Analysis System (SAS) is an integral part of clinical trial data management and statistical analysis. SAS is widely used for clinical trial analyses submitted to regulatory agencies such as the FDA. SAS programmers write programs in various ways to produce tables, listings and figures (TLFs); efficient programmers need only a few lines of code to produce the final TLFs. The objective of this paper is to highlight the use of arrays in SAS programming, which can make programs more efficient, time-saving and cost-effective in the pharmaceutical industry.
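
As a rough illustration of the saving this introduction refers to, the sketch below recodes a missing-value code across six related variables with one loop instead of six separate IF statements. The data set VITALS, the variables SBP1-SBP6, the subject IDs, the readings and the code -99 are all hypothetical and chosen only for this example.

```sas
/* Hypothetical input: one row per subject, six systolic BP readings.
   -99 is an assumed code for "not recorded". */
data vitals;
    input subjid $ sbp1-sbp6;
    datalines;
S001 120 103 -99 132 109 105
S002 118 -99 121 125 -99 117
;
run;

/* One loop over the array replaces six near-identical IF statements. */
data vitals_clean;
    set vitals;
    array sbp{6} sbp1-sbp6;
    do i = 1 to dim(sbp);
        if sbp{i} = -99 then sbp{i} = .;   /* recode to SAS missing */
    end;
    drop i;                                /* loop counter is not needed in the output */
run;
```

If more readings are added later, only the ARRAY statement changes; the loop body stays the same.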
SAS System
The SAS System helps to analyze and organize a collection of data items using SAS programming statements. A SAS program is a collection of SAS statements in a logical sequence.

SAS is available in multiple computing environments such as Windows and Unix.

SAS Array
A SAS array
- is a temporary grouping of SAS variables that are arranged in a particular order
- is identified by an array name
- exists only for the duration of the current DATA step
- is not a variable

Once the array has been defined, the programmer can perform the same tasks for a series of related variables, the array elements. Arrays are widely used in the pharmaceutical industry.

Arrays simplify SAS processing. They help to read and analyze repetitive data with a minimum of coding.

The ARRAY statement defines the elements in an array. These elements are processed as a group, and each element is referred to by the array name and a subscript.

Syntax:
    ARRAY array-name {subscript} <$> <length>
          array-elements <(initial-values)>;

The ARRAY statement provides the following information about a SAS array:
- array-name: any valid SAS name
- subscript: the number of elements within the array
- $: indicates that the elements within the array are character variables
- array-elements: the list of SAS variables to be part of the array
- length: a common length for the array elements
- initial-values: the initial values for each of the array elements

The following SAS variable lists make it possible to reference variables that have been previously defined in the same DATA step:
- _NUMERIC_ indicates all numeric variables
- _CHARACTER_ indicates all character variables
- _ALL_ indicates all variables, numeric and character (usable in an ARRAY statement only when all of them are of the same type)

RULES FOR ARRAY STATEMENTS:
Some important rules to keep in mind when using arrays in SAS programs:
- An ARRAY statement must contain either all numeric or all character elements, i.e. mixed-type variables are not allowed.
- An ARRAY statement must be used to define an array before the array name can be referenced.
- If the elements are not specified in the ARRAY statement, SAS uses the array name, appends an element number as a suffix starting at 1, and checks whether that variable name already exists in the Program Data Vector (PDV). If those variable names do not exist, the array creates them as variables in the PDV.
- _TEMPORARY_ signals to SAS that it does not need to create actual variables in the PDV for this array; the elements of the array are held in memory but are not output as variables to the data set.
- When an asterisk (*) is used as the subscript, SAS counts the number of array elements itself.
- The array name can be any name as long as it does not match any of the variable names in the data set or any SAS keywords, and it must adhere to the SAS naming convention.
- Array names cannot be used in LABEL, FORMAT, DROP, KEEP or LENGTH statements.

USING ARRAY INDEXES:
The array index is the range of array elements.

For example, the temperature for each of the 24 hours of the day is defined as:

    array temperature_array {24} temp1-temp24;

There may be scenarios in which the index has to begin at a lower bound other than 1 (say 6) and end at an upper bound other than 24 (say 18). This is possible by modifying the subscript value when the array is defined.

    array temperature_array {6:18} temp6-temp18;

The subscript can be written as the lower bound and upper bound of the range, separated by a colon.

ONE DIMENSION ARRAYS:
The array statement to define a one-dimensional array will be, for example:

    array temperature_array {24} temp1-temp24;

The array has 24 elements for the variables TEMP1 through TEMP24.

When the array elements are used within the data step, they are referenced by the array name and the element number.

For example, the reference to the ninth element in the temperature array is: temperature_array{9}

MULTI-DIMENSION ARRAYS:
If there is more than one dimension, it is a multi-dimensional array.

For example, the array statement to define a two-dimensional array will be:
    array ae_array {3, 12} aeterm1-aeterm12 Preferredterm1-Preferredterm12 visit1-visit12;

The array contains three sets of twelve elements. When the array is defined, the subscript gives the number of rows (first dimension) and the number of columns (second dimension).

TEMPORARY ARRAYS:
A temporary array is an array that exists only for the duration of the data step where it is defined. A temporary array is useful for storing constant values that are used in calculations. In a temporary array there are no corresponding variables to identify the array elements. The elements are defined by the keyword _TEMPORARY_.

Example:
    array systolicbp {6} _temporary_ (120 103 114 132 109 105);

EXPLICIT VS IMPLICIT SUBSCRIPTING:
Earlier versions of SAS defined arrays in a more implicit manner, as follows:

    array array-name<(index-variable)> <$> <length>
          array-elements <(initial-values)>;

When an implicit array is defined, every element in the array can be processed with a DO OVER statement, and an index variable may be indicated after the array name. For example:

    *** Implicitly subscripted array;
    DATA ftoc;
       INPUT month $ f1-f7;
       ARRAY f(i) f1-f7;
       ARRAY c(i) c1-c7;
       DO OVER f;
          c=(f-32)*5/9;
       END;
       FORMAT c1-c7 4.1;
    CARDS;
    aug 94 98 99 98 99 96 91 90 88 89
    sept 93 92 87 87 89 90 91 92 82 80
    ;
    PROC PRINT;
       TITLE1 'DATA: FTOC';
       TITLE2 'Implicit Array Example';
    RUN;
    TITLE;

This differs from the explicit array, previously discussed, where a constant value or an asterisk as the subscript denotes the array bounds. For example:

    *** Explicitly subscripted array;
    DATA ftoc2;
       INPUT month $ f1-f7;
       ARRAY f{7} f1-f7;
       ARRAY c{7} c1-c7;
       DO i=1 TO 7;
          c{i}=(f{i}-32)*5/9;
       END;
       FORMAT c1-c7 4.1;
    CARDS;
    aug 94 98 99 98 99 96 91 90 88 89
    sept 93 92 87 87 89 90 91 92 82 80
    ;
    PROC PRINT;
       TITLE1 'DATA: FTOC2';
       TITLE2 'Explicit Array Example';
    RUN;

SORTING ARRAYS:
CALL SORTC can be used to sort character variables and CALL SORTN can be used to sort numeric variables. An example of sorting several numeric variables is as follows:

    data _null_;
       array xarry{6} x1-x6;
       set datasetname;
       * sorts the values of x1-x6 within each observation;
       call sortn(of x1-x6);
    run;

Following are some of the functions widely used with arrays.

HBOUND FUNCTION:
This function returns the upper bound of a dimension of an array.

Example 1: One-dimensional Array
In this example, HBOUND returns the upper bound of the dimension, a value of 5. Therefore, SAS repeats the statements in the DO loop five times.

    array big{5} weight sex height state city;
    do i=1 to hbound(big);
       ...more SAS statements...;
    end;

Example 2: Multidimensional Array
This example shows two ways of specifying the HBOUND function for multidimensional arrays. Both methods return the same value for HBOUND, as shown in the table that follows the SAS code example.

    array mult{2:6,4:13,2} mult1-mult100;

    Syntax           Alternative Syntax    Value
    HBOUND(MULT)     HBOUND(MULT, 1)       6
    HBOUND2(MULT)    HBOUND(MULT, 2)       13
    HBOUND3(MULT)    HBOUND(MULT, 3)       2

LBOUND FUNCTION:
This function returns the lower bound of a dimension of an array.

Example 1: One-dimensional Array
In this example, LBOUND returns the lower bound of the dimension, a value of 2. SAS repeats the statements in the DO loop five times.
    array big{2:6} weight sex height state city;
    do i=lbound(big) to hbound(big);
       ...more SAS statements...;
    end;

Example 2: Multidimensional Array
This example shows two ways of specifying the LBOUND function for multidimensional arrays. Both methods return the same value for LBOUND, as shown in the table that follows the SAS code example.

    array mult{2:6,4:13,2} mult1-mult100;

    Syntax           Alternative Syntax    Value
    LBOUND(MULT)     LBOUND(MULT, 1)       2
    LBOUND2(MULT)    LBOUND(MULT, 2)       4
    LBOUND3(MULT)    LBOUND(MULT, 3)       1

DIM FUNCTION:
This function returns the number of elements in a dimension of an array.

Example 1: One-dimensional Array
In this example, DIM returns a value of 5. Therefore, SAS repeats the statements in the DO loop five times.

    array big{5} weight sex height state city;
    do i=1 to dim(big);
       ...more SAS statements...;
    end;

Example 2: Multidimensional Array
This example shows two ways of specifying the DIM function for multidimensional arrays. Both methods return the same value for DIM, as shown in the table that follows the SAS code example.

    array mult{5,10,2} mult1-mult100;

    Syntax           Alternative Syntax    Value
    DIM(MULT)        DIM(MULT, 1)          5
    DIM2(MULT)       DIM(MULT, 2)          10
    DIM3(MULT)       DIM(MULT, 3)          2

Arrays Vs Proc Transpose
To transpose data (turning variables into observations or turning observations into variables), one can use either PROC TRANSPOSE or array processing within a DATA step.

A Simple Transposition:
Consider a simple situation where the program should transpose observations into variables. In the original data, each person has 3 observations. In the final version, each person should have just one observation.

In the "before" scenario, the data are already sorted BY NAME DATE:

    NAME   DATE
    Amy    Date #A1
    Amy    Date #A2
    Amy    Date #A3
    Bob    Date #B1
    Bob    Date #B2
    Bob    Date #B3

In the "after" scenario, the data will still be sorted by NAME:

    NAME   DATE1      DATE2      DATE3
    Amy    Date #A1   Date #A2   Date #A3
    Bob    Date #B1   Date #B2   Date #B3

The PROC TRANSPOSE program is as follows:

    PROC TRANSPOSE DATA=OLD OUT=NEW
                   PREFIX=DATE;
       VAR DATE;
       BY NAME;
    RUN;

The PREFIX= option controls the names of the transposed variables (DATE1, DATE2, etc.). Without it, the names of the new variables would be COL1, COL2, etc. PROC TRANSPOSE also creates an extra variable, _NAME_. _NAME_ has a value of DATE on both observations, indicating the name of the transposed variable.

The equivalent DATA step code using arrays could be:

    DATA NEW (KEEP=NAME DATE1-DATE3);
       ARRAY DATES {3} DATE1-DATE3;
       DO I=1 TO 3;
          SET OLD;
          DATES{I} = DATE;
       END;
    RUN;

The programmer can choose either PROC TRANSPOSE or arrays in the DATA step.

Conclusion:
Arrays play a vital role in SAS programming for clinical trial data management and statistical analysis. Because arrays can reduce CPU time, cost and repetitive coding, they are a good choice for SAS programmers in their daily programming activities.
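
The DATA step transposition in this article reads exactly three observations per person. As a hedged sketch of one way to relax that, the version below uses BY-group processing so that a person with fewer than three dates still yields one correct row. The sample dates, the counter N and the choice of three output columns are assumptions made for the example; only the data set and variable names OLD, NEW, NAME and DATE come from the article.

```sas
/* Hypothetical input, already sorted by NAME and DATE.
   Bob has only two visits on purpose. */
data old;
    input name $ date :date9.;
    format date date9.;
    datalines;
Amy 05JAN2011
Amy 12JAN2011
Amy 19JAN2011
Bob 06JAN2011
Bob 13JAN2011
;
run;

data new (keep=name date1-date3);
    array dates{3} date1-date3;
    retain date1-date3;               /* keep values across rows of one person */
    format date1-date3 date9.;
    set old;
    by name;
    if first.name then do;
        call missing(of dates{*});    /* reset for each new person */
        n = 0;
    end;
    n + 1;                            /* position within the person (sum statement retains N) */
    if n <= dim(dates) then dates{n} = date;   /* dates beyond the third are ignored */
    if last.name then output;         /* one row per person */
run;
```

With this input, Bob's DATE3 is left missing rather than being filled from the next person's rows, which is the case the fixed DO I=1 TO 3 loop does not handle.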
About MakroCare
MakroCare is a global drug development services firm that operates through four main divisions: CRO, SMO, Informatics and Consulting.
It offers integrated and innovative services in the areas of regulatory affairs, risk management, site management,
patient recruitment, trial management (Phase II/III and late phase), biometrics, QA audits, PV/safety, and informatics.
www.makrocare.com