SAS
Summary Guide




           School of Applied Statistics
                        November, 03


Contents

1. Introduction
        1.1 Structure of a SAS Job
        1.2 SAS Language
        1.3 SAS Variables
        1.4 SAS Data Sets
2. Introduction to the DATA Step
        2.1 DATA Statement
        2.2 Sources of Input
        2.3 Input of Raw Data
        2.4 Formats: Input and Output
        2.5 How SAS Executes a DATA Step
        2.6 Transformation of Data
        2.7 Missing Values
        2.8 Modifying an Existing SAS Data Set
        2.9 Output from a SAS DATA Step
        2.10 Output to Create Stored ASCII Files
3. Introduction to the PROC Step
4. Basic Procedures
5. More on the DATA Step
        5.1 IF - THEN - ELSE Statements
        5.2 Selecting Observations
        5.3 DO and END Statements
        5.4 DO Loops
        5.5 Arrays
        5.6 RETAIN
        5.7 DROP and KEEP
        5.8 RENAME and LABEL
6. Data Management
        6.1 SET
        6.2 MERGE
        6.3 UPDATE
7. Statistical Procedures
8. Graphical Procedures
9. Output Delivery System (ODS)
10. Further Facilities
11. Publications




SAS Summary Guide                                              November, 03                      School of Applied Statistics


1. Introduction
This handout is meant as a brief introduction to the syntax of the SAS package which is
available on UNIX workstations and PC computers at The University of Reading. The SAS
language is similar for all versions but there are differences in file access and storage. This
document is designed to give a brief synopsis of many basic commands used in the DATA step
and the general structure of some statistical procedures (PROC). It is by no means complete
and there are numerous specialised manuals published by SAS Institute (some of which are in
Room G16 in the School of Applied Statistics).

1.1 Structure of a SAS Job
A SAS program consists of a sequence of one or more steps and each step may contain
several SAS statements. There are two kinds of step:-
•      The DATA step which is used to create and manipulate SAS data sets
•      The PROC step which is used for analysing or processing SAS data sets
A SAS job is made up of any number of these steps. The beginning of one step signifies the
ending of the previous step.

1.2 SAS Language
SAS statements can begin in any column of a line and can be continued on subsequent lines.
Each SAS statement must end with a semicolon. The language is mainly case-insensitive (i.e.
upper and lower case can be freely mixed), although text inside quotes is case-sensitive.
There are three types of SAS statements:-
•      Statements which appear in the DATA step
•      Statements which appear in the PROC step
•      Statements which can appear anywhere (global statements)
Comments can also be included in a SAS program; these are useful for annotating your
program. An asterisk at the start of a statement turns it into a comment which ends at the
next semicolon.
e.g.      * This is a comment ;

or to comment out a block of lines use the /* and */ delimiter pairs:-
e.g.      /* This is a comment
              which will not be acted upon by SAS */

1.3 SAS Variables
There are two types of SAS variable - numeric and character. They can have the following
attributes:-
LENGTH           numeric variables 2 - 8 bytes (minimum 3 on Windows and UNIX)
                 character variables 1 - 32,767 bytes / characters (1 - 200 before Version 7)
INFORMAT         format SAS uses to read a data value into a variable
FORMAT           format SAS uses to write each value of a variable
LABEL            descriptive label of up to 256 characters





1.4 SAS Data Sets
A SAS data set is a collection of data values arranged in a rectangular table, the rows
representing observations and the columns representing variables. Each variable must be
given a name which consists of 1 - 32 characters. The name must start with a letter or an
underscore and can contain any alphanumeric character or underscore. Avoid special characters in variable
names such as . or $ . Special variables within SAS are denoted by names that begin and end
with an underscore.
SAS data sets can be either temporary or permanent. Temporary data sets are given a one-
level name by the user which is automatically prefixed with WORK. by the SAS system.
This name can be omitted altogether, in which case SAS names the data sets DATA1,
DATA2 ... for the 1st, 2nd ... data sets defined. Temporary data sets are erased on leaving the
current SAS session. Permanent data sets must be given a two-level name by the user linking
to their storage location.
e.g.   LIBNAME PERM 'complete_pathname';
       PROC PRINT DATA=PERM.STUDENTS;
       RUN;
Permanent SAS data sets are stored differently between versions and allocated different file
extensions. However, all data sets are upward compatible. There are several words which
should not be used as the first part of the SAS data set name. These include such words as
PRINT, EXEC, DATA etc. and also SAS reserved names such as LIBRARY, MAPS, WORK
etc.
SAS automatically documents a permanent data set to include a data set label, variable
attributes and history information. The data are stored in the form in which SAS uses them,
therefore saving computer time and making it unnecessary to execute input statements each
time the data set is used.

2. Introduction to the DATA Step
2.1 DATA Statement
The DATA statement signals the beginning of the DATA step and gives a name to the SAS
data set being created. This SAS data set can be used as input to any subsequent DATA or
PROC steps.
e.g.   a) DATA PERM.PATIENTS;                creates a permanent data set
       b) DATA SCHOOL;                       creates a temporary data set
       c) DATA;                              creates a temporary data set with
                                             default name DATAn
       d) DATA _NULL_;                       does not create a data set

2.2 Sources of Input
a) The DATALINES or CARDS statement is used when the data are in the same file as the
   SAS statements:-
       DATA REGRESS;
       INPUT X Y Z;
       DATALINES;
       61 44 29
       17 6 43
       .
       .
b) The INFILE statement is used to read data from an external file on your workdisk:-
          DATA REGRESS;
          INFILE 'file_identifier';
          INPUT X Y Z;
The file identifier in the INFILE statement is the full pathname and filename of the external
data file, residing on your disk, which is to be linked to your SAS program.

2.3 Input of Raw Data
The INPUT statement is used to describe the raw input data. There are three types of input
mode which can be mixed in one INPUT statement:-
•      LIST (or free-field)
•      COLUMN
•      FORMATTED


a)     LIST INPUT
This mode of input simply lists the variables in the order in which they appear in the input
data
e.g.      INPUT NAME $ AGE SEX $;
          INPUT NAME $ Q1-Q32;
where $ is used after a variable name to indicate a character variable whose value has a
default length of 8 with no embedded blanks. Values must be separated by at least one space
(free format).
b) COLUMN INPUT
With this mode of input the columns are specified within which each variable value is located
e.g.      INPUT CANNAME $ 1-15 PARTY $ 20-24 VOTES 30-40;

The data values can be read in any order and blank fields are automatically set to missing.
Embedded blanks are allowed in character data by specifying the maximum length of a value.
c)     FORMATTED INPUT
This is a very flexible method of input as it is possible to read data in virtually any form. SAS
keeps track of its position on the input lines with a 'pointer'
e.g.      INPUT @3 QUEST3 +10 QUEST12 / @60 RESPONSE;
There are various types of 'pointer' controls each having a different meaning. Listed below
are some of the more frequently used ones:-
@n        move pointer to column n
+n        move pointer forward n columns
#n        move pointer to line n
/         move to next line
Whichever mode of input is used the following 'pointer' controls can be used to maintain the
current pointer position:-
@         'hold' data line for next INPUT statement in the current DATA step
@@        'hold' data line for more executions of the DATA step
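
As a small illustration of the double trailing @, the sketch below uses list input to read
several observations from each data line. The data set name SCORES, the variable names and
the data values are invented for this example.

```sas
/* Several observations per data line (names and values are illustrative) */
DATA SCORES;
   INPUT NAME $ MARK @@;   /* @@ holds the line for the next execution of the step */
   DATALINES;
ANN 61 BOB 44 CAROL 29
DAVE 17 EVE 6
;
RUN;

PROC PRINT DATA=SCORES;
RUN;
```

Without the @@ only the first NAME and MARK pair on each line would be read, since each
execution of the DATA step would normally start on a new data line.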

2.4 Formats: Input and Output
A set of directions for reading a value is called an INFORMAT and a set of directions for
printing a value is called a FORMAT. It is possible to specify formats for numeric and
character variables and also date and time variables. There is a large number of FORMAT
and INFORMAT specifications; refer to the SAS Language Reference, Version 8, for further
information.
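
The following sketch shows an informat being used to read a date and formats being used to
print it. The data set VISITS and its variables are hypothetical; DATE9. and DDMMYY10. are
standard SAS date specifications, and the colon before DATE9. asks SAS to read the value in
list mode using that informat.

```sas
/* Read a date with an informat, print it with a format (illustrative data) */
DATA VISITS;
   INPUT PATIENT $ VISITDT :DATE9. COST;
   FORMAT VISITDT DDMMYY10. COST 8.2;   /* controls how the values are printed */
   DATALINES;
SMITH 03NOV2003 125.5
JONES 17DEC2003 80
;
RUN;

PROC PRINT DATA=VISITS;
RUN;
```

The date is stored internally as a number of days; only the FORMAT statement makes it
appear as a calendar date on output.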

2.5 How SAS Executes a DATA Step
A DATA step is executed once for each observation in the data set. A DATA step that does
not contain an INPUT, SET, MERGE or UPDATE statement is executed once. The SAS
variable _N_ is automatically generated for each DATA step; its value is the number of times
that SAS has begun executing the step (_N_ is not directly available outside the current
DATA step). All variables referred to in the DATA step, for example the variables named in
the input statement and any new variables generated, make up the program data vector.
For each execution of the DATA step:-
•      The program data vector is initialised to missing.
•      The data values of the current observation are read using the INPUT statement. Any
       new variables are computed and added to the program data vector and any variables not
       wanted are dropped.
•      The values in the program data vector are then added to the data set being created.
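
The cycle described above can be watched in the SAS log with a PUT statement. This is only
a sketch: the data set SQUARES and the variables X and XSQ are made up for the purpose.

```sas
/* Trace each execution of the DATA step in the log (illustrative names) */
DATA SQUARES;
   INPUT X;
   XSQ = X * X;            /* new variable added to the program data vector */
   PUT _N_= X= XSQ=;       /* writes the current values to the log */
   DATALINES;
2
5
9
;
RUN;
```

Note that _N_ appears in the log but is not written to the data set SQUARES.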

2.6 Transformation of Data
There is a range of standard functions available in SAS for transforming data. For a full list
of these functions consult the SAS Language Reference. Manipulation and transformation of
data is carried out in the DATA step with the resulting variable being added to the data set
automatically.
e.g.      SUM=X + Y;
          X2=X * X;                or   X2=X**2;
          LX=LOG(X);

2.7 Missing Values
Variables with missing values on input are specified in SAS by a full stop or a blank field.
On output numeric variables are displayed as a full stop and character variables as a blank
field. For numeric variables it is also possible to specify up to 27 special missing value
symbols ( A - Z and _ ) to distinguish between different kinds of missing data. This is done
using the MISSING statement:-



        DATA;
        INPUT X;
        MISSING A B;
        IF X = 99 THEN X = .A;
        IF X = 999 THEN X = .B;
        CARDS;


a) .A is used to distinguish from the variable name A
b) A variable is set to missing if the input field contains only a full stop or is blank.
c) A variable is set to missing if the input field contains an illegal character
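
The MISSING statement also allows the special symbols to be typed directly in the raw data.
In the sketch below the data set SURVEY, its variables and the meanings given to A and R
are assumptions made for the example.

```sas
/* Special missing values read directly from the data (illustrative codes) */
DATA SURVEY;
   MISSING A R;            /* A = absent, R = refused: meanings invented here */
   INPUT ID SCORE;
   DATALINES;
1 34
2 A
3 R
4 .
;
RUN;

PROC FREQ DATA=SURVEY;
   TABLES SCORE / MISSING; /* count each kind of missing value separately */
RUN;
```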

2.8 Modifying an Existing SAS Data Set
Once data have been read into a SAS data set it is possible to modify that data in other DATA
steps while keeping the original data set unchanged and without having to re-input the data
from the raw data file. This is easily done by transferring data from the existing SAS data set
into another one.
e.g.    DATA NEW;
        SET PERM.PATIENTS;
        DOSE=PILL_A*QTY_A;
Each time the SET statement is executed another observation is transferred from the existing
SAS data set PERM.PATIENTS to the SAS data set being created and called NEW .

2.9 Output from a SAS DATA Step
OUTPUT statements allow you to control when an observation is written to one of the SAS
data sets which are currently being created.
e.g.    OUTPUT;
        OUTPUT MISSDATA;
When an OUTPUT statement is executed SAS will immediately output the current values to
the named or current SAS data set. OUTPUT statements are useful for:-
a) Creating 2 or more observations from 1 record of input data
b) Combining several observations into one observation
c) Creating more than one SAS data set from one input file
e.g.    DATA HARV1 HARV2;
        SET COMPLETE;
        IF HARVEST=1 THEN OUTPUT HARV1;
        IF HARVEST=2 THEN OUTPUT HARV2;
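
Use a) in the list above, creating several observations from one record, might be sketched
as follows. The data set LONG and the variables PLOTNO, YIELD1 to YIELD3, HARVEST and YIELD
are hypothetical.

```sas
/* One input record becomes three observations (illustrative names and data) */
DATA LONG;
   INPUT PLOTNO YIELD1 YIELD2 YIELD3;
   HARVEST = 1; YIELD = YIELD1; OUTPUT;
   HARVEST = 2; YIELD = YIELD2; OUTPUT;
   HARVEST = 3; YIELD = YIELD3; OUTPUT;
   DROP YIELD1 YIELD2 YIELD3;
   DATALINES;
1 4.2 5.1 6.0
2 3.9 4.8 5.5
;
RUN;
```

Once an OUTPUT statement appears in a DATA step, SAS no longer writes an observation
automatically at the end of each execution, so every observation wanted must be output
explicitly.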






2.10 Output to Create Stored ASCII Files
The FILE and PUT statements are used within a DATA step and are analogous to the INFILE
and INPUT statements. The FILE command links SAS to a specific external file, while the
PUT command specifies the output record format.
e.g.      DATA CREATE;
          SET CLASSNO;
          FILE 'file_identifier';
          PUT NAME $ 1-8 SEX $ 11 AGE 13-14;


3. Introduction to the PROC Step
Some of the procedures available in SAS are:-
Basics:            CHART, CONTENTS, CORR, DATASETS, FORMAT, FREQ, MEANS,
                   PLOT, PRINT, SORT, SUMMARY, TABULATE, TRANSPOSE,
                   UNIVARIATE
Statistics:        ANOVA, CANCORR, CANDISC, CLUSTER, DISCRIM, FACTOR, GLM,
                   PRINCOMP, REG, TTEST
Graph:             GCHART, GCONTOUR, GMAP, GPLOT, GSLIDE, G3D, G3GRID


SAS procedures analyse and process SAS data sets as follows:-
a) Read SAS data sets
b) Perform the requested task
c) Print results
d) Create SAS output data sets (optional)


Most SAS procedures have default option settings for the more common situations or
analyses. However, information can be given to the PROC step to specify:-
a) Which data set to process
b) Which variables to process
c) Whether to process the data in subsets
The PROC statement is used to begin a procedure.
e.g.      PROC MEANS DATA=PERM.PATIENTS MEAN STD;


Some of the more commonly used statements within the PROC step are:-
a) General statements common to many procedures
VAR            Specifies variables to be analysed
ID             Specifies a variable whose values identify observations in the SAS data set
BY             Specifies that the data set is to be processed in groups
               N.B. The data set must have already been sorted in the order of the current
               BY group.
WEIGHT         Specifies a variable whose values are the relative weights for the observations
WHERE          Subsets observations to be analysed based on specified criteria


b) Statements specific to individual procedures
TABLES         Table request in PROC FREQ
PLOT           Plot request in PROC PLOT
MODEL          Model specification in PROC ANOVA, PROC GLM, PROC REG etc.


c) Statements describing variable attributes
FORMAT         Specifies formats for printing variable values
LABEL          Associates descriptive labels with variable names


Lists of names can be abbreviated:-
a) Range of variables                    VAR SEX -- TEMP;
b) Numeric suffix range                  VAR Q1 - Q20;
c) Range of numeric variables only       VAR AGE-NUMERIC-TEMP;
d) Range of character variables only     VAR NAME-CHARACTER-SEX;
e) All numeric variables                 VAR _NUMERIC_;
f) All character variables               VAR _CHARACTER_;


4. Basic Procedures
PROC CHART
This procedure produces horizontal and vertical bar charts, pie charts, star charts and block
charts for numeric and character variables. The charts can represent frequencies and
cumulative frequencies, percentages and cumulative percentages, sums and means.


PROC CHART DATA = data_set_name options ;
HBAR variable_list ;                                  produces horizontal bar chart
VBAR variable_list ;                                  produces vertical bar chart
PIE variable_list ;                                   produces pie chart
STAR variable_list ;                                  produces star chart
BLOCK variable_list ;                                 produces block chart
BY variable_list ;





PROC CORR
This procedure computes correlation coefficients between variables.      Various univariate
statistics are also computed.


PROC CORR DATA = data_set_name options ;
VAR variable_list ;
WITH variable_list ;
WEIGHT variable ;
FREQ variable ;
BY variable_list ;


PROC FORMAT
This procedure is used to define formats for specifying labels for variable values used for
output. Formats can be used for either numeric or character variables. They can be used in
PUT statements in a DATA step and in FORMAT statements in a PROC step. They can also be
used in FORMAT statements in a DATA step, in which case they are then associated with
the variable for the remainder of the SAS job, unless changed.


PROC FORMAT options ;
VALUE format_name            value1 = label1
                             value2 = label2
                                  .     .
                             valuen = labeln ;


format_name Must be a unique SAS name which must begin with a $ for character variables
values         Can be a single number or a range of numbers, or several numerical or
               character values
labels         Labels can contain a maximum of 40 characters and must be enclosed in
               quotes


e.g.     PROC FORMAT;
         VALUE $SEXFMT 'M' = 'Male' 'F' = 'Female';
         VALUE AGEFMT 1 - 16 = 'Child' 17 - High = 'Adult';


The formats defined above can be used in other procedures as follows:-
         PROC PRINT DATA = PERM.PATIENTS;
         VAR SEX AGE;
         FORMAT SEX $SEXFMT. AGE AGEFMT. ;

NB. The full stop after SEXFMT and AGEFMT is essential


PROC FREQ
This procedure produces 1 - way to n - way frequency tables of character and numeric
variables.


PROC FREQ DATA = data_set_name options ;
WEIGHT weighting_variable ;
BY variable_list ;
TABLES table_request / options ;


In the TABLES specification the values of the last variable form the columns and the values
of the second last variable form the rows.
e.g.   TABLES VAR1;                 one - way table
       TABLES VAR1 * VAR2;          two - way table
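
A complete, if very small, job using these table requests is sketched below. The data set
PUPILS and the variables SEX and SMOKER are invented for the example.

```sas
/* One-way and two-way frequency tables (illustrative data) */
DATA PUPILS;
   INPUT SEX $ SMOKER $;
   DATALINES;
M Y
M N
F N
F N
F Y
;
RUN;

PROC FREQ DATA=PUPILS;
   TABLES SEX;             /* one-way table */
   TABLES SEX * SMOKER;    /* two-way table: SEX forms rows, SMOKER forms columns */
RUN;
```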


PROC MEANS
This procedure is used to produce simple univariate statistics for numeric variables. The
options available allow you to specify which statistics you want calculated e.g. mean,
standard deviation, minimum. If no statistics are specifically requested in the MEANS
statement, then variable name, N, mean, standard deviation, minimum, maximum are
printed automatically.


PROC MEANS DATA = data_set_name options ;
BY variable_list ;
VAR variable_list ;
ID variable_list ;
FREQ variable ;
WEIGHT weighting_variable ;
OUTPUT OUT = output_data_set_name statistics ;
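
As a sketch of how the OUTPUT statement might be used, the job below prints N, mean and
standard deviation and also saves the means and standard deviations in a data set. The
names PUPILS, STATS, MHEIGHT, MWEIGHT, SHEIGHT and SWEIGHT are assumptions for the example.

```sas
/* Print selected statistics and save them in a data set (illustrative data) */
DATA PUPILS;
   INPUT SEX $ HEIGHT WEIGHT;
   DATALINES;
M 150 41
M 162 50
F 148 39
F 155 44
;
RUN;

PROC MEANS DATA=PUPILS N MEAN STD;
   VAR HEIGHT WEIGHT;
   OUTPUT OUT=STATS MEAN=MHEIGHT MWEIGHT STD=SHEIGHT SWEIGHT;
RUN;

PROC PRINT DATA=STATS;
RUN;
```

The names after MEAN= and STD= are matched in order with the variables in the VAR
statement.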






PROC PLOT
This procedure produces line-printer plots for both numeric and character variables. Various
options are available for specifying the plotting symbol, scaling the axes, drawing reference
lines, superimposing 2 or more plots and drawing contour plots.


PROC PLOT DATA = data_set_name options ;
PLOT vertical_variable * horizontal_variable / options ;
BY variable_list ;


PROC PRINT
This procedure prints the values in a SAS data set.


PROC PRINT DATA = data_set_name options ;
BY variable_list ;
VAR variable_list ;
ID variable_list ;
PAGEBY variable ;
SUM variable_list ;
SUMBY variable ;


PROC SORT
This procedure rearranges the observations in an existing SAS data set or creates a new data
set containing the rearranged observations. Multiple sorting groups can be specified and
variables can be sorted in ascending or descending order.


PROC SORT DATA = data_set_name OUT = output_data_set_name options ;
BY variable_list ;


Variables are automatically sorted in ascending order; for descending order put
DESCENDING before each variable name concerned in the BY statement. The SORT procedure should
always be used when subsequent procedures process the data set in groups using the BY
statement. It is possible to process a data set without sorting it beforehand by using the
NOTSORTED option on the BY statement of the procedure being used. However, SAS
assumes that consecutive observations with the same BY value are grouped together although
the BY values are not necessarily sorted in alphabetic or numeric order.
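
The usual pattern of sorting first and then processing in BY groups might look like the
sketch below, where the data sets PUPILS and BYSEX and their variables are made up.

```sas
/* Sort, then process the sorted data set in BY groups (illustrative data) */
DATA PUPILS;
   INPUT NAME $ SEX $ AGE;
   DATALINES;
ANN F 12
BOB M 11
CAROL F 14
DAVE M 13
;
RUN;

PROC SORT DATA=PUPILS OUT=BYSEX;
   BY SEX DESCENDING AGE;  /* SEX ascending, AGE descending within SEX */
RUN;

PROC PRINT DATA=BYSEX;
   BY SEX;                 /* one listing for each value of SEX */
RUN;
```

Because OUT= is given, the original data set PUPILS is left in its original order.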






PROC SUMMARY
This procedure produces a SAS data set containing statistics similar to those of the MEANS
procedure, but much more efficiently. By default PROC SUMMARY does not produce any printed
output and, when a CLASS statement is used, the data do not have to be sorted in order to
produce subgroup statistics. An OUTPUT and a VAR statement must be specified, and any number
of OUTPUT statements can be used. The VAR statement must precede the OUTPUT statements.


PROC SUMMARY DATA = data_set_name options ;
CLASS variable_list ;
VAR variable_list ;
BY variable_list ;
FREQ variable ;
WEIGHT weighting_variable ;
ID variable_list ;
OUTPUT OUT = output_data_set_name statistics ;


PROC TABULATE
This procedure provides a more flexible alternative to the FREQ procedure for producing
tables. Each cell in the table contains a descriptive statistic e.g. mean, standard deviation,
etc. TABULATE will generate tables defined by the TABLE statement. Classification
variables must be specified with the CLASS statement, while the variables to be tabulated i.e.
whose values are to be the cell contents must be specified by the VAR statement. Each
expression in the TABLE statement defines the categories for the table's dimensions - page,
row and column.


PROC TABULATE DATA = data_set_name options ;
CLASS variable_list ;
VAR variable_list ;
BY variable_list ;
FREQ variable ;
WEIGHT weighting_variable ;
FORMAT variable_list format ;
LABEL variable = 'label' ;
TABLE page_expression, row_expression, column_expression ;
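
A two-dimensional table request might be sketched as follows. The data set PUPILS and the
variables SEX, YEAR and HEIGHT are hypothetical.

```sas
/* Rows defined by SEX, columns by YEAR, cells holding N and mean of HEIGHT */
DATA PUPILS;
   INPUT SEX $ YEAR HEIGHT;
   DATALINES;
M 1 150
M 2 162
F 1 148
F 2 155
F 2 158
;
RUN;

PROC TABULATE DATA=PUPILS;
   CLASS SEX YEAR;
   VAR HEIGHT;
   TABLE SEX, YEAR * HEIGHT * (N MEAN);
RUN;
```

With only two expressions in the TABLE statement the first is taken as the row dimension
and the second as the column dimension.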






PROC TRANSPOSE
This procedure transposes data sets, changing observations into variables and variables into
observations. An output data set is created automatically and named according to the
DATAn convention if a name is not specified.


PROC TRANSPOSE DATA = data_set_name options ;
VAR variable_list ;
ID variable ;
IDLABEL variable ;
COPY variable_list ;
BY variable_list ;
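
The sketch below transposes a small data set so that each pupil becomes a variable. The
data set names MARKS and MARKS_T and the variables are invented for the example.

```sas
/* Turn observations into variables (illustrative data) */
DATA MARKS;
   INPUT NAME $ TEST1 TEST2 TEST3;
   DATALINES;
ANN 61 70 58
BOB 44 52 49
;
RUN;

PROC TRANSPOSE DATA=MARKS OUT=MARKS_T;
   VAR TEST1 TEST2 TEST3;  /* variables to become observations */
   ID NAME;                /* values of NAME become the new variable names */
RUN;

PROC PRINT DATA=MARKS_T;
RUN;
```

The output data set has one observation for each of TEST1 to TEST3, a variable _NAME_
holding the original variable names, and the variables ANN and BOB.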

5. More on the DATA Step
5.1 IF - THEN - ELSE Statements
These statements are used to execute a further SAS statement conditional on some
expression.


IF expression THEN statement;
ELSE statement ;


THEN statement is executed if expression is non zero, non missing or true
ELSE statement is executed if expression is zero, missing or false


There are eight relational operators:-
LT or <         LE or <=       GT or >            GE or >=
NL or ~<        NG or ~>       EQ or =            NE or ~=


In addition there are three logical operators:-
NOT or ~        AND or &       OR or |


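e.g.   DATA ;
       LENGTH SEX $ 6 ;
       IF CODE = 1 OR CODE = 2 THEN SEX = 'MALE' ;
                                         ELSE SEX = 'FEMALE';

The LENGTH statement is needed because SAS would otherwise fix the length of SEX at 4 from
the first assignment and truncate 'FEMALE'.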


e.g.   DATA ;
       INPUT AGE ;
       IF 0 < AGE < 10 THEN AGEGRP = 1 ;
       IF 10 <= AGE < 19 THEN AGEGRP = 2 ;
       IF AGE >= 19 THEN AGEGRP = 3 ;


Any observations with values not included in one of the categories will produce missing or
blank values.
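
The same grouping can be written as a single chain of IF - THEN - ELSE statements, so that
testing stops at the first condition which holds. This is a sketch only; the data set name
GROUPS and the data values are made up.

```sas
/* Age groups as an IF - THEN - ELSE chain (illustrative data) */
DATA GROUPS;
   INPUT AGE;
   /* a missing value compares lower than any number, so this also catches missing AGE */
   IF AGE <= 0 THEN AGEGRP = .;
   ELSE IF AGE < 10 THEN AGEGRP = 1;
   ELSE IF AGE < 19 THEN AGEGRP = 2;
   ELSE AGEGRP = 3;
   DATALINES;
4
15
32
.
;
RUN;
```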

5.2 Selecting Observations
If not all observations are to be included in the data set being created they can be excluded by
the DELETE statement or the subsetting IF statement. The DELETE statement stops the
processing of an observation:-


e.g.   DATA MALES ;
       INPUT AGE SEX $ ;
       IF SEX = 'F' THEN DELETE ;


The subsetting IF statement allows an observation to pass if the expression is true:-


e.g.   DATA MALES ;
       INPUT AGE SEX $ ;
       IF SEX = 'M' ;


The result from both of the above DATA steps is the same.

5.3 DO and END Statements
DO statements specify that any statements following the DO are to be executed until a
matching END appears.


e.g.   DATA ;
       INPUT AGE SEX $ FAMILY $ ;
       IF SEX = 'F' THEN DO ;
                               AGE = AGE - 5 ;
                               FAMILY = 'NEW' ;
                               END ;
       ELSE AGE = AGE + 3 ;

5.4 DO Loops
DO loops allow a range of statements, within a DATA step, to be repeated either a specified
number of times or while a specified condition holds.


DO variable = start TO stop ;
DO variable = start TO stop BY increment ;
DO WHILE (expression) ;
DO UNTIL (expression) ;
DO OVER array_name ;


Each must have a matching END statement to terminate execution.


e.g.   DO N = 1 TO 20 ;
       DO N = 1 TO 20 BY 4 ;
       DO WHILE (N < 20) ;
       DO UNTIL (N = 20) ;
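
A complete iterative DO loop might be sketched as below, generating one observation per
year of compound growth. The data set GROWTH, the variables AMOUNT and YEAR, and the 5%
rate are assumptions for the example.

```sas
/* Generate five observations with an iterative DO loop (illustrative values) */
DATA GROWTH;
   AMOUNT = 1000;
   DO YEAR = 1 TO 5;
      AMOUNT = AMOUNT * 1.05;   /* 5% growth per year, chosen for illustration */
      OUTPUT;                   /* write one observation per pass of the loop */
   END;
RUN;

PROC PRINT DATA=GROWTH;
RUN;
```

Note that DO WHILE tests its expression at the top of the loop, whereas DO UNTIL tests at
the bottom, so a DO UNTIL loop is always executed at least once.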

5.5 Arrays
Arrays in SAS are useful for processing a lot of SAS variables in the same way.


ARRAY array_name [index_variable] array_elements ;


e.g.   ARRAY A Q1 - Q5 ;
       DO OVER A ;
       A = LOG(A) ;
       END ;


Array elements are substituted for the array name in SAS statements depending on the value
of the index variable. SAS will use its own internal index variable if none is defined. In the
example above the DO group is executed for every element in the array.
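
An alternative to DO OVER is an explicitly subscripted array processed with an iterative DO
loop. The sketch below uses that form rather than the implicit form shown above; the data
set LOGGED, the array name Q and the index variable I are invented for the example.

```sas
/* Explicitly subscripted array: take logs of Q1 to Q5 (illustrative data) */
DATA LOGGED;
   INPUT Q1-Q5;
   ARRAY Q{5} Q1-Q5;
   DO I = 1 TO 5;
      IF Q{I} > 0 THEN Q{I} = LOG(Q{I});
      ELSE Q{I} = .;            /* log is undefined for zero or negative values */
   END;
   DROP I;                      /* the index variable is not wanted in the data set */
   DATALINES;
1 10 100 0 5
;
RUN;
```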

5.6 RETAIN
This statement retains a variable value from the last execution of the DATA step. Normally
all variables are set to missing before each execution of the DATA step. Initial values can
also be assigned to the variables.


RETAIN variable ;
RETAIN variable initial_value ;
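
A typical use of RETAIN is a running total, sketched below with a made-up data set RUNNING
and variables SALES and TOTAL.

```sas
/* Running total carried from one observation to the next (illustrative data) */
DATA RUNNING;
   INPUT SALES;
   RETAIN TOTAL 0;              /* TOTAL starts at 0 and is not reset to missing */
   TOTAL = TOTAL + SALES;
   DATALINES;
10
25
5
;
RUN;
```

TOTAL takes the values 10, 35 and 40 in the three observations. Without the RETAIN
statement TOTAL would be missing in every observation.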

5.7 DROP and KEEP
The DROP statement excludes named variables from a data set or analysis and the KEEP
statement includes only named variables in a data set or analysis. Both statements can be
used in the DATA step or as data set options which appear after the data set name on PROC
steps.






e.g.   DATA PERM.PATIENTS ;
       DROP PATNO ;


       DATA PERM.PATIENTS(DROP = PATNO) ;


       PROC PRINT DATA = PERM.PATIENTS(KEEP = AGE SEX) ;

5.8 RENAME and LABEL
The RENAME statement is used to rename variables.


RENAME old_name = new_name ;


The LABEL statement assigns labels of up to 256 characters to variables.


LABEL variable = 'label' ;

6. Data Management
6.1 SET
Reads observations from 1 or more SAS data sets and can interleave observations.


a) Subset the observations          DATA FEMALES ;
                                    SET STUDENTS ;
                                    IF SEX = 'F' ;


b) Subset the variables             DATA SMALL ;
                                    SET STUDENTS ;
                                    DROP WEIGHT AGE ;


c) Add a new variable               DATA ADD ;
                                    SET STUDENTS ;
                                    WTKG = WEIGHT / 2.2 ;


d) Multiple output data sets        DATA MALES FEMALES ;
                                    SET STUDENTS ;
                                    IF SEX = 'M' THEN OUTPUT MALES ;
                                    IF SEX = 'F' THEN OUTPUT FEMALES ;


e) Multiple input data sets         DATA ALL ;
   (Concatenate)                    SET MALES FEMALES ;


f) Multiple input data sets         DATA ALL ;
   (Interleave)                     SET MALES FEMALES ;
                                    BY NAME ;

6.2 MERGE
Combines observations from two or more SAS data sets and places them side by side.
a) One-to-one Merging
If there are the same number of observations in each data set and if the observations are in the
same order then they can be combined as shown below. The two data sets are placed side by
side in the combined data set being created.
       DATA COUPLES ;
       MERGE HUSBANDS WIVES;


For any duplicate variable name in the data sets, only the values of that variable from the last
named data set will be saved.


b) Match Merging
Both data sets must already be sorted by the BY variable(s). Observations with the same BY
values are then placed side by side, in the order specified in the BY statement.


       DATA STABLE ;
       MERGE HORSE TRAINER ;
       BY OWNER ;
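
The sketch below shows one way the whole match merge could be written, including the sorting
step. The IN= variable names INHORSE and INTRAIN and the subsetting IF are assumptions added
for illustration; they keep only the owners found in both data sets.

       PROC SORT DATA = HORSE ;
       BY OWNER ;
       PROC SORT DATA = TRAINER ;
       BY OWNER ;
       DATA STABLE ;
       /* IN= creates a temporary 0/1 flag showing whether the */
       /* current observation received data from that data set */
       MERGE HORSE(IN = INHORSE) TRAINER(IN = INTRAIN) ;
       BY OWNER ;
       /* Keep only owners that appear in both HORSE and TRAINER */
       IF INHORSE AND INTRAIN ;
       RUN ;

Without the subsetting IF, owners present in only one data set are still written out, with
missing values for the variables that come from the other data set.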

6.3 UPDATE
Updates a master file with a transaction file where the BY variable is the KEY for matching
observations.


       DATA SURGERY;
       UPDATE SURGERY BLOODCT;
       BY PATIENT;


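UPDATE is intended for the case where several changes to a master data set can be applied
all in one job. Both data sets must be sorted by the BY variable, and each BY value should
occur only once in the master data set. A missing value in the transaction data set leaves
the corresponding value in the master data set unchanged.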
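
A small worked sketch may make the treatment of missing values clearer. The data set names
MASTER and TRANS, the variables WBC and RBC and all the data values are invented for
illustration.

       DATA MASTER ;
       INPUT PATIENT WBC RBC ;
       DATALINES ;
       1 7.2 4.9
       2 6.1 4.4
       ;
       DATA TRANS ;
       INPUT PATIENT WBC RBC ;
       DATALINES ;
       2 . 4.7
       3 8.0 5.1
       ;
       /* Both data sets are already in PATIENT order, so no sorting is needed */
       DATA MASTER ;
       UPDATE MASTER TRANS ;
       BY PATIENT ;
       RUN ;

After this job patient 1 is unchanged, patient 2 keeps WBC = 6.1 (the transaction value is
missing) but has RBC updated to 4.7, and patient 3 is added as a new observation.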






7. Statistical Procedures
There is a wide range of statistical procedures available in SAS for carrying out such
techniques as analysis of variance and covariance, linear and non-linear regression analysis,
multivariate methods and non-parametric methods. A few examples of the more widely used
procedures are given below. For more details on all the procedures available for
statistical analysis, consult the appropriate manuals.


PROC ANOVA
This procedure is used to carry out an analysis of variance of balanced data (see also PROC
GLM). Many of the statements which can be used with this procedure are not necessary for
standard analyses.
Required statements, which must appear in this order:
       PROC ANOVA DATA = data_set_name options ;
       CLASS variable_list ;
       MODEL dependent_variables = effects / options ;
Statements that must appear before the first RUN statement:
       BY variable_list ;
       ABSORB variable_list ;
       FREQ variable ;
Statements that can appear after the MODEL statement and can be used interactively:
       MEANS effects / options ;
       TEST H = effects E = effect ;
       MANOVA H = effects E = effect M = equations / options ;
       REPEATED factor_names / options ;
e.g.   PROC ANOVA DATA = EXPT ;
       CLASS METHOD VARIETY ;
       MODEL YIELD = METHOD VARIETY METHOD * VARIETY ;
       BY YEAR ;
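
As a sketch of one of the statements that follow MODEL, the same analysis could be extended
with a MEANS statement. The choice of the TUKEY option for multiple comparisons is an
assumption made for illustration, and the BY statement is left out for brevity.

       PROC ANOVA DATA = EXPT ;
       CLASS METHOD VARIETY ;
       MODEL YIELD = METHOD VARIETY METHOD * VARIETY ;
       /* Main-effect means, with Tukey's test comparing each pair of levels */
       MEANS METHOD VARIETY / TUKEY ;
       RUN ;
       /* The procedure is interactive, so QUIT ends it explicitly */
       QUIT ;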






PROC GLM
This procedure can be used to fit general linear models to data to enable statistical methods
such as analysis of variance, analysis of covariance, regression analysis (including
comparison of regressions) and multivariate analysis of variance to be carried out.
Unbalanced data and data with missing values can also be analysed using this procedure.
There are numerous statements and options available with this procedure, but most
applications only use a few of them.
Statements that must precede the MODEL statement:
       PROC GLM DATA = data_set_name options ;
       CLASS variable_list ;
Required statement:
       MODEL dependent_variables = independent_variables / options ;
Statements that must appear before the first RUN statement:
       ABSORB variable_list ;
       BY variable_list ;
       FREQ variable ;
       ID variable_list ;
       WEIGHT weighting_variable ;
Statements that can appear after the MODEL statement and can be used interactively:
       CONTRAST 'label' effect_values / options ;
       ESTIMATE 'name' effect_values / options ;
       LSMEANS effects / options ;
       MANOVA H = effects E = effect M = equations / options ;
       MEANS effects / options ;
       OUTPUT OUT = output_data_set_name ;
       RANDOM effects / options ;
       REPEATED factor_names / options ;
       TEST H = effects E = effect / options ;

e.g.   PROC GLM DATA = EXPT2 ;
       CLASS TREAT SUBJECT TIME ;
       MODEL RESP = TREAT SUBJECT(TREAT) TIME TREAT * TIME ;
       TEST H = TREAT E = SUBJECT(TREAT) ;
       LSMEANS TREAT TIME TREAT*TIME ;
       OUTPUT OUT = NEW P = RHAT R = RESID ;
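
For unbalanced data a simpler sketch may be useful. It reuses the EXPT data set from the
PROC ANOVA example and assumes, purely for illustration, that the METHOD groups contain
unequal numbers of observations.

       PROC GLM DATA = EXPT ;
       CLASS METHOD ;
       /* SS3 requests Type III sums of squares */
       MODEL YIELD = METHOD / SS3 ;
       /* Least-squares means with standard errors and p-values for all pairwise differences */
       LSMEANS METHOD / STDERR PDIFF ;
       RUN ;
       QUIT ;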






PROC TTEST
This procedure carries out a simple t-test on the means of two groups of observations. The
grouping factor specified in the CLASS statement must have exactly two levels.
Required statements:
       PROC TTEST DATA = data_set_name options ;
       CLASS variable ;
Optional statements:
       BY variable_list ;
       VAR variable_list ;
e.g.   PROC TTEST DATA = EXPT5 ;
       CLASS SEX ;
       VAR SCORE ;



PROC NLIN
This procedure is used to fit nonlinear regression models. The model to be fitted has to be
specified, as do the parameters to be estimated, initial guesses for them, and possibly the
partial derivatives of the model with respect to each parameter. Some models are difficult to
fit and in these cases the initial guesses can be critical. There is no guarantee that the
procedure will be able to fit the model successfully.
Required statements:
       PROC NLIN DATA = data_set_name options ;
       PARMS parameter = values ;
       MODEL dependent_variable = expression ;
Optional statements:
       BOUNDS expressions ;
       BY variable_list ;
       ID variable_list ;
       DER.parameter = expression ;
       OUTPUT OUT = output_data_set_name ;
e.g.   PROC NLIN DATA = EXPT3 ;
       PARMS B0 = 0.5 B1 = 0.08 ;
       MODEL Y = B0*(1-EXP(-B1*X)) ;
       DER.B0 = 1-EXP(-B1*X) ;
       DER.B1 = B0*X*EXP(-B1*X) ;
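
When good initial guesses are hard to find, a grid of starting values can be supplied
instead. The sketch below fits the same model; the grid limits, the bounds and the output
names NLOUT and YHAT are assumptions chosen for illustration. It also assumes a release of
SAS in which PROC NLIN works out the derivatives itself, so the DER statements are omitted.

       PROC NLIN DATA = EXPT3 ;
       /* Every combination of these values is tried, and iteration starts */
       /* from the combination with the smallest residual sum of squares */
       PARMS B0 = 0.1 TO 1.0 BY 0.1
             B1 = 0.02 TO 0.20 BY 0.02 ;
       /* Keep both parameters positive during the iterations */
       BOUNDS B0 > 0, B1 > 0 ;
       MODEL Y = B0*(1-EXP(-B1*X)) ;
       /* Save the predicted values and residuals for checking the fit */
       OUTPUT OUT = NLOUT P = YHAT R = RESID ;
       RUN ;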






PROC REG
This procedure is used to fit linear regression models. There are other regression procedures
such as RSQUARE, RSREG and STEPWISE for selecting subsets of independent variables
in a multiple regression analysis, fitting quadratic response surfaces and carrying out
stepwise regression, respectively.
Required statement:
       PROC REG DATA = data_set_name options ;
Required statement for model fitting, which can be used interactively:
       MODEL dependent_variables = independent_variables / options ;
Statements that must appear before the first RUN statement:
       VAR variable_list ;
       BY variable_list ;
       FREQ variable ;
       WEIGHT weighting_variable ;
       ID variable ;
Statements that can appear anywhere after a MODEL statement and can be used interactively:
       ADD variable_list ;
       DELETE variable_list ;
       MTEST equations ;
       OUTPUT OUT = output_data_set_name ;
       PLOT y_variate*x_variate ;
       REFIT ;
       RESTRICT equations ;
       REWEIGHT condition ;
       TEST equations ;

e.g.   PROC REG DATA = EXPT4 ;
       MODEL POP = YEAR ;
       OUTPUT OUT = REGOUT P = EPOP R = RESID ;
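
The output data set can then be passed to other procedures. As a sketch, the residuals saved
in REGOUT above could be plotted against the fitted values to check the fit; the reference
line at zero is an optional extra added here for illustration.

       PROC PLOT DATA = REGOUT ;
       /* Residuals on the vertical axis, fitted values on the horizontal axis */
       PLOT RESID * EPOP / VREF = 0 ;
       RUN ;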




8. Graphical Procedures
The majority of procedures available to produce high-quality, hard-copy graphical output
work in the same way as those mentioned in section 4. Most have names prefixed by the
letter G, e.g. GCHART and GPLOT. Additional global statements allow the user to specify
more precisely the axes, symbols, patterns etc. used in the representation of the data.
This topic is beyond the scope of this Summary Guide, but information can be found in the
two volumes of the SAS/GRAPH manuals. To produce hard copy, the various versions of
SAS access the graphics devices in different ways, so refer to the appropriate SAS
Companion Guide for more complete information.
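
To give a flavour of the global statements, the sketch below plots the observed and fitted
values from the REGOUT data set created in the PROC REG example. The axis labels and symbol
choices are assumptions made for illustration, and the observations are assumed to be in
YEAR order so that the joined line is drawn correctly.

       /* SYMBOL1: observed values as dots; SYMBOL2: fitted values as a joined line */
       SYMBOL1 VALUE = DOT INTERPOL = NONE ;
       SYMBOL2 VALUE = NONE INTERPOL = JOIN ;
       AXIS1 LABEL = ('Year') ;
       AXIS2 LABEL = ('Population') ;
       PROC GPLOT DATA = REGOUT ;
       /* The number after each plot request selects the SYMBOL statement to use */
       PLOT POP * YEAR = 1 EPOP * YEAR = 2 / OVERLAY HAXIS = AXIS1 VAXIS = AXIS2 ;
       RUN ;
       QUIT ;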






9. Output Delivery System (ODS)
Many procedures have long produced output data sets that can be used in further calculations,
e.g. parameter estimates from a regression analysis. However, some of the more common
procedures lacked this facility. Since Version 7, the Output Delivery System (ODS) has made
it much simpler to save output as data sets, to produce formatted output for high-resolution
printers and to produce web-quality output in HTML.
ODS also gives more effective control over the output stream and a greater choice of
output objects that can be written to data sets.
ODS is a vast topic with many individual statements. Each statement (shown in the table
below) has its own set of options, which are not shown here and are best described in the
manual.
Table of ODS Statements
ODS EXCLUDE        Specify output objects to exclude from ODS destinations.
ODS HTML           Open, manage, or close the HTML destination. If the
                   destination is open, you can create HTML output.
ODS LISTING        Open, manage, or close the Listing destination.
ODS OUTPUT         Create a SAS data set from an output object and manage
                   the selection and exclusion lists for the Output destination.
ODS PATH           Specify which locations to search for the definitions that
                   were created by PROC TEMPLATE, as well as the order in
                   which to search for them.
ODS PRINTER        Open, manage, or close the Printer destination. If the
                   destination is open, you can create Printer output.
ODS SELECT         Specify output objects for ODS destinations.
ODS SHOW           Write to the SAS log the specified selection or
                   exclusion list.
ODS TRACE          Write to the SAS log a record of each output object that is
                   created, or suppress the writing of this record.
ODS VERIFY         Print or suppress a warning that a style definition or a table
                   definition that is used is not supplied by SAS Institute.
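
As a sketch of how a few of these statements fit together, the job below reruns the
regression from the PROC REG example, saves its parameter estimates as a data set and also
writes the results to an HTML file. The data set name PARMEST and the file name regout.html
are assumptions made for illustration.

       /* List the name of each output object in the SAS log */
       ODS TRACE ON ;
       /* Send the results to an HTML file as well as the listing */
       ODS HTML FILE = 'regout.html' ;
       /* Save the ParameterEstimates output object as the data set PARMEST */
       ODS OUTPUT ParameterEstimates = PARMEST ;
       PROC REG DATA = EXPT4 ;
       MODEL POP = YEAR ;
       RUN ;
       QUIT ;
       ODS HTML CLOSE ;
       ODS TRACE OFF ;

The names of the output objects produced by a procedure are those reported in the log by
ODS TRACE, so running the job once with tracing on is a convenient way to find them.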






10. Further Facilities
There are many more facilities in SAS in addition to those documented here.
These include:
•    A macro processing language.
•    A full-screen editor (FSP) enabling data to be entered and updated. It also contains a
     spreadsheet facility.
•    Interactive matrix language (IML), a very powerful module for programming matrix
     algebra, useful for statistical and mathematical applications.
•    Time series module (ETS) for carrying out econometric and time-series analysis.

11. Publications
There is a vast range of SAS manuals for both UNIX and PC versions. They can be ordered
from:
SAS Software Ltd.
Wittington House
Henley Road
Medmenham
Marlow
SL7 2EB

The Main Library on campus holds a few manuals for reference, based on previous versions. In
addition, users of SAS at The University of Reading can read the current documentation
on-line by registering at
http://v8doc.sas.com/sashtml/





Mais conteúdo relacionado

Mais procurados

Understanding SAS Data Step Processing
Understanding SAS Data Step ProcessingUnderstanding SAS Data Step Processing
Understanding SAS Data Step Processingguest2160992
 
Base sas interview questions
Base sas interview questionsBase sas interview questions
Base sas interview questionsDr P Deepak
 
Introduction to SAS
Introduction to SASIntroduction to SAS
Introduction to SASImam Jaffer
 
Concepts of NonStop SQL/MX: Part 2 - Introduction to catalogs and other objects
Concepts of NonStop SQL/MX: Part 2 - Introduction to catalogs and other objectsConcepts of NonStop SQL/MX: Part 2 - Introduction to catalogs and other objects
Concepts of NonStop SQL/MX: Part 2 - Introduction to catalogs and other objectsFrans Jongma
 
153680 sqlinterview
153680  sqlinterview153680  sqlinterview
153680 sqlinterviewzdsgsgdf
 
Sql server lesson7
Sql server lesson7Sql server lesson7
Sql server lesson7Ala Qunaibi
 
Understanding sas data step processing.
Understanding sas data step processing.Understanding sas data step processing.
Understanding sas data step processing.Ravi Mandal, MBA
 
Getting Started with MySQL I
Getting Started with MySQL IGetting Started with MySQL I
Getting Started with MySQL ISankhya_Analytics
 
Concepts of NonStop SQL/MX: Part 3 - Introduction to Metadata
Concepts of NonStop SQL/MX: Part 3 - Introduction to MetadataConcepts of NonStop SQL/MX: Part 3 - Introduction to Metadata
Concepts of NonStop SQL/MX: Part 3 - Introduction to MetadataFrans Jongma
 
Concepts of NonStop SQL/MX: Part 4 - Storage.
Concepts of NonStop SQL/MX: Part 4 - Storage.Concepts of NonStop SQL/MX: Part 4 - Storage.
Concepts of NonStop SQL/MX: Part 4 - Storage.Frans Jongma
 

Mais procurados (16)

Understanding SAS Data Step Processing
Understanding SAS Data Step ProcessingUnderstanding SAS Data Step Processing
Understanding SAS Data Step Processing
 
Sas
SasSas
Sas
 
Base sas interview questions
Base sas interview questionsBase sas interview questions
Base sas interview questions
 
Sas training in hyderabad
Sas training in hyderabadSas training in hyderabad
Sas training in hyderabad
 
Introduction to SAS
Introduction to SASIntroduction to SAS
Introduction to SAS
 
SAS - overview of SAS
SAS - overview of SASSAS - overview of SAS
SAS - overview of SAS
 
Adbms
AdbmsAdbms
Adbms
 
Concepts of NonStop SQL/MX: Part 2 - Introduction to catalogs and other objects
Concepts of NonStop SQL/MX: Part 2 - Introduction to catalogs and other objectsConcepts of NonStop SQL/MX: Part 2 - Introduction to catalogs and other objects
Concepts of NonStop SQL/MX: Part 2 - Introduction to catalogs and other objects
 
SAS Programming Notes
SAS Programming NotesSAS Programming Notes
SAS Programming Notes
 
153680 sqlinterview
153680  sqlinterview153680  sqlinterview
153680 sqlinterview
 
Sql project ..
Sql project ..Sql project ..
Sql project ..
 
Sql server lesson7
Sql server lesson7Sql server lesson7
Sql server lesson7
 
Understanding sas data step processing.
Understanding sas data step processing.Understanding sas data step processing.
Understanding sas data step processing.
 
Getting Started with MySQL I
Getting Started with MySQL IGetting Started with MySQL I
Getting Started with MySQL I
 
Concepts of NonStop SQL/MX: Part 3 - Introduction to Metadata
Concepts of NonStop SQL/MX: Part 3 - Introduction to MetadataConcepts of NonStop SQL/MX: Part 3 - Introduction to Metadata
Concepts of NonStop SQL/MX: Part 3 - Introduction to Metadata
 
Concepts of NonStop SQL/MX: Part 4 - Storage.
Concepts of NonStop SQL/MX: Part 4 - Storage.Concepts of NonStop SQL/MX: Part 4 - Storage.
Concepts of NonStop SQL/MX: Part 4 - Storage.
 

Semelhante a Sas summary guide

I need help with Applied Statistics and the SAS Programming Language.pdf
I need help with Applied Statistics and the SAS Programming Language.pdfI need help with Applied Statistics and the SAS Programming Language.pdf
I need help with Applied Statistics and the SAS Programming Language.pdfMadansilks
 
Introduction to-sas-1211594349119006-8
Introduction to-sas-1211594349119006-8Introduction to-sas-1211594349119006-8
Introduction to-sas-1211594349119006-8thotakoti
 
BAS 150 Lesson 3 Lecture
BAS 150 Lesson 3 LectureBAS 150 Lesson 3 Lecture
BAS 150 Lesson 3 LectureWake Tech BAS
 
Analytics with SAS
Analytics with SASAnalytics with SAS
Analytics with SASEdureka!
 
8323 Stats - Lesson 1 - 03 Introduction To Sas 2008
8323 Stats - Lesson 1 - 03 Introduction To Sas 20088323 Stats - Lesson 1 - 03 Introduction To Sas 2008
8323 Stats - Lesson 1 - 03 Introduction To Sas 2008untellectualism
 
SAS INTERVIEW QUESTIONS AND ANSWERS IN 2022
SAS INTERVIEW QUESTIONS AND ANSWERS IN 2022SAS INTERVIEW QUESTIONS AND ANSWERS IN 2022
SAS INTERVIEW QUESTIONS AND ANSWERS IN 2022Sprintzeal
 
Basics Of SAS Programming Language
Basics Of SAS Programming LanguageBasics Of SAS Programming Language
Basics Of SAS Programming Languageguest2160992
 
Improving Effeciency with Options in SAS
Improving Effeciency with Options in SASImproving Effeciency with Options in SAS
Improving Effeciency with Options in SASguest2160992
 
Prog1 chap1 and chap 2
Prog1 chap1 and chap 2Prog1 chap1 and chap 2
Prog1 chap1 and chap 2rowensCap
 
Introduction to sas
Introduction to sasIntroduction to sas
Introduction to sasAjay Ohri
 
Sas-training-in-mumbai
Sas-training-in-mumbaiSas-training-in-mumbai
Sas-training-in-mumbaiUnmesh Baile
 
Sas Talk To R Users Group
Sas Talk To R Users GroupSas Talk To R Users Group
Sas Talk To R Users Groupgeorgette1200
 
Introducción al Software Analítico SAS
Introducción al Software Analítico SASIntroducción al Software Analítico SAS
Introducción al Software Analítico SASJorge Rodríguez M.
 
Draft sas and r and sas (may, 2018 asa meeting)
Draft sas and r and sas (may, 2018 asa meeting)Draft sas and r and sas (may, 2018 asa meeting)
Draft sas and r and sas (may, 2018 asa meeting)Barry DeCicco
 

Semelhante a Sas summary guide (20)

I need help with Applied Statistics and the SAS Programming Language.pdf
I need help with Applied Statistics and the SAS Programming Language.pdfI need help with Applied Statistics and the SAS Programming Language.pdf
I need help with Applied Statistics and the SAS Programming Language.pdf
 
SAS Commands
SAS CommandsSAS Commands
SAS Commands
 
Basics of SAS
Basics of SASBasics of SAS
Basics of SAS
 
Introduction to-sas-1211594349119006-8
Introduction to-sas-1211594349119006-8Introduction to-sas-1211594349119006-8
Introduction to-sas-1211594349119006-8
 
BAS 150 Lesson 3 Lecture
BAS 150 Lesson 3 LectureBAS 150 Lesson 3 Lecture
BAS 150 Lesson 3 Lecture
 
Analytics with SAS
Analytics with SASAnalytics with SAS
Analytics with SAS
 
8323 Stats - Lesson 1 - 03 Introduction To Sas 2008
8323 Stats - Lesson 1 - 03 Introduction To Sas 20088323 Stats - Lesson 1 - 03 Introduction To Sas 2008
8323 Stats - Lesson 1 - 03 Introduction To Sas 2008
 
SAS INTERVIEW QUESTIONS AND ANSWERS IN 2022
SAS INTERVIEW QUESTIONS AND ANSWERS IN 2022SAS INTERVIEW QUESTIONS AND ANSWERS IN 2022
SAS INTERVIEW QUESTIONS AND ANSWERS IN 2022
 
Basics Of SAS Programming Language
Basics Of SAS Programming LanguageBasics Of SAS Programming Language
Basics Of SAS Programming Language
 
SAS_Overview_Short.pptx
SAS_Overview_Short.pptxSAS_Overview_Short.pptx
SAS_Overview_Short.pptx
 
Improving Effeciency with Options in SAS
Improving Effeciency with Options in SASImproving Effeciency with Options in SAS
Improving Effeciency with Options in SAS
 
Prog1 chap1 and chap 2
Prog1 chap1 and chap 2Prog1 chap1 and chap 2
Prog1 chap1 and chap 2
 
Introduction to sas
Introduction to sasIntroduction to sas
Introduction to sas
 
Set, merge, and update
Set, merge, and updateSet, merge, and update
Set, merge, and update
 
Spss
SpssSpss
Spss
 
Sas-training-in-mumbai
Sas-training-in-mumbaiSas-training-in-mumbai
Sas-training-in-mumbai
 
Sas classes in mumbai
Sas classes in mumbaiSas classes in mumbai
Sas classes in mumbai
 
Sas Talk To R Users Group
Sas Talk To R Users GroupSas Talk To R Users Group
Sas Talk To R Users Group
 
Introducción al Software Analítico SAS
Introducción al Software Analítico SASIntroducción al Software Analítico SAS
Introducción al Software Analítico SAS
 
Draft sas and r and sas (may, 2018 asa meeting)
Draft sas and r and sas (may, 2018 asa meeting)Draft sas and r and sas (may, 2018 asa meeting)
Draft sas and r and sas (may, 2018 asa meeting)
 

Último

Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 

Último (20)

Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 

Sas summary guide

  • 1. SAS Summary Guide School of Applied Statistics November, 03
  • 2. Contents
    1. Introduction
        1.1 Structure of a SAS Job
        1.2 SAS Language
        1.3 SAS Variables
        1.4 SAS Data Sets
    2. Introduction to the DATA Step
        2.1 DATA Statement
        2.2 Sources of Input
        2.3 Input of Raw Data
        2.4 Formats: Input and Output
        2.5 How SAS Executes a DATA Step
        2.6 Transformation of Data
        2.7 Missing Values
        2.8 Modifying an Existing SAS Data Set
        2.9 Output from a SAS DATA Step
        2.10 Output to Create Stored ASCII Files
    3. Introduction to the PROC Step
    4. Basic Procedures
    5. More on the DATA Step
        5.1 IF - THEN - ELSE Statements
        5.2 Selecting Observations
        5.3 DO and END Statements
        5.4 DO Loops
        5.5 Arrays
        5.6 RETAIN
        5.7 DROP and KEEP
        5.8 RENAME and LABEL
    6. Data Management
        6.1 SET
        6.2 MERGE
        6.3 UPDATE
    7. Statistical Procedures
    8. Graphical Procedures
    9. Output Delivery System (ODS)
    10. Further Facilities
    11. Publications
    SAS Summary Guide November, 03 School of Applied Statistics
  • 3. 1. Introduction
    This handout is a brief introduction to the syntax of the SAS package, which is available on UNIX workstations and PC computers at The University of Reading. The SAS language is similar for all versions but there are differences in file access and storage. This document gives a brief synopsis of many basic commands used in the DATA step and the general structure of some statistical procedures (PROC). It is by no means complete, and there are numerous specialised manuals published by SAS Institute (some of which are in Room G16 in the School of Applied Statistics).

    1.1 Structure of a SAS Job
    A SAS program consists of a sequence of one or more steps and each step may contain several SAS statements. There are two kinds of step:-
      • The DATA step, which is used to create and manipulate SAS data sets
      • The PROC step, which is used for analysing or processing SAS data sets
    A SAS job is made up of any number of these steps. The beginning of one step signifies the ending of the previous step.

    1.2 SAS Language
    SAS statements can begin in any column of a line and can be continued on subsequent lines. Each SAS statement must end with a semicolon. The language is mainly case-insensitive (i.e. upper and lower case can be freely mixed in keywords and names, although text inside quotes is case-sensitive). There are three types of SAS statements:-
      • Statements which appear in the DATA step
      • Statements which appear in the PROC step
      • Statements which can appear anywhere (global statements)
    Comments can also be included in a SAS program and are useful for annotating it. An asterisk is used to comment out a single statement:-
        e.g. * This is a comment ;
    To comment out a block of lines use the /* and */ delimiter pair:-
        e.g. /* This is a comment
                which will not be acted upon by SAS */

    1.3 SAS Variables
    There are two types of SAS variable - numeric and character. They can have the following attributes:-
        LENGTH     numeric variables 2 - 8 bytes
                   character variables 1 - 32767 bytes / characters (1 - 200 in Version 6)
        INFORMAT   format SAS uses to read a data value into a variable
        FORMAT     format SAS uses to write each value of a variable
        LABEL      descriptive label of up to 256 characters
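    As a minimal sketch of sections 1.1 to 1.3, the job below contains one global statement, one DATA step and one PROC step, and sets the LENGTH, FORMAT and LABEL attributes described above. The data set name CLASS, the variable names and the data values are invented for illustration.

```sas
* Global statement: can appear anywhere in the job ;
TITLE 'A minimal SAS job';

/* DATA step: creates the temporary data set WORK.CLASS.
   All names and values here are hypothetical. */
DATA CLASS;
   LENGTH NAME $ 12;                      /* character variable, 12 bytes */
   INPUT NAME $ HEIGHT;
   LABEL HEIGHT = 'Height in centimetres';
   FORMAT HEIGHT 5.1;
   DATALINES;
Alice 162.5
Bob 175
;

/* PROC step: its first statement also ends the DATA step above */
PROC PRINT DATA=CLASS LABEL;
RUN;
```

    The LABEL option on PROC PRINT asks for the descriptive label to be printed as the column heading instead of the variable name.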
  • 4. 1.4 SAS Data Sets
    A SAS data set is a collection of data values arranged in a rectangular table, the rows representing observations and the columns representing variables. Each variable must be given a name which consists of 1 - 32 characters. The name must start with a letter and can contain any alphanumeric character or underscore. Avoid special characters in variable names such as . or $ . Special variables within SAS are denoted by names that begin and end with an underscore.

    SAS data sets can be either temporary or permanent. Temporary data sets are given a one-level name by the user which is automatically prefixed with WORK. by the SAS system. This name can be omitted altogether, in which case SAS names the data sets DATA1, DATA2 ... for the 1st, 2nd ... data sets defined. Temporary data sets are erased on leaving the current SAS session. Permanent data sets must be given a two-level name by the user linking to their storage location.
        e.g. LIBNAME PERM 'complete_pathname';
             PROC PRINT DATA=PERM.STUDENTS;
             RUN;
    Permanent SAS data sets are stored differently between versions and allocated different file extensions. However, all data sets are upward compatible.

    There are several words which should not be used as the first part of the SAS data set name. These include such words as PRINT, EXEC, DATA etc. and also SAS reserved names such as LIBRARY, MAPS, WORK etc.

    SAS automatically documents a permanent data set to include a data set label, variable attributes and history information. The data are stored in the form in which SAS uses them, therefore saving computer time and making it unnecessary to execute input statements each time the data set is used.

    2. Introduction to the DATA Step
    2.1 DATA Statement
    The DATA statement signals the beginning of the DATA step and gives a name to the SAS data set being created. This SAS data set can be used as input to any subsequent DATA or PROC steps.
        e.g. a) DATA PERM.PATIENTS;   creates a permanent data set
             b) DATA SCHOOL;          creates a temporary data set
             c) DATA;                 creates a temporary data set with default name DATAn
             d) DATA _NULL_;          does not create a data set

    2.2 Sources of Input
    a) The DATALINES or CARDS statement is used when the data are in the same file as the SAS statements:-
        DATA REGRESS;
        INPUT X Y Z;
  • 5.     DATALINES;
        61 44 29
        17 6 43
        . .
    b) The INFILE statement is used to read data from an external file on your workdisk:-
        DATA REGRESS;
        INFILE 'file_identifier';
        INPUT X Y Z;
    The file identifier in the INFILE statement is the full pathname and filename of the external data file, residing on your disk, which is to be linked to your SAS program.

    2.3 Input of Raw Data
    The INPUT statement is used to describe the raw input data. There are three types of input mode which can be mixed in one INPUT statement:-
      • LIST (or free-field)
      • COLUMN
      • FORMATTED

    a) LIST INPUT
    This mode of input simply lists the variables in the order in which they appear in the input data
        e.g. INPUT NAME $ AGE SEX $;
             INPUT NAME $ Q1-Q32;
    where $ is used after a variable name to indicate a character variable whose value has a default length of 8 with no embedded blanks. Values must be separated by at least one space (free format).

    b) COLUMN INPUT
    With this mode of input the columns are specified within which each variable value is located
        e.g. INPUT CANNAME $ 1-15 PARTY $ 20-24 VOTES 30-40;
    The data values can be read in any order and blank fields are automatically set to missing. Embedded blanks are allowed in character data by specifying the maximum length of a value.

    c) FORMATTED INPUT
    This is a very flexible method of input as it is possible to read data in virtually any form. SAS keeps track of its position on the input lines with a 'pointer'
        e.g. INPUT @3 QUEST3 +10 QUEST12 / @60 RESPONSE;
    There are various types of 'pointer' controls each having a different meaning. Listed below are some of the more frequently used ones:-
        @n    move pointer to column n
  • 6.     +n    move the pointer forward n columns
        #n    move pointer to line n
        /     move to next line
    Whichever mode of input is used the following 'pointer' controls can be used to maintain the current pointer position:-
        @     'hold' data line for next INPUT statement in the current DATA step
        @@    'hold' data line for more executions of the DATA step

    2.4 Formats: Input and Output
    A set of directions for reading a value is called an INFORMAT and a set of directions for printing a value is called a FORMAT. It is possible to specify formats for numeric and character variables and also date and time variables. There are a large number of FORMAT and INFORMAT specifications; refer to SAS Language Reference Version 8 for further information.

    2.5 How SAS Executes a DATA Step
    A DATA step is executed once for each observation in the data set. A DATA step that does not contain an INPUT, SET, MERGE or UPDATE statement is executed once. The SAS variable _N_ is automatically generated for each DATA step; its value is the number of times that SAS has begun executing the step (_N_ is not directly available outside the current DATA step). All variables referred to in the DATA step, for example the variables named in the INPUT statement and any new variables generated, make up the program data vector. For each execution of the DATA step:-
      • The program data vector is initialised to missing.
      • The data values of the current observation are read using the INPUT statement. Any new variables are computed and added to the program data vector and any variables not wanted are dropped.
      • The values in the program data vector are then added to the data set being created.

    2.6 Transformation of Data
    There is a range of standard functions available in SAS for transforming data. For a full list of these functions consult the SAS Language Reference. Manipulation and transformation of data is carried out in the DATA step with the resulting variable being added to the data set automatically.
        e.g. SUM=X + Y;
             X2=X * X; or X2=X**2;
             LX=LOG(X);

    2.7 Missing Values
    Variables with missing values on input are specified in SAS by a full stop or a blank field. On output numeric variables are displayed as a full stop and character variables as a blank field. For numeric variables it is also possible to specify up to 27 special missing value symbols ( A - Z and _ ) to distinguish between different kinds of missing data. This is done using the MISSING statement:-
  • 7.     DATA;
        INPUT X;
        MISSING A B;
        IF X = 99 THEN X = .A;
        IF X = 999 THEN X = .B;
        CARDS;
    a) .A is used to distinguish from the variable name A
    b) A variable is set to missing if the input field contains only a full stop or is blank.
    c) A variable is set to missing if the input field contains an illegal character

    2.8 Modifying an Existing SAS Data Set
    Once data have been read into a SAS data set it is possible to modify that data in other DATA steps while keeping the original data set unchanged and without having to re-input the data from the raw data file. This is easily done by transferring data from the existing SAS data set into another one.
        e.g. DATA NEW;
             SET PERM.PATIENTS;
             DOSE=PILL_A*QTY_A;
    Each time the SET statement is executed another observation is transferred from the existing SAS data set PERM.PATIENTS to the SAS data set being created and called NEW.

    2.9 Output from a SAS DATA Step
    OUTPUT statements allow you to control when an observation is written to one of the SAS data sets which are currently being created.
        e.g. OUTPUT;
             OUTPUT MISSDATA;
    When an OUTPUT statement is executed SAS will immediately output the current values to the named or current SAS data set. OUTPUT statements are useful for:-
    a) Creating 2 or more observations from 1 record of input data
    b) Combining several observations into one observation
    c) Creating more than one SAS data set from one input file
        e.g. DATA HARV1 HARV2;
             SET COMPLETE;
             IF HARVEST=1 THEN OUTPUT HARV1;
             IF HARVEST=2 THEN OUTPUT HARV2;
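    Use (a) of the OUTPUT statement, creating several observations from one input record, might look like the sketch below. The names LONG, ID, S1-S3, VISIT and SCORE and the data values are hypothetical.

```sas
/* Each input record holds three scores for one subject.
   An explicit OUTPUT after each assignment writes one
   observation per score, so 2 records give 6 observations. */
DATA LONG;
   INPUT ID S1 S2 S3;
   VISIT = 1; SCORE = S1; OUTPUT;
   VISIT = 2; SCORE = S2; OUTPUT;
   VISIT = 3; SCORE = S3; OUTPUT;
   DROP S1 S2 S3;          /* keep only ID, VISIT and SCORE */
   DATALINES;
1 10 12 15
2 8 . 11
;

PROC PRINT DATA=LONG;
RUN;
```

    Once a DATA step contains an explicit OUTPUT statement, SAS no longer writes an observation automatically at the end of each execution of the step.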
  • 8. 2.10 Output to Create Stored ASCII Files
    The FILE and PUT statements are used within a DATA step and are analogous to the INFILE and INPUT statements. The FILE command links SAS to a specific external file, while the PUT command specifies the output record format.
        e.g. DATA CREATE;
             SET CLASSNO;
             FILE 'file_identifier';
             PUT NAME $ 1-8 SEX $ 11 AGE 13-14;

    3. Introduction to the PROC Step
    Some of the procedures available in SAS are:-
        Basics:      CHART, CONTENTS, CORR, DATASETS, FORMAT, FREQ, MEANS, PLOT, PRINT, SORT, SUMMARY, TABULATE, TRANSPOSE, UNIVARIATE
        Statistics:  ANOVA, CANCORR, CANDISC, CLUSTER, DISCRIM, FACTOR, GLM, PRINCOMP, REG, TTEST
        Graph:       GCHART, GCONTOUR, GMAP, GPLOT, GSLIDE, G3D, G3GRID
    SAS procedures analyse and process SAS data sets as follows:-
    a) Read SAS data sets
    b) Perform the requested task
    c) Print results
    d) Create SAS output data sets (optional)
    Most SAS procedures have default option settings for the more common situations or analyses. However, information can be given to the PROC step to specify:-
    a) Which data set to process
    b) Which variables to process
    c) Whether to process the data in subsets
    The PROC statement is used to begin a procedure.
        e.g. PROC MEANS DATA=PERM.PATIENTS MEAN STD;
    Some of the more commonly used statements within the PROC step are:-
    a) General statements common to many procedures
        VAR      Specifies variables to be analysed
        ID       Specifies a variable whose values identify observations in the SAS data set
  • 9.     BY       Specifies that the data set is to be processed in groups
                 N.B. The data set must have already been sorted in the order of the current BY group.
        WEIGHT   Specifies a variable whose values are the relative weights for the observations
        WHERE    Subsets observations to be analysed based on specified criteria
    b) Statements specific to individual procedures
        TABLES   Table request in PROC FREQ
        PLOT     Plot request in PROC PLOT
        MODEL    Model specification in PROC ANOVA, PROC GLM, PROC REG etc.
    c) Statements describing variable attributes
        FORMAT   Specifies formats for printing variable values
        LABEL    Associates descriptive labels with variable names
    Lists of names can be abbreviated:-
    a) Range of variables                   VAR SEX -- TEMP;
    b) Numeric suffix range                 VAR Q1 - Q20;
    c) Range of numeric variables only      VAR AGE-NUMERIC-TEMP;
    d) Range of character variables only    VAR NAME-CHARACTER-SEX;
    e) All numeric variables                VAR _NUMERIC_;
    f) All character variables              VAR _CHARACTER_;

    4. Basic Procedures
    PROC CHART
    This procedure produces horizontal and vertical bar charts, pie charts, star charts and block charts for numeric and character variables. The charts can represent frequencies and cumulative frequencies, percentages and cumulative percentages, sums and means.
        PROC CHART DATA = data_set_name options ;
          HBAR variable_list ;     produces horizontal bar chart
          VBAR variable_list ;     produces vertical bar chart
          PIE variable_list ;      produces pie chart
          STAR variable_list ;     produces star chart
          BLOCK variable_list ;    produces block chart
          BY variable_list ;
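    The variable list abbreviations above can be tried on a small data set. The sketch below assumes a hypothetical data set SURVEY whose variables are stored in the order NAME, SEX, AGE, TEMP, Q1, Q2, Q3; the values are invented.

```sas
DATA SURVEY;
   INPUT NAME $ SEX $ AGE TEMP Q1-Q3;
   DATALINES;
Ann F 34 36.8 1 2 3
Raj M 41 37.1 2 2 1
;

PROC PRINT DATA=SURVEY;
   VAR SEX -- TEMP;          /* positional range: SEX, AGE, TEMP */
RUN;

PROC MEANS DATA=SURVEY;
   VAR Q1 - Q3;              /* numbered range: Q1, Q2, Q3 */
RUN;

PROC MEANS DATA=SURVEY;
   VAR AGE-NUMERIC-TEMP;     /* numeric variables from AGE to TEMP */
RUN;

PROC PRINT DATA=SURVEY;
   VAR _CHARACTER_;          /* all character variables: NAME, SEX */
RUN;
```

    The double-hyphen range depends on the order in which the variables are stored in the data set, not on their names, so it is worth checking that order with PROC CONTENTS before relying on it.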
  • 10. PROC CORR
    This procedure computes correlation coefficients between variables. Various univariate statistics are also computed.
        PROC CORR DATA = data_set_name options ;
          VAR variable_list ;
          WITH variable_list ;
          WEIGHT variable ;
          FREQ variable ;
          BY variable_list ;

    PROC FORMAT
    This procedure is used to define formats which attach labels to variable values on output. Formats can be used for either numeric or character variables. They can be used in PUT statements in a DATA step and in FORMAT statements in a PROC step. They can also be used in FORMAT statements in a DATA step, in which case they remain associated with the variable for the remainder of the SAS job, unless changed.
        PROC FORMAT options ;
          VALUE format_name value1 = label1
                            value2 = label2
                            . .
                            valuen = labeln ;
        format_name   Must be a unique SAS name which must begin with a $ for character variables
        values        Can be a single number or a range of numbers, or several numerical or character values
        labels        Labels can contain a maximum of 40 characters and must be enclosed in quotes
        e.g. PROC FORMAT;
               VALUE $SEXFMT 'M' = 'Male'
                             'F' = 'Female';
               VALUE AGEFMT 1 - 16 = 'Child'
                            17 - High = 'Adult';
    The formats defined above can be used in other procedures as follows:-
        PROC PRINT DATA = PERM.PATIENTS;
  • 11.       VAR SEX AGE;
          FORMAT SEX $SEXFMT. AGE AGEFMT. ;
    N.B. The full stop after SEXFMT and AGEFMT is essential.

    PROC FREQ
    This procedure produces 1-way to n-way frequency tables of character and numeric variables.
        PROC FREQ DATA = data_set_name options ;
          WEIGHT weighting_variable ;
          BY variable_list ;
          TABLES table_request / options ;
    In the TABLES specification the values of the last variable form the columns and the values of the second last variable form the rows.
        e.g. TABLES VAR1;           one-way table
             TABLES VAR1 * VAR2;    two-way table

    PROC MEANS
    This procedure is used to produce simple univariate statistics for numeric variables. The options available allow you to specify which statistics you want calculated e.g. mean, standard deviation, minimum. If no statistics are specifically requested in the PROC MEANS statement, then variable name, N, mean, standard deviation, minimum and maximum are printed automatically.
        PROC MEANS DATA = data_set_name options ;
          BY variable_list ;
          VAR variable_list ;
          ID variable_list ;
          FREQ variable ;
          WEIGHT weighting_variable ;
          OUTPUT OUT = output_data_set_name statistics ;
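    The following sketch puts PROC FORMAT, PROC FREQ and PROC MEANS together on a small temporary data set. The data set name PATIENTS, the variable SMOKER and all values are assumptions made for illustration; the $SEXFMT format is the one defined in the PROC FORMAT example.

```sas
PROC FORMAT;
   VALUE $SEXFMT 'M' = 'Male'
                 'F' = 'Female';
RUN;

DATA PATIENTS;
   INPUT SEX $ SMOKER $ AGE;
   DATALINES;
M Y 45
F N 38
F Y 52
M N 61
;

PROC FREQ DATA=PATIENTS;
   TABLES SEX * SMOKER;      /* SEX forms the rows, SMOKER the columns */
   FORMAT SEX $SEXFMT.;      /* note the full stop after the format name */
RUN;

PROC MEANS DATA=PATIENTS N MEAN STD MIN MAX;
   VAR AGE;
RUN;
```

    Because the FORMAT statement sits inside the PROC step, the labels apply to that step only; the stored values of SEX remain 'M' and 'F'.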
  • 12. PROC PLOT
    This procedure produces line-printer plots for both numeric and character variables. Various options are available for specifying the plotting symbol, scaling the axes, drawing reference lines, superimposing 2 or more plots and drawing contour plots.
        PROC PLOT DATA = data_set_name options ;
          PLOT vertical_variable * horizontal_variable / options ;
          BY variable_list ;

    PROC PRINT
    This procedure prints the values in a SAS data set.
        PROC PRINT DATA = data_set_name options ;
          BY variable_list ;
          VAR variable_list ;
          ID variable_list ;
          PAGEBY variable ;
          SUM variable_list ;
          SUMBY variable ;

    PROC SORT
    This procedure rearranges the observations in an existing SAS data set or creates a new data set containing the rearranged observations. Multiple sorting groups can be specified and variables can be sorted in ascending or descending order.
        PROC SORT DATA = data_set_name OUT = output_data_set_name options ;
          BY variable_list ;
    Variables are sorted in ascending order by default. For descending order put DESCENDING before each variable name concerned in the BY statement. The SORT procedure should be used whenever subsequent procedures process the data set in groups using the BY statement. It is possible to process a data set without sorting it beforehand by using the NOTSORTED option on the BY statement of the procedure being used. In that case SAS assumes that consecutive observations with the same BY value are grouped together, although the BY values are not necessarily in alphabetic or numeric order.
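    A short sketch of sorting followed by BY-group processing is given below. The data set TRIAL, its variables CENTRE and AGE, and the values are invented for illustration.

```sas
DATA TRIAL;
   INPUT CENTRE $ AGE;
   DATALINES;
York 34
Bath 51
York 47
Bath 29
;

/* OUT= writes the sorted observations to a new data set and
   leaves TRIAL unchanged. DESCENDING applies only to the
   variable that follows it, here AGE. */
PROC SORT DATA=TRIAL OUT=SORTED;
   BY CENTRE DESCENDING AGE;
RUN;

/* The BY statement now works because SORTED is in CENTRE order */
PROC MEANS DATA=SORTED MEAN;
   BY CENTRE;
   VAR AGE;
RUN;
```

    Without OUT=, PROC SORT replaces the input data set with the sorted version.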
  • 13. PROC SUMMARY
    This procedure produces a SAS data set containing statistics similar to the MEANS procedure, but much more efficiently. PROC SUMMARY does not produce any printed output and the data do not have to be sorted in order to produce subgroup statistics. An OUTPUT and a VAR statement must be specified, and any number of OUTPUT statements can be used. The VAR statement must precede the OUTPUT statements.
        PROC SUMMARY DATA = data_set_name options ;
          CLASS variable_list ;
          VAR variable_list ;
          BY variable_list ;
          FREQ variable ;
          WEIGHT weighting_variable ;
          ID variable_list ;
          OUTPUT OUT = output_data_set_name statistics ;

    PROC TABULATE
    This procedure provides a more flexible alternative to the FREQ procedure for producing tables. Each cell in the table contains a descriptive statistic e.g. mean, standard deviation, etc. TABULATE will generate tables defined by the TABLE statement. Classification variables must be specified with the CLASS statement, while the variables to be tabulated, i.e. whose values are to be the cell contents, must be specified by the VAR statement. Each expression in the TABLE statement defines the categories for the table's dimensions - page, row and column.
        PROC TABULATE DATA = data_set_name options ;
          CLASS variable_list ;
          VAR variable_list ;
          BY variable_list ;
          FREQ variable ;
          WEIGHT weighting_variable ;
          FORMAT variables'_format ;
          LABEL variable = 'label' ;
          TABLE page_expression, row_expression, column_expression ;
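    As a sketch of PROC SUMMARY with a CLASS variable, the job below writes subgroup statistics to a data set and then prints it. The data set names YIELDS and STATS, the output variable names NYIELD, MEANYLD and SDYLD, and the data values are hypothetical.

```sas
DATA YIELDS;
   INPUT VARIETY $ YIELD;
   DATALINES;
A 4.1
A 4.5
B 5.0
B 5.4
B 5.2
;

/* No sorting is needed when CLASS is used instead of BY */
PROC SUMMARY DATA=YIELDS;
   CLASS VARIETY;
   VAR YIELD;
   OUTPUT OUT=STATS N=NYIELD MEAN=MEANYLD STD=SDYLD;
RUN;

/* PROC SUMMARY prints nothing itself, so print the output data set */
PROC PRINT DATA=STATS;
RUN;
```

    STATS also contains the automatic variables _TYPE_ and _FREQ_. The observation with _TYPE_ = 0 holds the statistics for all the data, and the observations with _TYPE_ = 1 hold one row per value of VARIETY.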
  • 14. PROC TRANSPOSE
    This procedure transposes data sets, changing observations into variables and variables into observations. An output data set is created automatically and named according to the DATAn convention if a name is not specified.
        PROC TRANSPOSE DATA = data_set_name options ;
          VAR variable_list ;
          ID variable ;
          IDLABEL variable ;
          COPY variable_list ;
          BY variable_list ;

    5. More on the DATA Step
    5.1 IF - THEN - ELSE Statements
    These statements are used to execute a further SAS statement conditional on some expression.
        IF expression THEN statement;
        ELSE statement ;
    THEN statement is executed if expression is non zero, non missing or true
    ELSE statement is executed if expression is zero, missing or false
    There are eight relational operators:-
        LT or <     LE or <=
        GT or >     GE or >=
        NL or ~<    NG or ~>
        EQ or =     NE or ~=
    In addition there are three logical operators:-
        NOT or ~    AND or &    OR or |
        e.g. DATA ;
             IF CODE = 1 OR CODE = 2 THEN SEX = 'MALE' ;
             ELSE SEX = 'FEMALE';
        e.g. DATA ;
             INPUT AGE ;
  • 15.          IF 0 < AGE < 10 THEN AGEGRP = 1 ;
             IF 10 <= AGE < 19 THEN AGEGRP = 2 ;
             IF AGE >= 19 THEN AGEGRP = 3 ;
    Any observations with values not included in one of the categories will produce missing or blank values.

    5.2 Selecting Observations
    If not all observations are to be included in the data set being created they can be excluded by the DELETE statement or the subsetting IF statement. The DELETE statement stops the processing of an observation:-
        e.g. DATA MALES ;
             INPUT AGE SEX $ ;
             IF SEX = 'F' THEN DELETE ;
    The subsetting IF statement allows an observation to pass if the expression is true:-
        e.g. DATA MALES ;
             INPUT AGE SEX $ ;
             IF SEX = 'M' ;
    The result from both of the above DATA steps is the same.

    5.3 DO and END Statements
    DO statements specify that any statements following the DO are to be executed until a matching END appears.
        e.g. DATA ;
             INPUT AGE SEX $ FAMILY $ ;
             IF SEX = 'F' THEN DO ;
               AGE = AGE - 5 ;
               FAMILY = 'NEW' ;
             END ;
             ELSE AGE = AGE + 3 ;

    5.4 DO Loops
    DO loops allow a range of statements, within a DATA step, to be repeated either a specified number of times or while a specified condition holds.
        DO variable = start TO stop ;
  • 16.     DO variable = start TO stop BY increment ;
        DO WHILE (expression) ;
        DO UNTIL (expression) ;
        DO OVER array_name ;
    Each must have a matching END statement to terminate execution.
        e.g. DO N = 1 TO 20 ;
             DO N = 1 TO 20 BY 4 ;
             DO WHILE (N < 20) ;
             DO UNTIL (N = 20) ;

    5.5 Arrays
    Arrays in SAS are useful for processing a lot of SAS variables in the same way.
        ARRAY array_name [index_variable] array_elements ;
        e.g. ARRAY A Q1 - Q5 ;
             DO OVER A ;
               A = LOG(A) ;
             END ;
    Array elements are substituted for the array name in SAS statements depending on the value of the index variable. SAS will use its own internal index variable if none is defined. In the example above the DO group is executed for every element in the array.

    5.6 RETAIN
    This statement retains a variable value from the last execution of the DATA step. Normally all variables are set to missing before each execution of the DATA step. Initial values can also be assigned to the variables.
        RETAIN variable ;
        RETAIN variable initial_value ;

    5.7 DROP and KEEP
    The DROP statement excludes named variables from a data set or analysis and the KEEP statement includes only named variables in a data set or analysis. Both statements can be used in the DATA step or as data set options which appear after the data set name on PROC steps.
  • 17.     e.g. DATA PERM.PATIENTS ;
             DROP PATNO ;

             DATA PERM.PATIENTS(DROP = PATNO) ;

             PROC PRINT DATA = PERM.PATIENTS(KEEP = AGE SEX) ;

    5.8 RENAME and LABEL
    The RENAME statement is used to rename variables.
        RENAME old_name = new_name ;
    The LABEL statement assigns labels of up to 256 characters to variables.
        LABEL variable = 'label' ;

    6. Data Management
    6.1 SET
    Reads observations from 1 or more SAS data sets and can interleave observations.
    a) Subset the observations
        DATA FEMALES ;
        SET STUDENTS ;
        IF SEX = 'F' ;
    b) Subset the variables
        DATA SMALL ;
        SET STUDENTS ;
        DROP WEIGHT AGE ;
    c) Add a new variable
        DATA ADD ;
        SET STUDENTS ;
        WTKG = WEIGHT / 2.2 ;
    d) Multiple output data sets
        DATA MALES FEMALES ;
        SET STUDENTS ;
        IF SEX = 'M' THEN OUTPUT MALES ;
        IF SEX = 'F' THEN OUTPUT FEMALES ;
    e) Multiple input data sets (Concatenate)
        DATA ALL ;
  • 18.     SET MALES FEMALES ;
    f) Multiple input data sets (Interleave)
        DATA ALL ;
        SET MALES FEMALES ;
        BY NAME ;

    6.2 MERGE
    Combines observations from two or more SAS data sets and places them side by side.
    a) One-to-one Merging
    If there are the same number of observations in each data set and if the observations are in the same order then they can be combined as shown below. The two data sets are placed side by side in the combined data set being created.
        DATA COUPLES ;
        MERGE HUSBANDS WIVES;
    For any duplicate variable name in the data sets, only the values of that variable from the last named data set will be saved.
    b) Match Merging
    The two data sets, having already been sorted, are placed side-by-side in the order specified in the BY statement.
        DATA STABLE ;
        MERGE HORSE TRAINER ;
        BY OWNER ;

    6.3 UPDATE
    Updates a master file with a transaction file where the BY variable is the KEY for matching observations.
        DATA SURGERY;
        UPDATE SURGERY BLOODCT;
        BY PATIENT;
    This should be used only when, for a master data set, there are several changes that can be applied all in one job.
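    The match merge in 6.2 b) requires both data sets to be sorted by the BY variable first. A minimal sketch of the whole sequence follows; the variables HNAME and TNAME and all data values are invented, and only the data set names and the BY variable OWNER come from the example above.

```sas
DATA HORSE;
   INPUT OWNER $ HNAME $;
   DATALINES;
Smith Comet
Jones Arrow
;

DATA TRAINER;
   INPUT OWNER $ TNAME $;
   DATALINES;
Jones Patel
Smith Brown
;

/* Both inputs must be in OWNER order before the match merge */
PROC SORT DATA=HORSE;
   BY OWNER;
RUN;

PROC SORT DATA=TRAINER;
   BY OWNER;
RUN;

DATA STABLE;
   MERGE HORSE TRAINER;
   BY OWNER;
RUN;

PROC PRINT DATA=STABLE;
RUN;
```

    STABLE has one observation per owner, with the horse and trainer names side by side. An owner present in only one of the input data sets would still appear, with missing values for the variables from the other data set.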
  • 19. 7. Statistical Procedures
    There is a wide range of statistical procedures available in SAS for carrying out such techniques as analysis of variance and covariance, linear and non-linear regression analysis, multivariate methods and non-parametric methods. A few examples of some of the more widely used procedures are given below. For more details on all the procedures available for statistical analysis, consult the appropriate manuals.

    PROC ANOVA
    This procedure is used to carry out an analysis of variance of balanced data (see also PROC GLM). Many of the statements which can be used with this procedure are not necessary for standard analyses.
      Required statements; must appear in this order:-
        PROC ANOVA DATA=data_set_name options ;
        CLASS variable_list ;
        MODEL dependent_variables = effects / options ;
      Must appear before the first RUN statement:-
        BY variable_list ;
        ABSORB variable_list ;
        FREQ variable ;
      Can appear after the MODEL statement and can be used interactively:-
        MEANS effects / options ;
        TEST H = effects E = effect ;
        MANOVA H = effects E = effect M = equations / options;
        REPEATED factor_names / options ;
        e.g. PROC ANOVA DATA = EXPT ;
             CLASS METHOD VARIETY ;
             MODEL YIELD = METHOD VARIETY METHOD * VARIETY ;
             BY YEAR ;
  • 20. PROC GLM
    This procedure can be used to fit general linear models to data to enable statistical methods such as analysis of variance, analysis of covariance, regression analysis (including comparison of regressions) and multivariate analysis of variance to be carried out. Unbalanced data and data with missing values can also be analysed using this procedure. There are numerous statements and options available with this procedure, but most applications only use a few of them.
        PROC GLM DATA=data_set_name options ;
      Must precede the MODEL statement:-
        CLASS variable_list ;
      Required statement:-
        MODEL dependent_variables = independent_variables / options ;
      Must appear before the first RUN statement:-
        ABSORB variable_list ;
        BY variable_list ;
        FREQ variable ;
        ID variable_list ;
        WEIGHT weighting_variable ;
      Can appear after the MODEL statement and can be used interactively:-
        CONTRAST 'label' effect_values / options ;
        ESTIMATE 'name' effect_values / options ;
        LSMEANS effects / options ;
        MANOVA H = effects E = effect M = equations / options ;
        MEANS effects / options ;
        OUTPUT OUT = output_data_set_name;
        RANDOM effects / options ;
        REPEATED factor_names / options ;
        TEST H = effects E = effect / options ;
        e.g. PROC GLM DATA = EXPT2 ;
             CLASS TREAT SUBJECT TIME ;
             MODEL RESP = TREAT SUBJECT(TREAT) TIME TREAT * TIME ;
             TEST H = TREAT E = SUBJECT(TREAT) ;
             LSMEANS TREAT TIME TREAT*TIME ;
             OUTPUT OUT = NEW P = RHAT R = RESID ;
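    A self-contained sketch of PROC GLM on unbalanced data is shown below: a one-way analysis of variance with three treatments of unequal size. The data set GLMDEMO, the output data set FITTED and all values are invented for illustration.

```sas
DATA GLMDEMO;
   INPUT TREAT $ RESP;
   DATALINES;
A 12.1
A 11.4
A 13.0
B 14.2
B 15.1
C 10.3
C 9.8
;

PROC GLM DATA=GLMDEMO;
   CLASS TREAT;                          /* must precede MODEL */
   MODEL RESP = TREAT;
   LSMEANS TREAT / STDERR;               /* adjusted means with standard errors */
   OUTPUT OUT=FITTED P=RHAT R=RESID;     /* fitted values and residuals */
RUN;
QUIT;
```

    Because PROC GLM can be used interactively, it stays active after RUN so that further statements can be submitted. QUIT ends the procedure.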
  • 21. PROC TTEST
    This procedure carries out a simple t-test on the means of two groups of observations. The grouping factor specified by the CLASS statement must have only two levels.
      Required statements:-
        PROC TTEST DATA = data_set_name options ;
        CLASS variable_list ;
      Optional statements:-
        BY variable_list ;
        VAR variable_list ;
        e.g. PROC TTEST DATA = EXPT5 ;
             CLASS SEX ;
             VAR SCORE ;

    PROC NLIN
    This procedure is used to fit nonlinear regression models. The model to be fitted has to be specified, as do the parameters to be estimated, initial guesses for them, and possibly the partial derivatives of the model with respect to each parameter. Some models are difficult to fit and in these cases the initial guesses can be critical. There is no guarantee that the procedure will be able to fit the model successfully.
      Required statements:-
        PROC NLIN DATA = data_set_name options ;
        PARMS parameter = values ;
        MODEL dependent_variable = expression ;
      Optional statements:-
        BOUNDS expressions ;
        BY variable_list ;
        ID variable_list ;
        DER.parameter = expression ;
        OUTPUT OUT = output_data_set_name ;
        e.g. PROC NLIN DATA = EXPT3 ;
             PARMS B0 = 0.5 B1 = 0.08 ;
             MODEL Y = B0*(1-EXP(-B1*X)) ;
             DER.B0 = 1-EXP(-B1*X) ;
             DER.B1 = B0*X*EXP(-B1*X) ;
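    To try PROC NLIN without real data, one option is to simulate observations from the same model as in the example and then fit it. This is only a sketch: the data set names GROWTH and NLFIT, the true parameter values 2 and 0.3, the noise level and the random number seed are all arbitrary choices. The DER statements are left out, on the assumption that the release in use can derive them itself, as the word "possibly" above suggests.

```sas
/* Simulate 20 observations from Y = B0*(1 - EXP(-B1*X)) plus noise */
DATA GROWTH;
   DO X = 1 TO 20;
      Y = 2*(1 - EXP(-0.3*X)) + 0.05*RANNOR(12345);
      OUTPUT;
   END;
RUN;

PROC NLIN DATA=GROWTH;
   PARMS B0 = 1.5 B1 = 0.2;              /* initial guesses */
   MODEL Y = B0*(1 - EXP(-B1*X));
   OUTPUT OUT=NLFIT P=YHAT R=RESID;      /* fitted values and residuals */
RUN;
```

    Starting values closer to or further from 2 and 0.3 can be substituted in the PARMS statement to see how sensitive the fit is to the initial guesses.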
  • 22. PROC REG
    This procedure is used to fit linear regression models. There are other regression procedures such as RSQUARE, RSREG and STEPWISE for selecting subsets of independent variables in a multiple regression analysis, fitting quadratic response surfaces and carrying out stepwise regression, respectively.
      Required statement:-
        PROC REG DATA = data_set_name options ;
      Required statement for model fitting; can be used interactively:-
        MODEL dependent_variables = independent_variables / options ;
      Must appear before the first RUN statement:-
        VAR variable_list ;
        BY variable_list ;
        FREQ variable ;
        WEIGHT weighting_variable;
        ID variable ;
      Can appear anywhere after a MODEL statement and can be used interactively:-
        ADD variable_list;
        DELETE variable_list;
        MTEST equations ;
        OUTPUT OUT = output_data_set_name ;
        PLOT y_variate*x_variate;
        REFIT;
        RESTRICT equations ;
        REWEIGHT condition;
        TEST equations ;
        e.g. PROC REG DATA = EXPT4 ;
             MODEL POP = YEAR ;
             OUTPUT OUT = REGOUT P = EPOP R = RESID ;

    8. Graphical Procedures
    The majority of procedures available to produce high-quality, hard-copy graphical output work in the same way as those mentioned in section 4. Syntactically most are prefixed by the letter G e.g. GCHART, GPLOT etc. Additional global statements allow the user to specify more precisely the axes, symbols and patterns etc. used in the representation of the data. This is a topic beyond the scope of this Summary Guide but information can be found in the two volumes of the SAS/GRAPH manuals. To produce hard-copy, the various versions of SAS access the graphics devices in different ways, so refer to the appropriate SAS Companion Guide for more complete information.
9. Output Delivery System (ODS)

Many procedures have always been able to produce output data sets for use in further calculations, e.g. parameter estimates from a regression analysis. However, some of the more common procedures lacked this facility. Since version 7, the Output Delivery System (ODS) has made it much simpler to save data sets and to produce formatted output for high-resolution printers and web-quality output using HTML. It is also possible to control the output stream more effectively, and a greater choice of output objects can be saved to data sets.

ODS is a vast topic with many individual statements. Each statement (shown in the table below) has its own set of options, which are not shown here and are best described in the manual.

Table of ODS Statements

    ODS EXCLUDE    Specify output objects to exclude from ODS destinations.

    ODS HTML       Open, manage, or close the HTML destination. If the
                   destination is open, you can create HTML output.

    ODS LISTING    Open, manage or close the Listing destination.

    ODS OUTPUT     Create a SAS data set from an output object and manage
                   the selection and exclusion lists for the Output destination.

    ODS PATH       Specify which locations to search for the definitions that
                   were created by PROC TEMPLATE, as well as the order in
                   which to search for them.

    ODS PRINTER    Open, manage or close the Printer destination. If the
                   destination is open, you can create Printer output.

    ODS SELECT     Specify output objects for ODS destinations.

    ODS SHOW       Write to the SAS log the specified selection or
                   exclusion list.

    ODS TRACE      Write to the SAS log a record of each output object that
                   is created, or suppress the writing of this record.

    ODS VERIFY     Print or suppress a warning that a style definition or a
                   table definition that is used is not supplied by SAS Institute.
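A few of these statements are often used together. The sketch below uses ODS TRACE to list the output objects a procedure creates, ODS OUTPUT to save the regression parameter estimates as a data set, and ODS HTML to write a listing of that data set to a web page. The simulated data set EXPT4, the output data set name PAREST and the file name regout.html are assumptions made for illustration only.

```sas
/* Hypothetical data: a roughly linear trend of POP on YEAR. */
DATA EXPT4 ;
   DO YEAR = 1900 TO 2000 BY 10 ;
      POP = 50 + 0.8*(YEAR - 1900) + 3*RANNOR(4321) ;
      OUTPUT ;
   END ;
RUN ;

/* Write the name of each output object to the SAS log. */
ODS TRACE ON ;

/* Save the ParameterEstimates output object as data set PAREST. */
ODS OUTPUT ParameterEstimates = PAREST ;

PROC REG DATA = EXPT4 ;
   MODEL POP = YEAR ;
RUN ;
QUIT ;

ODS TRACE OFF ;

/* Send a listing of the saved estimates to an HTML file */
/* (the file name is hypothetical).                      */
ODS HTML BODY = 'regout.html' ;
PROC PRINT DATA = PAREST ;
RUN ;
ODS HTML CLOSE ;
```

Running a procedure once with ODS TRACE ON is a convenient way to find the names of its output objects before writing the ODS OUTPUT or ODS SELECT statement.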
10. Further Facilities

There are many more facilities in SAS in addition to those that have been documented here. These include:-

    • A macro processing language.
    • A full-screen editor (FSP) enabling data to be entered and updated. It also contains a spreadsheet facility.
    • Interactive matrix language (IML), a very powerful module for programming matrix algebra, useful for statistical and mathematical applications.
    • Time series module (ETS) for carrying out econometric and time-series analysis.

11. Publications

There is a vast range of SAS manuals for both UNIX and PC versions. They can be ordered from:-

    SAS Software Ltd.
    Wittington House
    Henley Road
    Medmenham
    Marlow SL7 2EB

The Main Library on campus has a few manuals for reference, based on previous versions. In addition, users of SAS at The University of Reading can read the current documentation online by registering at http://v8doc.sas.com/sashtml/