This page summarizes the SAS material required for this class.
Readings:
The following chapters refer to the text "Applied Statistics and the SAS Programming Language, Cody & Smith." You may want to review these chapters if you have the text.
The following chapters refer to the text "SAS System for Elementary Statistical Analysis, Schlotzhauer & Littell." You may want to review these chapters if you have the text.
The textbook readings above may not cover all the material required. The material below is a summary of textbook readings plus additional required topics. Remember to pay particular attention to the rules summarized in the Basic Rules section below. Here is a simple SAS program and its output. At a minimum, you should be very comfortable with programs of this complexity.
Review:
These are statements that do not belong to a particular DATA or PROC step. They have a global effect.
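As a minimal sketch of global statements, the program below uses OPTIONS and TITLE. These two statements are an assumption about which global statements are meant here (the page does not name them), and the title text is made up for illustration. Both stay in effect for every later step until they are changed.

```sas
/* Global statements: placed outside any DATA or PROC step */
OPTIONS LINESIZE=80 NODATE;
TITLE 'CD-ROM drive performance';   /* hypothetical title text */

DATA CDROM;
  INPUT MFG $ TYPE $ SEEK TRANSFER;
  DATALINES;
NEC 12X 7.3 105
SONY 6X 23.1 830
;
RUN;

/* The title set above appears on this output and on any later output */
PROC PRINT DATA=CDROM;
RUN;
```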
SAS carries out all statements in the DATA step, in order, once for each input observation.
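One way to see this observation-by-observation loop is to write a line to the log on every pass. The sketch below assumes the automatic variable _N_ (the DATA step iteration counter) and the PUT statement, neither of which is covered on this page.

```sas
DATA CDROM;
  INPUT MFG $ TYPE $ SEEK TRANSFER;
  /* Runs once per observation: _N_ counts the DATA step iterations */
  PUT 'Iteration ' _N_ MFG= SEEK=;
  DATALINES;
NEC 12X 7.3 105
SONY 6X 23.1 830
SONY 4X 40.1 330
;
RUN;
```

The log should show one PUT line for each of the three data lines, in the order they were read.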
DATA <dataset_name>;
INFILE <file_name>;
DATA <dataset_name>; SET <dataset_name>;
DATALINES;
INPUT <variable_name [type] [position]>;
Example:
i) INPUT MFG @@;
ii) INPUT MFG $ TYPE $ SEEK TRANSFER;
iii) INPUT MFG $ 1-8 TYPE $ 11-12 SEEK 13-16 TRANSFER 17-19;
LABEL <variable_name='label'>...;
Example:
i) LABEL MFG='Manufacturer';
ii) LABEL MFG='Manufacturer' SEEK='Seek Time';
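The statements above can be combined as in the following sketch. The second dataset name (CDROM2) and the derived variable RATIO are hypothetical, chosen only to show SET reading from an existing dataset. The LABEL option on PROC PRINT is assumed here so that the labels appear in the listing.

```sas
/* Read raw data with list input and attach labels */
DATA CDROM;
  INPUT MFG $ TYPE $ SEEK TRANSFER;
  LABEL MFG='Manufacturer' SEEK='Seek Time';
  DATALINES;
NEC 12X 7.3 105
SONY 6X 23.1 830
;
RUN;

/* SET reads observations from an existing dataset instead of raw data */
DATA CDROM2;
  SET CDROM;
  RATIO = TRANSFER / SEEK;   /* hypothetical derived variable */
RUN;

PROC PRINT DATA=CDROM2 LABEL;
RUN;
```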
Symbol  Operation       Example
**      Exponentiation  Z=X**2;
*       Multiplication  Z=X*Y;
/       Division        Z=X/Y;
+       Addition        Z=X+Y;
-       Subtraction     Z=X-Y;

Also, the following math functions are available:
EXP(<variable_name>)
LOG(<variable_name>)
SQRT(<variable_name>)
ABS(<variable_name>)
DATA CDROM;
  INPUT MFG $ TYPE $ SEEK TRANSFER;
  IF SEEK < 15 THEN CLASS='FAST';
  ELSE CLASS='SLOW';
  * CREATE SQRT & LOG TRANSFORMS;
  LOGTRAN = LOG10(TRANSFER);
  ROOTTRAN = SQRT(TRANSFER);
  DATALINES;
NEC 12X 7.3 105
SONY 6X 23.1 830
... (more data lines)
DROP <variable_name>...; removes named variables from the dataset and keeps unnamed variables.
KEEP <variable_name>...; keeps named variables and drops unnamed variables from the dataset.
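A short sketch of the two statements follows. The output dataset names SPEED and SPEED2 are hypothetical. Both DATA steps should end up with the same two variables, MFG and SEEK, reached from opposite directions.

```sas
DATA CDROM;
  INPUT MFG $ TYPE $ SEEK TRANSFER;
  DATALINES;
NEC 12X 7.3 105
SONY 6X 23.1 830
;
RUN;

/* KEEP names the variables to retain */
DATA SPEED;
  SET CDROM;
  KEEP MFG SEEK;
RUN;

/* DROP names the variables to remove */
DATA SPEED2;
  SET CDROM;
  DROP TYPE TRANSFER;
RUN;
```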
IF <expression> THEN <statement T>; ELSE <statement F>;
Note: The ELSE statement is optional but, if present, must immediately follow the associated IF statement. The IF ... THEN part comprises a single statement:
i) IF SEEK < 15 THEN CLASS = 'FAST'; ELSE CLASS = 'SLOW';
ii) CLASS='SLOW'; IF SEEK < 15 THEN CLASS = 'FAST';
SAS comparison operators are shown below. You can use either the symbol or the two-letter abbreviation.
Symbol   Abbrev
<, <=    LT, LE
>, >=    GT, GE
=, ^=    EQ, NE
SAS also provides logical operators for more complex boolean expressions. Either the symbol or word may be used.
Symbol  Connective
&       AND
|       OR
^       NOT
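The comparison and logical operators can be mixed freely, as in the sketch below. The cutoff of 500 for TRANSFER and the class names are hypothetical. The LENGTH statement is an addition not covered on this page: without it, CLASS would take its length from the first assignment ('BEST', four characters) and longer values would be truncated.

```sas
DATA CDROM;
  INPUT MFG $ TYPE $ SEEK TRANSFER;
  LENGTH CLASS $ 6;   /* assumption: set the length before the first assignment */
  /* Word forms: LT, GE, AND */
  IF SEEK LT 15 AND TRANSFER GE 500 THEN CLASS='BEST';
  /* Symbol forms: <, >=, | */
  ELSE IF SEEK < 15 | TRANSFER >= 500 THEN CLASS='MIXED';
  ELSE CLASS='WORST';
  DATALINES;
NEC 12X 7.3 105
SONY 6X 23.1 830
SONY 4X 40.1 330
CANON 6X 13.5 530
SONY 12X 5.5 1000
;
RUN;

PROC PRINT DATA=CDROM;
RUN;
```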
A special form of the "IF" statement is used for subsetting a dataset, that is, selecting or excluding particular observations.
DATA CDROM;
  INPUT MFG $ TYPE $ SEEK TRANSFER;
  IF SEEK < 15;
The statement IF SEEK < 15; is equivalent to:
i) IF SEEK < 15 THEN OUTPUT;
ii) IF SEEK >= 15 THEN DELETE;

The example below excludes those observations where SEEK is 15 or more:
DATA CDROM;
  INPUT MFG $ TYPE $ SEEK TRANSFER;
  IF SEEK < 15;
  DATALINES;
NEC 12X 7.3 105
SONY 6X 23.1 830
SONY 4X 40.1 330
CANON 6X 13.5 530
SONY 12X 5.5 1000

The dataset CDROM will therefore contain observations 1, 4 and 5 only:

NEC 12X 7.3 105
CANON 6X 13.5 530
SONY 12X 5.5 1000
i) * ... ;
ii) /* ... */

DATA CDROM;
  * Read in variables;
  INPUT MFG $ TYPE $ TRANSFER SEEK;
  /* ignore next statement
  SEEKMIN = SEEK/60000;
  */
A PROC step runs a predefined SAS procedure, which may be either a statistical or a utility procedure. The procedure processes the most recently created dataset unless another dataset is named in a "DATA=" option.
PROC <procedure_name> [option]; [procedure_statement];
DATA=<dataset_name>
VAR <variable_name>;
BY <variable_name>;
ID <variable_name>;
LABEL <variable_name='label'>;
WHERE <expression>;
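The common procedure statements above might be used together as sketched below. PROC SORT, PROC MEANS and PROC PRINT are used only as convenient carriers. The LABEL option on PROC PRINT is an assumption added so that the label is displayed. A BY statement expects the data to be sorted by the BY variable already, which is why the sort comes first.

```sas
DATA CDROM;
  INPUT MFG $ TYPE $ SEEK TRANSFER;
  LABEL SEEK='Seek Time';
  DATALINES;
NEC 12X 7.3 105
SONY 6X 23.1 830
SONY 4X 40.1 330
CANON 6X 13.5 530
SONY 12X 5.5 1000
;
RUN;

/* BY-group processing needs the data sorted by the BY variable */
PROC SORT DATA=CDROM;
  BY MFG;
RUN;

/* VAR picks the analysis variables and BY repeats the analysis per manufacturer */
PROC MEANS DATA=CDROM;
  BY MFG;
  VAR SEEK TRANSFER;
RUN;

/* WHERE filters observations and ID replaces the observation number column */
PROC PRINT DATA=CDROM LABEL;
  WHERE SEEK < 15;
  ID MFG;
  VAR TYPE SEEK TRANSFER;
RUN;
```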
You should be familiar with the following procedures:
PROC CORR [options]; [VAR <variable_name>; WITH <variable_name>;]
PROC MEANS [options]; [VAR <variable_name>...;]
PROC UNIVARIATE [options]; [VAR <variable_name>...;]
PROC PRINT [options]; [VAR <variable_name>...;]
PROC SORT [options]; BY <variable_name>...;
PROC PLOT [options]; PLOT <dep_var_name>*<indep_var_name>='*' [options];
PROC REG [options]; MODEL <dep_var_name>=<indep_var_name>;
If residuals are needed, the optional OUTPUT statement must be included. The dataset named in the out= option is created; it contains the contents of the original dataset plus the variable named in the r= option, which holds the residuals.
PROC REG [options]; MODEL <dep_var_name>=<indep_var_name> [/ option]; OUTPUT OUT=<dataset> r=<var_name>;
See guide 5 (output) for an example. Also, see this income/gpa sample program.
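As a rough sketch of the residual output described above, the program below regresses TRANSFER on SEEK for the CD-ROM data used earlier on this page. The choice of dependent variable and the names RESULTS and RESID are hypothetical. QUIT is added on the assumption that PROC REG should be ended explicitly before the next step.

```sas
DATA CDROM;
  INPUT MFG $ TYPE $ SEEK TRANSFER;
  DATALINES;
NEC 12X 7.3 105
SONY 6X 23.1 830
SONY 4X 40.1 330
CANON 6X 13.5 530
SONY 12X 5.5 1000
;
RUN;

PROC REG DATA=CDROM;
  MODEL TRANSFER = SEEK;
  /* RESULTS holds the original variables plus the residuals in RESID */
  OUTPUT OUT=RESULTS R=RESID;
RUN;
QUIT;

PROC PRINT DATA=RESULTS;
RUN;
```

Printing RESULTS is a quick check that the residual variable sits alongside the original variables.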