This page summarizes the SAS material required for this class. It also indicates the material required for the SAS Quiz (i.e. Quiz #3 - except for Summer Sessions).
Readings:
The following chapters refer to the text "SAS Manual for Moore and McCabe's Introduction to the Practice of Statistics, Evans." You may want to review these chapters if you have the text.
The following chapters refer to the text "SAS System for Elementary
Statistical Analysis, Schlotzhauer & Littell." You may want to review these
chapters if you have the text.
Note: SAS Handouts #2 and #3 are from these chapters.
The textbook readings above may not cover all the material required. The material below is a summary of the textbook readings plus additional required topics. Remember to pay particular attention to the rules summarized in the Basic Rules section below.
Note: For the SAS Quiz, you should expect three questions. The topics for each question are:
Review:
These are statements that do not belong to a particular DATA or PROC step. They have a global effect.
SAS carries out all statements in the DATA step, in order, for each input observation.
DATA <dataset_name>;
INFILE <file_name>;
DATALINES;
INPUT <variable_name [type] [position]>...;
Example:
i) INPUT MFG @@;
ii) INPUT MFG $ TYPE $ SEEK TRANSFER;
iii) INPUT MFG $ 1-8 TYPE $ 11-12 SEEK 13-16 TRANSFER 17-19;
LABEL <variable_name='label'>...;
Example:
i) LABEL MFG='Manufacturer';
ii) LABEL MFG='Manufacturer' SEEK='Seek Time';
Symbol   Operation        Example
**       Exponentiation   Z=X**2;
*        Multiplication   Z=X*Y;
/        Division         Z=X/Y;
+        Addition         Z=X+Y;
-        Subtraction      Z=X-Y;
Note: Parentheses may be used to override the operator precedence hierarchy: any part of an expression within parentheses is evaluated first. Remember that the exponentiation operator is at the highest level of the hierarchy, followed by multiplication and division, and then addition and subtraction.
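The precedence rules above can be checked with a short DATA step. This is only a sketch: the dataset name PRECEDENCE and the variables X, Y, A, B, C and D are made up for illustration.

```sas
/* Hypothetical dataset and variable names, for illustration only */
DATA PRECEDENCE;
  X = 3;
  Y = 4;
  A = X + Y * 2;      /* multiplication first: 3 + 8 = 11    */
  B = (X + Y) * 2;    /* parentheses first:    7 * 2 = 14    */
  C = X * Y ** 2;     /* exponentiation first: 3 * 16 = 48   */
  D = (X * Y) ** 2;   /* parentheses first:    12 ** 2 = 144 */
RUN;

PROC PRINT DATA=PRECEDENCE;
RUN;
```

Comparing A with B, and C with D, shows how parentheses change the result.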
DROP <variable_name>...; removes named variables from the dataset and keeps unnamed variables.
KEEP <variable_name>...; keeps named variables and drops unnamed variables from the dataset.
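As a sketch of how DROP and KEEP mirror each other, the step below computes a derived variable and then discards the inputs it came from. RATE is a hypothetical variable introduced only for this example; the data lines are the CDROM observations used elsewhere on this page.

```sas
/* Sketch only: RATE is a hypothetical derived variable */
DATA CDROM;
  INPUT MFG $ TYPE $ SEEK TRANSFER;
  RATE = TRANSFER / SEEK;
  DROP SEEK TRANSFER;        /* dataset keeps MFG, TYPE and RATE */
  /* Equivalent alternative: KEEP MFG TYPE RATE; */
  DATALINES;
NEC 12X 7.3 105
SONY 6X 23.1 830
SONY 4X 40.1 330
CANON 6X 13.5 530
SONY 12X 5.5 1000
;
RUN;
```

Note that SEEK and TRANSFER can still be used in calculations inside the step; DROP only affects which variables are written to the dataset.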
IF <expression> THEN <statement T>; ELSE <statement F>;
Note: The ELSE statement is optional but, if present, must immediately follow the associated IF statement. The IF ... THEN part comprises a single statement:
i) IF SEEK < 15 THEN CLASS = 'FAST'; ELSE CLASS = 'SLOW';
ii) CLASS='SLOW'; IF SEEK < 15 THEN CLASS = 'FAST';
SAS comparison operators are shown below. You can use either the symbol or the two-letter abbreviation.
Symbol    Abbrev
<, <=     LT, LE
>, >=     GT, GE
=, ^=     EQ, NE
SAS also provides logical operators for more complex boolean expressions. Either the symbol or word may be used.
Symbol   Connective
&        AND
|        OR
^        NOT
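A minimal sketch of a compound condition, written once with words and once with symbols. CLASS and CLASS2 are hypothetical variable names, and the cutoffs 15 and 500 are arbitrary values chosen for illustration. Both values assigned ('GOOD' and 'POOR') have the same length, so neither is truncated.

```sas
/* Sketch only: CLASS, CLASS2 and the cutoffs are illustrative assumptions */
DATA CDROM;
  INPUT MFG $ TYPE $ SEEK TRANSFER;
  /* Word form of the comparison and logical operators */
  IF SEEK LT 15 AND TRANSFER GE 500 THEN CLASS = 'GOOD';
  ELSE CLASS = 'POOR';
  /* The same test written with symbols */
  IF SEEK < 15 & TRANSFER >= 500 THEN CLASS2 = 'GOOD';
  ELSE CLASS2 = 'POOR';
  DATALINES;
NEC 12X 7.3 105
SONY 6X 23.1 830
SONY 4X 40.1 330
CANON 6X 13.5 530
SONY 12X 5.5 1000
;
RUN;
```

With these data lines, only the CANON 6X and SONY 12X observations satisfy both conditions, so CLASS and CLASS2 agree on every observation.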
A special form of the "IF" statement is used for subsetting a dataset, that is selecting/excluding particular observations.
DATA CDROM;
INPUT MFG $ TYPE $ SEEK TRANSFER;
IF SEEK < 15;
The statement IF SEEK < 15; is equivalent to:
i) IF SEEK < 15 THEN OUTPUT;
ii) IF SEEK >= 15 THEN DELETE;
The example below excludes those observations where SEEK is 15 or more:
DATA CDROM;
INPUT MFG $ TYPE $ SEEK TRANSFER;
IF SEEK < 15;
DATALINES;
NEC 12X 7.3 105
SONY 6X 23.1 830
SONY 4X 40.1 330
CANON 6X 13.5 530
SONY 12X 5.5 1000
;
The dataset CDROM will therefore contain observations 1, 4 and 5 only:
NEC 12X 7.3 105
CANON 6X 13.5 530
SONY 12X 5.5 1000
i) * ... ;
ii) /* ... */
DATA CDROM;
* Read in variables;
INPUT MFG $ TYPE $ TRANSFER SEEK;
/* ignore next statement
SEEKMIN = SEEK/60000;
*/
A PROC step runs a predefined SAS procedure, which may be either a statistical procedure or a utility procedure. The procedure processes the most recently created dataset unless another dataset is specified in a "DATA=" option.
PROC <procedure_name>; [procedure_statement];
VAR <variable_name>;
BY <variable_name>;
You should be familiar with the following procedures:
PROC CORR [options]; [VAR <variable_name>...;]
PROC MEANS [options]; [VAR <variable_name>...;]
PROC UNIVARIATE [options]; [VAR <variable_name>...;]
PROC PRINT [options]; [VAR <variable_name>...;]
PROC SORT [options]; BY <variable_name>...;
PROC PLOT [options]; PLOT <dep_var_name>*<indep_var_name>='*' [options];
PROC REG [options]; MODEL <dep_var_name>=<indep_var_name>;
If residuals are needed, then the optional OUTPUT statement must be included. The dataset specified in the OUT= option will be created. It will contain the contents of the original dataset plus the variable named in the R= option, which holds the residuals.
PROC REG [options]; MODEL <dep_var_name>=<indep_var_name>; OUTPUT OUT=<dataset> R=<var_name>;
See guide 5 (output) for an example. Also, see the income/gpa sample program.
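The procedures listed above can be combined as in the sketch below, which reuses the CDROM observations from this page. The output dataset name RESID and the residual variable RES are hypothetical, and regressing TRANSFER on SEEK is only an illustration of the syntax, not a recommended model. BY-group processing expects the dataset to be sorted by the BY variable, so PROC SORT runs before PROC MEANS.

```sas
/* Sketch only: RESID and RES are hypothetical names */
DATA CDROM;
  INPUT MFG $ TYPE $ SEEK TRANSFER;
  DATALINES;
NEC 12X 7.3 105
SONY 6X 23.1 830
SONY 4X 40.1 330
CANON 6X 13.5 530
SONY 12X 5.5 1000
;
RUN;

/* Sort first so that BY MFG can be used in later steps */
PROC SORT DATA=CDROM;
  BY MFG;
RUN;

/* Summary statistics for each manufacturer */
PROC MEANS DATA=CDROM;
  VAR SEEK TRANSFER;
  BY MFG;
RUN;

/* Fit the regression and save the residuals in RESID as RES */
PROC REG DATA=CDROM;
  MODEL TRANSFER = SEEK;
  OUTPUT OUT=RESID R=RES;
RUN;
QUIT;

/* RESID holds the original variables plus RES */
PROC PRINT DATA=RESID;
  VAR MFG SEEK TRANSFER RES;
RUN;

/* Residuals plotted against the independent variable */
PROC PLOT DATA=RESID;
  PLOT RES*SEEK='*';
RUN;
QUIT;
```

Naming the dataset explicitly with DATA= in each step avoids relying on the "most recently created dataset" default once RESID has been created.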