Syllabus
CSC 423/324 - Data Analysis and Regression
Joseph Morgan
Office: CS&T 210
Phone: (312)-362-6321
Fax: (312)-362-6116
Email: jmorgan@cs.depaul.edu
Texts:
- An Introduction to Statistical Methods and Data Analysis. Ott. Duxbury Press, 4th edition,
1993.
- SAS System for Elementary Statistical Analysis. Schlotzhauer &
Littell. SAS Institute, 2nd edition, 1991.
- Computer Science Data Analysis Using SAS. Knafl (unpublished
notes). DePaul bookstore, June, 1997.
Topics:
- Inferences about a population mean, about a difference in population means using paired
or independent samples, and about normality.
- Introduction to analysis of variance. Multiple comparisons.
- Multiple regression. Residual plots and regression diagnostics. Model selection using the PRESS
statistic.
- Matrix notation and the general linear model.
The primary emphasis of the course will be on how to perform selected data analyses using the
Statistical Analysis System (SAS). SAS PROCs to be covered include ANOVA,
CORR, FREQ, GLM, IML, MEANS, NPAR1WAY, PLOT, PRINT, RANK, REG,
SORT, STEPWISE, TTEST, and UNIVARIATE. Statistical and mathematical
issues will be covered to help students understand the models underlying the
statistical PROCs and interpret the output of those PROCs.
Grading:
Students' grades will be based on the total points earned. Points will be assigned to homework
problems, programming assignments, and examinations.
The grade breakdown will be as follows:
- Homework Problems: 15%
- Programming Assignments: 40%
- Final: 45%
Cutoffs for grades will be no higher than the following: 90-100, A; 80-89.99,
B; 70-79.99, C; 60-69.99, D; 0-59.99, F. Pluses and minuses will be given at
the high/low ends of each grade range (no A+'s or D-'s).
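As an illustration only, the arithmetic behind the breakdown above can be sketched in Python. The function names and sample scores are hypothetical, each component is assumed to be scored on a 0-100 scale, and the letter grade uses the maximum cutoffs listed above without pluses or minuses; actual cutoffs may be lower.

```python
# Weights from the grade breakdown above (percent of the course total).
WEIGHTS = {"homework": 15, "programming": 40, "final": 45}

# Maximum cutoffs from the syllabus; actual cutoffs may be lower.
CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]


def weighted_total(scores):
    """Combine component scores (assumed 0-100 each) into a course total."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def base_letter(total):
    """Return the letter grade, ignoring plus/minus, at the maximum cutoffs."""
    for cutoff, letter in CUTOFFS:
        if total >= cutoff:
            return letter
    return "F"


# Hypothetical scores, for illustration only.
example = {"homework": 80, "programming": 90, "final": 70}
total = weighted_total(example)
print(total, base_letter(total))
```

Because the final and the programming assignments together carry 85% of the total, a weak score on either one moves the course grade far more than the homework does.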
Assignments:
Homework problems will either be assigned from the text or will be
made-up problems.
Programming assignments will involve analysis of a supplied dataset and will
require a written analysis of the results using a word processing system. Programs should use
meaningful names, titles, comments, and indentation. Analyses
must be written in precisely stated statistical terms and in correct English. You
should also use a word processor that supports mathematical notation and Greek symbols and
you should integrate appropriate output from your SAS programs into your documents in a
presentable format.
Solutions to homework problems may be submitted by email, turned in to me in class or in my
office, put into my mailbox, or faxed to me.
All assignments must be submitted by the due date to avoid late penalties.
Plagiarism/Cheating:
Incompletes:
Readings:
- The first set of topics to be covered in CSC 423/324 is related to the following sections of the
texts.
- Ott:
- Chapters 1-7. Skip Sections 3.5, 4.4-4.8, 4.13, 5.3-5.4, 5.6, 6.4, 6.7 and 7.3. You should have covered much
of this material, but perhaps it was presented differently. Read this material to gain a clearer
understanding of the topics we are covering. The primary issues relevant to the class are those
related to inference or hypothesis testing
and to sampling distributions.
- Relevant topics include descriptive statistics, one sample and two sample inference problems
(both parametric and nonparametric approaches).
- Schlotzhauer & Littell:
- Chapters 1-7, Appendices 1-2, 4.
- The second (analysis of variance) and third (regression and correlation) topics to be covered are
related to the following sections of the texts.
- Ott:
- Chapters 9-15. Skip Sections 10.7, 12.6, 15.4, and 15.6. Multiple regression and analysis of
variance topics will be covered. As much of the material in these chapters will be covered as
time allows.
- Schlotzhauer & Littell: