Assignment #1
CSC424/334 - Advanced Data Analysis
Due: 4/26/2004
Consider the matrix algebra representation of the general linear regression model discussed in class (see Lecture #2 and Lecture #3).
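For reference, here is the standard textbook form of that representation. The notation is assumed and the lecture notes may use different symbols. X is the n × p design matrix whose first column is all 1s (the intercept), so with p = 10 it holds the intercept plus the nine independent variables. J is the n × n matrix of 1s.

```latex
\begin{aligned}
\mathbf{Y}_{n\times 1} &= \mathbf{X}_{n\times p}\,\boldsymbol{\beta}_{p\times 1} + \boldsymbol{\varepsilon}_{n\times 1},
  \qquad \mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} \\
\hat{\mathbf{Y}} &= \mathbf{X}\mathbf{b}, \qquad \mathbf{e} = \mathbf{Y} - \hat{\mathbf{Y}} \\
SSE &= \mathbf{e}'\mathbf{e}, \qquad
SSTO = \mathbf{Y}'\mathbf{Y} - \tfrac{1}{n}\,\mathbf{Y}'\mathbf{J}\mathbf{Y}, \qquad
SSR = SSTO - SSE \\
MSR &= \frac{SSR}{p-1}, \qquad MSE = \frac{SSE}{n-p}, \qquad F^{*} = \frac{MSR}{MSE} \\
\mathbf{s}^{2}\{\mathbf{b}\} &= MSE\,(\mathbf{X}'\mathbf{X})^{-1}
\end{aligned}
```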
Using this representation, develop a module that is able to access a dataset of n p-tuples and accomplish the following:
Note: You may assume that p = 10 (i.e., one dependent variable and nine independent variables).
Your module should essentially be able to generate the extended version of the report produced by the SAS REG procedure (with the exception of p-values) when a full regression model is specified.
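Below is a minimal sketch of the core computations. It is written in Python purely for illustration, and Python is an assumption, not one of the options named in this handout. The sketch computes b = (X'X)^{-1}X'Y with a hand-written Gauss-Jordan inverse. From that it derives the ANOVA sums of squares, F value, standard errors, t values and fit statistics that PROC REG prints. p-values are omitted, as the assignment allows. The function names and dictionary keys are hypothetical. Any additional items in the "extended" report from class are not covered.

```python
import math

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    # Row-by-column product of two list-of-lists matrices.
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def invert(m):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(m)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[pivot][c]) < 1e-12:
            raise ValueError("X'X is singular")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        pv = aug[c][c]
        aug[c] = [v / pv for v in aug[c]]          # scale pivot row to 1
        for r in range(n):
            if r != c:                             # eliminate column c elsewhere
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def regression_report(X, y):
    """X: n x p design matrix (first column all 1s); y: list of n responses."""
    n, p = len(X), len(X[0])
    Y = [[v] for v in y]
    Xt = transpose(X)
    XtX_inv = invert(matmul(Xt, X))
    b = [row[0] for row in matmul(XtX_inv, matmul(Xt, Y))]   # (X'X)^-1 X'Y
    fitted = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]
    ybar = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ssto = sum((yi - ybar) ** 2 for yi in y)
    ssr = ssto - sse                                         # valid with an intercept
    msr, mse = ssr / (p - 1), sse / (n - p)
    se = [math.sqrt(mse * XtX_inv[j][j]) for j in range(p)]  # sqrt of diag of MSE (X'X)^-1
    return {
        "b": b, "se": se, "t": [bj / s for bj, s in zip(b, se)],
        "SSR": ssr, "SSE": sse, "SSTO": ssto,
        "MSR": msr, "MSE": mse, "F": msr / mse,
        "RootMSE": math.sqrt(mse), "DepMean": ybar,
        "CoeffVar": 100 * math.sqrt(mse) / ybar,
        "R2": ssr / ssto, "AdjR2": 1 - (n - 1) / (n - p) * sse / ssto,
    }
```

For p = 10, each row of X would hold a leading 1 followed by the nine predictor values. Explicitly inverting X'X mirrors the matrix formulas from lecture. For ill-conditioned data, a QR or Cholesky solve is numerically safer.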
Note that you are not restricted to using SAS to implement this module.
You have two broad options: either use the IML procedure available in SAS, or use a third-generation programming language of your choice (e.g., Java, C++, or any other third-generation language).
Note: If you choose not to use IML, then you must confirm with me, before beginning your implementation, that your choice of third-generation language is appropriate.
Refer to the example discussed in class when you are ready to test your module. The code discussed in class is available. Note that the code has embedded data.
Submission:
Submit a hard copy of your code, the test data, and the output produced. Do not submit your assignment by email.
Extra Credit (+10%):
Develop your module so that it can access an external dataset containing n p-tuples, where p is an arbitrary value. You may assume that, prior to execution, your program may be edited in one (but only one) place. Also assume that the dataset is an ASCII file and that each observation is on a separate physical line (i.e., terminated by CR/LF). To simplify further, assume that the values for each variable are delimited by a space and that the single response variable occupies the first position on each physical line.
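One way to meet the extra-credit constraints is sketched below in Python, under the file-format assumptions stated above. p is inferred from the first non-blank line, so the only value to edit before running is the file path. The constant `DATA_FILE` and the function names are hypothetical.

```python
DATA_FILE = "regression_data.txt"  # hypothetical path: the single place to edit

def parse_lines(lines):
    """Parse space-delimited observations whose first value is the response.

    Returns (X, y), where X has a leading column of 1s for the intercept.
    p is taken from the first non-blank line and checked on every other line.
    """
    X, y = [], []
    p = None
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()  # whitespace split also drops any trailing CR/LF
        if not fields:
            continue  # tolerate blank lines
        if p is None:
            p = len(fields)
        elif len(fields) != p:
            raise ValueError(f"line {lineno}: expected {p} values, got {len(fields)}")
        values = [float(v) for v in fields]
        y.append(values[0])            # response occupies the first position
        X.append([1.0] + values[1:])   # intercept column plus p - 1 predictors
    return X, y

def read_dataset(path=DATA_FILE):
    # Text mode translates CR/LF line endings to "\n".
    with open(path, "r", encoding="ascii") as f:
        return parse_lines(f)
```

Opening the file in text mode lets Python translate CR/LF line endings. `str.split()` with no argument also discards any stray `\r`, so either line-ending convention is accepted, as are repeated spaces between values.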