Assignment #2
CSC424/334 - Advanced Data Analysis
Due: 5/17/2004
Review the Principal Components Analysis method discussed in class (see Lecture #4 and Lecture #6). Based on that discussion, develop an IML module that reads a dataset of your choice (e.g. the "European Jobs" dataset below) from an external file and accomplishes the following:
European Jobs Dataset:
This dataset (courtesy of the DASL archive) contains employment details for different industries in several European countries. Principal Components Analysis can be applied to this data, for example to identify countries with similar employment patterns.
Submission:
Submit a hardcopy of your code, the test data, and the output produced. Do not submit your assignment by email.
Extra Credit (+10%):
Discuss the principal components obtained from your analysis. Identify
the top two principal components (i.e. the two that explain the
most variability). What interpretation, if any, can you give to these principal components?
Note: Before attempting an interpretation, decide whether the
covariance or the correlation matrix should be used. If you decide
that the correlation matrix is more appropriate, redo the analysis
using the correlation matrix. You may also want to review
the NFL-2000 PCA paper mentioned on the Lecture Notes page.
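
To illustrate why the choice between the covariance and correlation matrix matters, here is a minimal sketch in Python (not IML, and not a solution to the assignment). It uses two made-up columns on very different scales; the values and the helper names (covariance_matrix, correlation_matrix, eigenvalues_2x2, explained_share) are hypothetical and are not taken from the European Jobs data or the lectures. For two variables the eigenvalues have a closed form, so no linear algebra library is needed. The share of variability explained by a component is its eigenvalue divided by the sum of all eigenvalues.

```python
import math


def covariance_matrix(x, y):
    """Sample covariance matrix (divisor n - 1) of two equal-length columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / (n - 1)
    syy = sum((b - my) ** 2 for b in y) / (n - 1)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    return [[sxx, sxy], [sxy, syy]]


def correlation_matrix(cov):
    """Rescale a 2x2 covariance matrix so both variables have unit variance."""
    r = cov[0][1] / math.sqrt(cov[0][0] * cov[1][1])
    return [[1.0, r], [r, 1.0]]


def eigenvalues_2x2(m):
    """Eigenvalues of a symmetric 2x2 matrix, largest first (closed form)."""
    a, b, c = m[0][0], m[0][1], m[1][1]
    mid = (a + c) / 2
    half_gap = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    return [mid + half_gap, mid - half_gap]


def explained_share(eigs):
    """Fraction of total variability attributed to each component."""
    total = sum(eigs)
    return [e / total for e in eigs]


# Made-up toy columns (hypothetical values, NOT the European Jobs data).
small_scale = [1.0, 2.0, 3.0, 4.0, 5.0]
large_scale = [120.0, 80.0, 150.0, 90.0, 160.0]

cov = covariance_matrix(small_scale, large_scale)
cor = correlation_matrix(cov)

cov_share = explained_share(eigenvalues_2x2(cov))
cor_share = explained_share(eigenvalues_2x2(cor))

print("Share explained (covariance): ", [round(s, 4) for s in cov_share])
print("Share explained (correlation):", [round(s, 4) for s in cor_share])
```

With the covariance matrix, the first component is dominated by whichever variable has the larger variance, so the result largely reflects measurement scale. With the correlation matrix, each variable contributes unit variance and the split depends only on how strongly the variables are correlated.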