1654 | Blaise Pascal, in correspondence with Pierre de Fermat, lays the foundations of the theory of probability. He is known for Pascal's Triangle and is among the first known to apply the laws of probability to gambling. |
---|---|
1662 | John Graunt writes Natural and Political Observations upon the Bills of Mortality. He was the first demographer to use statistics to estimate the population of London. |
1718 | Abraham de Moivre writes The Doctrine of Chances. He is the first to write down the formula for the normal histogram and the first to state a version of the Central Limit Theorem: if a random variable is the sum of many independent influences, then it is approximately normally distributed. |
1722 | Roger Cotes's Opera Miscellanea is published posthumously. It includes a study of the theory of errors, particularly in astronomy. |
1809 | Carl Friedrich Gauss publishes the Method of Least Squares, the standard method for fitting a regression line to data. |
1835 | Adolphe Quetelet publishes a study of human measurements that describes "the average man" that nature is trying to produce. He shows that human measurements tend to follow normal histograms. He is the first to introduce the body mass index that is still used today. |
1888 | Sir Francis Galton introduces the concept of correlation, which describes the degree of linear association between two measurements, such as height and weight. |
1890 | R. A. Fisher is born. Many call him the father of modern statistics. He popularized the use of the standard deviation to estimate the spread of a dataset. |
1977 | John Tukey publishes the book Exploratory Data Analysis. It popularizes many of the techniques we will study in this class. He also coined the terms "bit" and "software." |
Forecasting the 1936 Presidential Election
Statistical Experiments vs. Observational Studies
Gender Bias in Berkeley Grad School Admissions
Cynthia Perez, Tracking Disease in Puerto Rico
Tommy Wright, Sampling the Nation
Jaime San Martín, Research in Probability Theory
Jing Shyr, Statistical Software for Solving Business Problems
Marcey Abate, Testing Nuclear Weapons without Explosions
Iognáid Ó Muircheartaigh, From p-values to President
Aarti Ssiram, Management Consulting
Rebecca Doerge, Finding Genes in Plants and People
Gregory Campbell, Making Sure Medical Devices are Safe and Effective
Dexter Wilson, Improving the Quality of a Bank's Services
Christopher Presley, Statistics for Science, and More
Steve Sun, Developing New Drugs
A multivariate dataset is a dataset containing two or more variables.
A nominal variable (also called a categorical variable) is a variable that contains non-numeric values. Examples are: gender (female, male), is-smoker (yes, no), position (offense, defense), coin-condition (Poor, Fair/Good, F, VF, EF, UNC, BU).
A scale variable (also called a quantitative variable or continuous variable) is a variable that can contain any value in a continuous range. Examples are: age, height, weight, gpa, price.
An ordinal variable is a variable that contains discrete values with a natural order, usually coded as numbers. Examples are: college-year (1, 2, 3, 4), survey-answer (1, 2, 3, 4, 5).
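
The definitions above can be illustrated with a short Python sketch. The records, the variable names (`gender`, `college_year`, `height`), and the `summarize` helper are all made up for illustration; they are not part of the course materials. The sketch assumes one reasonable summary per variable type: counts for nominal, a median for ordinal, and a mean for scale.

```python
from collections import Counter
from statistics import mean, median_low

# Hypothetical multivariate dataset: each record holds three variables.
students = [
    {"gender": "female", "college_year": 1, "height": 64.5},
    {"gender": "male", "college_year": 3, "height": 70.0},
    {"gender": "female", "college_year": 2, "height": 66.0},
    {"gender": "male", "college_year": 3, "height": 71.5},
]

# Assumed type of each variable, following the definitions above.
levels = {"gender": "nominal", "college_year": "ordinal", "height": "scale"}


def summarize(records, levels):
    """Return one summary per variable, chosen according to its type."""
    summary = {}
    for name, level in levels.items():
        values = [record[name] for record in records]
        if level == "nominal":
            # Categories have no order, so only counts are meaningful.
            summary[name] = dict(Counter(values))
        elif level == "ordinal":
            # median_low always returns a value that occurs in the data.
            summary[name] = median_low(values)
        elif level == "scale":
            # Continuous values support arithmetic such as the mean.
            summary[name] = mean(values)
        else:
            raise ValueError(f"unknown variable type: {level}")
    return summary


summary = summarize(students, levels)
is_multivariate = len(levels) >= 2  # two or more variables
print(summary)
print("multivariate:", is_multivariate)
```

Here `median_low` is used for the ordinal variable so that the summary is always one of the original codes rather than an interpolated value such as 2.5.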