Examples of Reaction Time Measures


Methodological Issues


Problems for Analyzing Reaction Times


Identifying Outliers: The Problem

The problem is that it is hard to know which data points are outliers, as demonstrated in the following simulation.

Simulation of Reaction Times with Outliers

Setting things up in R

# load required libraries
library(ggplot2)
library(retimes)
## Reaction Time Analysis (version 0.1-2)
library(tidyverse)
## Loading tidyverse: tibble
## Loading tidyverse: tidyr
## Loading tidyverse: readr
## Loading tidyverse: purrr
## Loading tidyverse: dplyr
## Conflicts with tidy packages ----------------------------------------------
## filter(): dplyr, stats
## lag():    dplyr, stats
# Simulate a population of "good" reaction times:
  # generate an ex-Gaussian population:
  rt_dist1 <- rexgauss(100000, mu = 300, sigma = 100, tau = 200, positive = FALSE)
  # keep positive values only:
  rt_dist1 <- rt_dist1[rt_dist1 > 0]
  # give it a nicer name:
  Population_of_Good_Reaction_Times <- rt_dist1

What the distribution of RTs looks like without outliers:

# Mean and Histogram of "good" RT distribution  
mean(Population_of_Good_Reaction_Times)
## [1] 500.8441
hist(Population_of_Good_Reaction_Times, xlab = "RT in milliseconds")
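
An ex-Gaussian variable is the sum of a normal and an exponential variable, so its mean is mu + tau (here 300 + 200 = 500), which is why the mean reported above lands near 500. The following is a minimal base-R sketch of that definition. The helper name `rexgauss_base` and the seed are assumptions for illustration and are not part of `retimes`:

```r
# Hypothetical helper (not from retimes): ex-Gaussian draws built as
# Normal(mu, sigma) + Exponential with mean tau (rate = 1 / tau).
rexgauss_base <- function(n, mu, sigma, tau) {
  rnorm(n, mean = mu, sd = sigma) + rexp(n, rate = 1 / tau)
}

set.seed(1)  # arbitrary seed, used only to make the sketch reproducible
x <- rexgauss_base(100000, mu = 300, sigma = 100, tau = 200)
x <- x[x > 0]                    # keep positive values only, as above

theoretical_mean <- 300 + 200    # mu + tau, ignoring the truncation at 0
mean(x)                          # should be close to theoretical_mean
```

Very few draws fall below zero with these parameters, so dropping them barely moves the mean.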

Now create some outliers:

# Simulate a distribution of outliers
  rt_outliers <- rexgauss(1000, mu = 450, sigma = 100, tau = 600, positive = FALSE)
  rt_outliers <- rt_outliers[rt_outliers > 0]
  # give it a nicer name:
  Population_of_Outliers <- rt_outliers

# Mean and Histogram of "outlier" distribution
mean(Population_of_Outliers)
## [1] 1044.028
hist(Population_of_Outliers, xlab = "RT in milliseconds")

Those two distributions look pretty different. But what does it look like when you have a sample (from an experiment) that contains a mixture of “real” RT responses and “outliers”?

Simulate a Sample of RTs with Outliers

Let’s see what a sample of 100 reaction times without outliers looks like:

# Set sample size 
sampleN <- 100
      
# Take a sample of "good" reaction times (without outliers)
Sample_Data <- sample(Population_of_Good_Reaction_Times, sampleN)
mean(Sample_Data)
## [1] 499.7751
hist(Sample_Data, xlab = "RT in milliseconds", main="Sample Data with no Outliers")

Now let’s see what that sample would look like if 10% of the RT responses were replaced by outliers:

# Create a sample of "outliers"    
  # set the proportion of the sample to replace with outliers
  proportion_outliers <- 0.1
  # calculate number of outliers to select
  nOutliers <- round(proportion_outliers * sampleN)
  # select the sample of outliers
  Outliers <- sample(Population_of_Outliers, nOutliers)

# Replace part of the sample with outliers
  # remove the "good" data that will be replaced by outliers:
  GoodData <- sample(Sample_Data, length(Sample_Data) - nOutliers)
  s <- data.frame(RT = GoodData, Group = "GoodData")
  o <- data.frame(RT = Outliers, Group = "Outliers")
  Sample_Data_with_Outliers <- rbind(s, o)

# Mean and Histogram for Sample Data with Outliers
  mean(Sample_Data_with_Outliers$RT)
## [1] 519.8143
  hist(Sample_Data_with_Outliers$RT, xlab = "RT in milliseconds", main="Sample Data with Outliers")
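
How far should 10% outliers shift the mean? With the population parameters used here, the expected mean of the mixture is roughly 0.9 × 500 + 0.1 × 1050 = 555 ms. The sketch below repeats the sampling many times to show how much a single sample of 100 can deviate from that value. It is an illustration under stated assumptions: `draw_exgauss` and `one_sample_mean` are hypothetical helpers, values are drawn directly from the distributions rather than from the finite populations above, and the rare negative draws are not removed.

```r
# Hypothetical helper: ex-Gaussian draws as Normal(mu, sigma) + Exponential(mean tau).
draw_exgauss <- function(n, mu, sigma, tau) {
  rnorm(n, mean = mu, sd = sigma) + rexp(n, rate = 1 / tau)
}

# Mean of one simulated sample in which a proportion p of RTs are outliers.
one_sample_mean <- function(n = 100, p = 0.1) {
  n_out <- round(p * n)
  good  <- draw_exgauss(n - n_out, mu = 300, sigma = 100, tau = 200)
  out   <- draw_exgauss(n_out,     mu = 450, sigma = 100, tau = 600)
  mean(c(good, out))
}

set.seed(2)  # arbitrary seed, used only to make the sketch reproducible
sample_means <- replicate(2000, one_sample_mean())

expected_mix <- 0.9 * (300 + 200) + 0.1 * (450 + 600)  # 555
mean(sample_means)   # long-run average of the contaminated sample mean
sd(sample_means)     # sample-to-sample variability of that mean
```

The single sample shown above (519.8 ms) falls below the expected value; with only 10 outliers per sample, the mean varies considerably from one sample to the next.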

It does look a little different from the sample without outliers: there are more data points in the right tail, and the tail extends further. Some of the data points in the tail should probably be excluded. But where would you make your cutoff?

Which points are outliers?

# Dot Plot of the individual data points. Which are outliers?
  dotPlot <- ggplot(Sample_Data_with_Outliers, aes(x=RT)) +
    geom_dotplot( stackdir = 'center')
  dotPlot
## `stat_bindot()` using `bins = 30`. Pick better value with `binwidth`.

Something we never know in real life

Here are the actual identities of the outliers in the simulation. One thing to notice is that the good and bad data often overlap - there may be no way to exclude all the outliers without also excluding some genuine data points. Another thing to keep in mind is that in real life we NEVER get to know which points are really outliers.

# Dot Plot of the individual data points with actual outliers shown
  dotPlot <- ggplot(Sample_Data_with_Outliers, aes(x=RT, fill=Group)) +
    geom_dotplot( stackdir = 'center')
  dotPlot
## `stat_bindot()` using `bins = 30`. Pick better value with `binwidth`.
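
Because the simulation records which points are outliers, it is possible to check how any cutoff performs. The sketch below applies one possible rule, flagging RTs more than 2.5 standard deviations above the sample mean, and cross-tabulates the flags against the true group. The 2.5 SD rule is chosen only for illustration and is not a recommendation; the data are regenerated with a hypothetical helper (`draw_exgauss`) so the block runs on its own.

```r
# Hypothetical helper: ex-Gaussian draws as Normal(mu, sigma) + Exponential(mean tau).
draw_exgauss <- function(n, mu, sigma, tau) {
  rnorm(n, mean = mu, sd = sigma) + rexp(n, rate = 1 / tau)
}

set.seed(3)  # arbitrary seed, used only to make the sketch reproducible

# 90 "good" RTs and 10 "outlier" RTs, labelled with their true group
sim <- data.frame(
  RT    = c(draw_exgauss(90, mu = 300, sigma = 100, tau = 200),
            draw_exgauss(10, mu = 450, sigma = 100, tau = 600)),
  Group = rep(c("GoodData", "Outliers"), times = c(90, 10))
)

# Illustrative cutoff: mean + 2.5 SD of the contaminated sample
cutoff <- mean(sim$RT) + 2.5 * sd(sim$RT)
sim$Flagged <- sim$RT > cutoff

# Rows: true identity; columns: whether the cutoff flagged the point
table(Truth = sim$Group, Flagged = sim$Flagged)
```

Any true outliers in the `FALSE` column are misses, and any good data in the `TRUE` column are false alarms. Trying other cutoffs on the same data shows the trade-off between the two.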

Identifying Outliers: Strategies and Criteria


What to do with Outliers (once you identify them). Possible options:


What to do with Outliers: Best practices


Options for dealing with non-normality


References


Additional Resources (updated 2023)