Taken at face value, some of the evidence from controlled experiments is
conclusive. But we have to allow for fraud and the 'file-drawer' effect.
Take the first of these. Even many 'normal' scientists have cheated, as
recorded by Alexander Kohn in "False Prophets: Fraud and Error in Science
and Medicine" (Barnes and Noble 1986). To that collection may be added
the psychologists who lie to their subjects and call the lying 'experimental
dissimulation'.
Parapsychologists and psychics have more incentive to cheat because, if
their research results are uninteresting, they have less opportunity to
turn to teaching. Unconscious cheating, wishful thinking (which is universal),
unsound experimental design and analysis, and seeing what we expect are
further pitfalls. The statistician M. G. Kendall once described the phenomenon
of seeing what one expects as "one of the deadliest forms of bias in
psychology". He was referring to an experiment in which an observer
of a reliable random number generator had a tendency to write down too many
even numbers.
Potentially the most important evidence in Radin's book is the chapter on
meta-analysis, which is also emphasized in the introduction--and it is here
that the 'file drawer' effect comes into play.
Meta-analysis is the combination of results from many experiments. A problem
in meta-analysis, and in statistics generally, is how to allow for the researches
that remain unpublished and unknown because their P values did not reach
a conventional significance level such as 0.05. I do not know who coined
the name 'file drawer' effect for this problem. Allowing for the effect drags
down the statistical significance of published work. Radin claims that "parapsychologists
were among the first to become sensitive to this problem"--although
he does not say when--and he mentions that "in 1975 the Parapsychological
Association's officers adopted a policy opposing the selective reporting
of positive outcomes". The problem was known to statisticians by 1958.
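The selection effect is easy to demonstrate by simulation. The sketch below is illustrative only: the experiment count, trial count, and publication threshold are my own hypothetical parameters, not figures from Radin's database.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate 1,000 independent card-guessing experiments, each of 1,000
# trials with hit probability 1/5 -- the null hypothesis of pure chance.
n_experiments, n_trials, p = 1000, 1000, 0.2
hits = rng.binomial(n_trials, p, size=n_experiments)

# Sigmage (z-score) of each experiment.
z = (hits - n_trials * p) / np.sqrt(n_trials * p * (1 - p))

# A journal that accepts only one-sided P < 0.05 publishes z > 1.645;
# everything else goes into the file drawer.
published = z[z > 1.645]

# Combined sigmage of a set of experiments: the sum of the individual
# sigmages divided by the square root of how many there are.
combined_all = z.sum() / np.sqrt(n_experiments)
combined_pub = published.sum() / np.sqrt(len(published))

print(f"all {n_experiments} experiments: combined sigmage {combined_all:.2f}")
print(f"{len(published)} published experiments: combined sigmage {combined_pub:.2f}")
```

Although every simulated experiment is pure chance, a meta-analysis of the published subset alone shows a huge combined sigmage, while pooling everything, file drawer included, gives a sigmage near zero.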
Consider the following typical example. Radin points out that there were
186 publications on ESP card tests worldwide from 1882 to 1939. "The
combined results of this four-million trial database [taken at face value],"
he says, "translate into tremendous odds against chance--more than
a billion trillion to one." (A 'trial' is the guess of one card.) He
means that the P value is about 10^-21--he is not writing only for the scientific
establishment. This P value corresponds to a bulge above 'chance' expectation
of 9.5 sigma, where sigma is the standard deviation. (I call that a 'sigmage'
of 9.5.)
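The correspondence between Radin's odds and the quoted sigmage can be checked directly; a short sketch using the one-sided normal tail:

```python
from scipy.stats import norm

# A one-sided P value of 10^-21 ("more than a billion trillion to one")
# corresponds to a bulge of about 9.5 standard deviations.
p_value = 1e-21
sigmage = norm.isf(p_value)          # inverse survival function
print(f"sigmage: {sigmage:.2f}")     # about 9.5

# And back again: the tail area beyond 9.5 sigma.
print(f"P value at 9.5 sigma: {norm.sf(9.5):.2e}")  # about 1e-21
```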
Apart from the possibility of conscious and unconscious fraud and wishful
thinking in some fraction of the publications, Radin claims, with no explanation,
that, in order to "nullify" the statistical significance, the
file drawer would have to contain "more than 3,300 unpublished, unsuccessful
reports for each published report". That number 3,300 is a gross overestimate.
It should be reduced at least to about 15 (or even to 8).
The expected sigmage in the file drawer, under the null hypothesis, would
be slightly negative, but I will call it zero. If these results were combined
with the published work, the total sample size would be multiplied by 16,
thus becoming 64 million individual guesses. Given the null hypothesis ('chance'),
the bulge would be unaffected, so the sigmage would be divided by sqrt(16) = 4
and would become 9.5/4 = 2.4, with a P value of about 1/100.
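The dilution arithmetic can be verified in a few lines, using the same numbers:

```python
from scipy.stats import norm

sigmage = 9.5          # combined sigmage of the published work
factor = 16            # i.e. 15 null file-drawer studies per published study

# Under the null hypothesis the bulge is unchanged, while the standard
# deviation grows with the square root of the sample size.
diluted = sigmage / factor ** 0.5   # 9.5 / 4 = 2.375
p_value = norm.sf(diluted)          # one-sided tail area

print(f"diluted sigmage: {diluted:.3f}")
print(f"P value: {p_value:.4f}")    # about 0.009, i.e. roughly 1/100
```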
Because the number of individual guesses is so large, this P value appreciably
SUPPORTS the null hypothesis (no ESP). This is because a Bayes factor (the
factor by which the odds of a hypothesis are multiplied in light of the
observations), corresponding to a fixed P value, is roughly proportional
to 1/ sqrt (N), where N is the sample size. So Radin's method for evaluating
the file-drawer effect, whatever that method may be, must be misguided.
This conclusion largely undermines Radin's meta-analysis, which is central
to his case for ESP.
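The 1/sqrt(N) behaviour of the Bayes factor can be illustrated with a toy calculation. The uniform prior on the hit rate below is my own illustrative choice, and the sample sizes are arbitrary; the point is only the scaling.

```python
import math
from scipy.stats import binom

def bayes_factor(n, z, p0=0.2):
    """Toy Bayes factor in favour of 'some ESP' (hit rate ~ Uniform(0, 1))
    against pure chance (hit rate p0), for a result z sigma above chance."""
    k = round(n * p0 + z * math.sqrt(n * p0 * (1 - p0)))
    # Marginal likelihood under the uniform alternative is 1/(n+1).
    return (1 / (n + 1)) / binom.pmf(k, n, p0)

# The same sigmage (hence the same P value) at sample sizes
# differing by a factor of 16:
bf_small = bayes_factor(1_000, 2.375)
bf_large = bayes_factor(16_000, 2.375)

# For a fixed P value the Bayes factor shrinks roughly like 1/sqrt(N):
# a 16-fold larger N gives a roughly 4-fold smaller Bayes factor.
print(f"BF at N=1,000:  {bf_small:.3f}")
print(f"BF at N=16,000: {bf_large:.3f}")
print(f"ratio: {bf_small / bf_large:.2f}")   # close to sqrt(16) = 4
```

Note that both Bayes factors are below 1, so at this sigmage the data multiply the odds in favour of the null hypothesis, which is the point made above.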
Nevertheless, Radin's book is well written and provides a good summary of
the arguments supporting the existence of ESP, with about 600 references.
It is less good on the counter-arguments. Readers should also consult "ESP
and Parapsychology: A Critical Evaluation" by C. E. M. Hansel (Buffalo 1980),
where much fraudulent work is exposed. Radin quotes Hansel as saying that
three P values, each of 0.01, amount to one of 10^-6, and that he (Radin)
would find that convincing. But the product of independent P values is not
a P value. The product has to be transformed by a method due to R. A. Fisher.
Both Hansel and Radin have overlooked this. In the present example, the
composite P value is about 1/9,000, not 1/1,000,000.
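Fisher's method is simple to apply: minus twice the sum of the logarithms of k independent P values is chi-squared with 2k degrees of freedom under the null hypothesis (scipy also provides this directly as `combine_pvalues`).

```python
import math
from scipy.stats import chi2

# Three independent one-sided P values, each 0.01.
p_values = [0.01, 0.01, 0.01]

# Naive (incorrect) combination: the bare product.
naive = math.prod(p_values)          # 1e-06 -- not itself a P value

# Fisher's method: -2 * sum(ln p) is chi-squared with 2k degrees of
# freedom under the null hypothesis.
statistic = -2 * sum(math.log(p) for p in p_values)
combined = chi2.sf(statistic, df=2 * len(p_values))

print(f"product: {naive:.0e}")       # 1e-06
print(f"Fisher:  {combined:.2e}")    # about 1.1e-04, i.e. roughly 1/9,000
```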
I am not a skeptic on principle. There is one type of evidence that could
convince me if it were successful. Guesses, by psychics, of the parities
(even or odd) of future cricket scores could be published on the World Wide
Web. The actual scores and parities could later be published in large
print, to aid the psychics' precognition and to make their evaluation
straightforward. This procedure would rule out the possibility of undetectable fraud.