[See copyright notice at the end of this article]

Journal of Parapsychology, 78(2), 170-182, 2014

## BAYESIAN AND CLASSICAL HYPOTHESIS TESTING: PRACTICAL DIFFERENCES FOR A CONTROVERSIAL AREA OF RESEARCH

By J. E. Kennedy

ABSTRACT: The use of Bayesian analysis and debates involving Bayesian analysis are increasing for controversial areas of research such as parapsychology. This paper conceptually describes the philosophical and modeling differences between Bayesian and classical analyses, and the practical implications of these differences. Widely accepted statistical conventions have not yet been established for Bayesian analysis in scientific research. The recommendations from the FDA guidance on using Bayesian methods are appropriate for confirmatory experiments. This guidance recommends that the study design and protocol include (a) specification of the prior probabilities and models that will be used, (b) specification of the criteria that will be considered acceptable evidence, (c) operating characteristics for the probability of Type I error and power of the analysis, and (d) an estimate of the relative roles of prior probability versus the data from the current experiment in producing the final results. Both classical and Bayesian methods are valid when properly applied with confirmatory methodology that includes prespecification of statistical methods, and prospective evaluations of inferential errors and power. Evaluations of inferential errors and power measure the validity of a planned hypothesis test, including Bayesian analysis. Unfortunately, the use of confirmatory methodology has been rare in psychology and parapsychology.

Keywords: Bayesian analysis, classical analysis, inferential errors, confirmatory research, subjective probability

The use of Bayesian analysis has been rapidly increasing in science and is becoming conspicuous in scientific controversies. For example, Wagenmakers, Wetzels, Borsboom, and van der Maas (2011) argued that classical analyses supporting parapsychological effects are evidence that classical methods are faulty and should be replaced with Bayesian methods. Bem, Utts, and Johnson (2011) responded that certain aspects of this analysis were flawed, but agreed that Bayesian methods have advantages that will be increasingly utilized in scientific research. Debates like this typically focus on specialized technical points without presenting the fundamental assumptions and models that provide the crucial context for understanding and evaluating the arguments.

The present article is intended to describe conceptually the philosophical assumptions, models, and practical aspects that differ between Bayesian and classical hypothesis testing. This discussion should allow a person to conceptually understand the descriptions of methodology and the findings for experimental research that uses Bayesian analyses, and to follow debates about conflicting conclusions from research data. In addition, some potentially controversial claims and practices with Bayesian methods are described, as well as recommendations for methodology for confirmatory experiments. References are not provided for concepts that are commonly described in writings on Bayesian methods.

The discussion here focuses on evaluating the evidence for an ESP or psi experimental effect using a binomial analysis, as is common in parapsychology. Bayesian methods can also be used for other types of analyses. The basic principles discussed here also apply for other analyses.

When discussing current limitations, uncertainties, or debates about a statistical topic, I sometimes offer my opinion about the optimal strategy for handling the matter. Some of these opinions are prefaced with qualifiers such as “in my opinion” or “my perspective is.” These qualifiers are intended to indicate that a detailed technical discussion of the topic is beyond the purposes of the present article, and that others may have differing opinions.

**Is Probability a Condition of the Physical World or a Condition of a Human Mind?**

Bayesian and classical analyses are based on different philosophical perspectives about the nature of probability. Consider the case of a colleague who goes into a separate room and flips a coin. After the coin has been flipped, the colleague knows the outcome, but a person in the other room does not.

**Objective Probability**

One perspective is that after the coin has been flipped there is no uncertainty about the outcome. It is what it is. If the coin came up heads, the probability that it is heads is one, and the probability that it is tails is zero. The fact that a person in another room does not know the state of the coin is irrelevant. Probability in this case is objectively based on the state of the physical world. It is not an accurate representation to describe the state of the coin as being uncertain after the state has been physically determined.

Classical hypothesis testing is based on this philosophy of probability. A scientific hypothesis such as “do some people have psychic abilities” is a question about the existing state of the world. The world is what it is, and the fact that a particular person is uncertain about the truth of a hypothesis does not affect the existing state of the world. Variation and probability pertain to the outcomes of future experiments and observations, not to the properties of an existing state of the world. This perspective on probability is also called the frequentist interpretation because it assumes probability is based on the frequency of occurrence of an outcome when the random event or observation is repeated many times.

The logic for statistical analysis is to determine the probability for the outcome of an experiment given that a certain state of the world exists. The statistical models treat the parameters for the state of the world as constant and the outcome of an experiment as variable.
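The frequentist logic above can be illustrated with a small simulation. The following is only a sketch under assumed values (a two-choice task with 100 trials per experiment, 5,000 simulated experiments, and a function name invented for this illustration); it holds the state of the world constant at chance and lets the experimental outcome vary across repetitions.

```python
import random


def simulate_hit_counts(trials, hit_probability, experiments, seed=0):
    """Simulate repeated experiments with a fixed (constant) hit probability."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [
        sum(rng.random() < hit_probability for _ in range(trials))
        for _ in range(experiments)
    ]


# The parameter for the state of the world is constant: chance performance.
counts = simulate_hit_counts(trials=100, hit_probability=0.5, experiments=5000)

# Long-run relative frequency of an outcome of 60 or more hits.
frequency = sum(c >= 60 for c in counts) / len(counts)
print(f"Proportion of simulated experiments with 60+ hits: {frequency:.4f}")
```

The relative frequency printed at the end is the kind of quantity that objective probability refers to: how often an outcome occurs when the same random process is repeated many times.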

**Subjective Probability**

An alternative philosophical perspective is that probability is based on the beliefs in a human mind and therefore is subjective. The fact that the state of the coin has been determined does not resolve the uncertainty for a person who does not know the outcome. Uncertainty and probability exist for that person. The probability for a person in the room with the coin is completely different than for a person in another room.

Bayesian statistics are based on subjective probability and prescribe how a person’s beliefs should change as new data are obtained. A mathematical model determines the optimal beliefs given the initial beliefs and the new data.

This strategy assumes that the uncertainty in a person’s mind can be quantitatively modeled and that a person’s beliefs should rationally follow the mathematical laws of probability theory.

A person’s initial beliefs and uncertainty are mathematically represented with prior probability distributions.

These represent the person’s beliefs prior to collecting data for the current study. Ideally, everything that a person believes about a topic is quantitatively incorporated into the prior probabilities. For example, any concerns about misconduct or biased methodology in previous studies must be incorporated quantitatively into the prior probability values.

After the data have been collected for the current study, the analysis combines or updates the prior probabilities with the evidence from the new data to produce the posterior probability. This mathematically represents what the person should rationally believe given the prior beliefs and the data from the current study.
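As a minimal sketch of this updating step, the example below applies Bayes' rule to a binomial experiment. The three candidate hit rates, the skeptical prior probabilities, and the data (60 hits in 100 trials) are all hypothetical values chosen for illustration, and a coarse discrete prior is used only to keep the arithmetic visible; practical analyses typically use continuous prior distributions.

```python
from math import comb


def posterior_on_grid(prior, grid, hits, trials):
    """Update a discrete prior over the hit probability with binomial data."""
    # Likelihood of the observed data at each candidate hit probability.
    likelihood = [
        comb(trials, hits) * p**hits * (1 - p) ** (trials - hits) for p in grid
    ]
    # Bayes' rule: posterior is proportional to prior times likelihood.
    unnormalized = [pr * lk for pr, lk in zip(prior, likelihood)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Hypothetical candidate hit rates for a two-choice task (chance = 0.50).
grid = [0.50, 0.55, 0.60]
prior = [0.90, 0.07, 0.03]  # assumed skeptical prior beliefs; sums to 1
posterior = posterior_on_grid(prior, grid, hits=60, trials=100)

for p, pr, po in zip(grid, prior, posterior):
    print(f"hit rate {p:.2f}: prior {pr:.3f} -> posterior {po:.3f}")
```

With these assumed inputs, belief shifts away from the chance hit rate and toward the higher hit rates, but how far it shifts depends directly on the prior probabilities that were specified.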

In Bayesian models, parameters representing the existing state of the world are treated as variable and the observed outcome of an experiment is treated as constant. The variability in the parameters for the existing state of the world represents the uncertainty in a person’s mind, not variation or fluctuations in the actual state of the world. Cases with variations in the state of the world, such as in a random effects analysis, are a different aspect of variability in the model.

**Different Uses of Probability**

Both philosophical perspectives on probability appear to me to be valid. They focus on different manifestations of probability. Objective probability attempts to directly model uncertainty in the physical world whereas subjective probability attempts to directly model uncertainty in a human mind. Both approaches involve algorithms for drawing inferences about the world. Both quantify uncertainty using mathematical probability distributions of hypothetical possibilities for the value of terms in mathematical models. Both assume that the mathematical models can be verified and improved by making observations.

The key question is how useful the two approaches are in practice. I tend to favor one or the other, depending on the context for the use of probability.

In situations such as gambling games in casinos, the probabilities are precisely known. All possible outcomes of a series of random events can be fully enumerated. The probability that a certain outcome will occur on a series of trials is clear, and the concept of probability based on many repetitions seems natural. Many parapsychology experiments also have these well-defined properties.

The other extreme would be situations such as commodity markets and other investment decisions. In these cases, the probabilities are not precisely known and are not constant over time.

Another important factor is the type of question being asked. A question such as “do some people have psychic abilities” is about an existing state of the world. On the other hand, a question such as “should I invest my retirement savings in the commodities market based on the predictions of a psychic” is a personal decision about future actions more than a scientific question about the state of the world. These latter situations usually involve potential risks and rewards, and are more difficult to conceptualize in terms of repeated observations.

For me, when a question focuses on an existing state of the physical world and can be evaluated with repeated observations using clearly applicable probability models, the methods of objective probability are a natural fit. When a question focuses on a personal decision that involves risks and rewards or poorly defined probabilities, subjective probabilities are a natural fit. Note that the operation of a market is based on the assumption that people have different subjective probabilities about the outcomes of future events. If everyone had the same beliefs, commodity markets and stock markets would not be possible because there would be only buyers or only sellers.

**Scientific Research**

Scientific researchers have traditionally taken great pride in being objective. They have modeled the basic properties of the world as being independent of the human mind. The philosophy of objective probability emerged from and is consistent with that worldview.

Subjective probability brings the diversity of a market environment to scientific research, and complicates analyses by including models of the personal beliefs in a human mind as well as models of the external world.

Advocates of Bayesian methods, of course, argue that pure objectivity does not occur and that subjective probability is a more realistic description of what actually happens in science. However, another perspective is that a prominent injection of subjectivity into scientific methods will unnecessarily further degrade the admittedly imperfect objectivity of science and hinder the development of consensus.

These debates have no clear resolution at present. Both approaches have assumptions about how human beliefs should ideally be influenced by evidence. From my perspective, the claims that one approach is better than the other need to be evaluated empirically—and that remains to be done. Most scientific research, and particularly experiments, can be reasonably evaluated with either approach.

A more pragmatic question is how the two approaches differ in practice. It appears to me that both approaches are logically valid and should eventually reach the same conclusions for scientific hypotheses about an existing state of the world. Ease of use and efficiency in reaching those conclusions may differ. Of course, classical methods currently have advantages from much more widely available software and more extensive practical experience with the methods and software. In addition, classical methods have widely accepted conventions for statistical methodology and simpler mathematical methods.

Classical hypothesis tests evaluate an experiment by comparing the observed outcome to the distribution of other outcomes that could have occurred if the results were produced by chance fluctuations. If the probability or p value of the observed outcome under this null hypothesis is less than a prespecified criterion, the outcome is interpreted as a significant result that provides evidence for the alternative or experimental hypothesis.
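For a binomial experiment, the classical test just described can be sketched in a few lines. The data (60 hits in 100 trials), the chance hit rate of .5, the one-sided direction, and the .05 criterion are assumptions for illustration. The p value is computed here as the probability, under the null hypothesis, of the observed number of hits or more.

```python
from math import comb


def binomial_p_value(hits, trials, chance=0.5):
    """One-sided exact p value: P(X >= hits) when only chance is operating."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(hits, trials + 1)
    )


alpha = 0.05  # prespecified criterion (assumed for this sketch)
p = binomial_p_value(hits=60, trials=100)
print(f"p = {p:.4f}; significant at {alpha}: {p < alpha}")
```

Note that the calculation uses only the null hypothesis and the data; no prior probabilities enter the analysis.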

A Bayesian analysis typically compares the probability that the alternative or experimental hypothesis is true with the probability that the null hypothesis is true, given the prior probabilities and the experimental data. The null hypothesis is that only chance is operating. The comparison is made by forming the ratio of the two probabilities. This ratio is the odds that the alternative hypothesis is true. Larger values of the odds are favorable for the alternative hypothesis.
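The odds described above can be sketched for the same kind of binomial experiment. This example rests on several assumptions that are not taken from the present article: the alternative hypothesis is represented by a uniform prior on the hit rate between 0 and 1, the prior odds are set to 1 (the two hypotheses are initially considered equally likely), and the function names are invented for illustration. Under a uniform prior, the probability of observing any particular number of hits in n trials is 1/(n + 1), which keeps the calculation simple.

```python
from math import comb


def bayes_factor_uniform(hits, trials, chance=0.5):
    """Bayes factor for H1 (uniform prior on the hit rate) over H0 (chance)."""
    # Probability of the data under H1, averaged over the uniform prior.
    marginal_h1 = 1.0 / (trials + 1)
    # Probability of the data under H0, where the hit rate is fixed at chance.
    marginal_h0 = comb(trials, hits) * chance**hits * (1 - chance) ** (trials - hits)
    return marginal_h1 / marginal_h0


def posterior_odds(bayes_factor, prior_odds):
    """Posterior odds for H1 = Bayes factor times prior odds for H1."""
    return bayes_factor * prior_odds


bf = bayes_factor_uniform(hits=60, trials=100)
odds = posterior_odds(bf, prior_odds=1.0)  # assumed even prior odds
print(f"Bayes factor = {bf:.3f}; posterior odds for H1 = {odds:.3f}")
```

Changing either the prior distribution for the alternative hypothesis or the prior odds changes the resulting odds, which is why prespecifying these choices matters.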

For an estimate of the effect size with 95% confidence, a classical frequentist analysis describes the confidence interval as having a .95 probability that the range contains the true value. A Bayesian analysis describes the credible interval as having a .95 probability that the true value lies within this range. Obviously, few statistical users will consider the theoretical distinction between these descriptions to be important in practice.
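To show how close the two kinds of interval can be in practice, the sketch below computes both for hypothetical data of 60 hits in 100 trials. The choices here are assumptions made for brevity: the classical interval uses the simple normal-approximation (Wald) formula, and the Bayesian interval is an equal-tailed credible interval based on a uniform prior and a numerical grid rather than an exact formula.

```python
from math import sqrt


def wald_interval(hits, trials, z=1.96):
    """Approximate 95% confidence interval (normal approximation)."""
    p_hat = hits / trials
    half_width = z * sqrt(p_hat * (1 - p_hat) / trials)
    return p_hat - half_width, p_hat + half_width


def credible_interval(hits, trials, mass=0.95, steps=10_000):
    """Equal-tailed credible interval under a uniform prior, on a grid."""
    grid = [(i + 0.5) / steps for i in range(steps)]
    # With a uniform prior the posterior is proportional to the likelihood.
    weights = [p**hits * (1 - p) ** (trials - hits) for p in grid]
    total = sum(weights)
    tail = (1 - mass) / 2
    cumulative, lower, upper = 0.0, None, None
    for p, w in zip(grid, weights):
        cumulative += w / total
        if lower is None and cumulative >= tail:
            lower = p  # first grid point past the lower tail
        if cumulative >= 1 - tail:
            upper = p  # first grid point reaching the upper cutoff
            break
    return lower, upper


ci = wald_interval(60, 100)
cri = credible_interval(60, 100)
print(f"confidence interval: ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"credible interval:   ({cri[0]:.3f}, {cri[1]:.3f})")
```

With a uniform prior and a moderate sample size the two intervals are numerically similar; an informative prior would pull the credible interval toward the prior beliefs.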

**Differences in Calculating and Interpreting Probability**