Friday, April 16, 2010

What is your Bayesian prior on non-human animal emotion?

Although humans communicate well with other humans, we communicate much less well with non-human animals. As a result, we know far more about the internal emotional states of other humans than about those of non-human animals. Where do we begin to make inferences about the internal states of non-human animals? What is the appropriate Bayesian prior for making inferences about these unknown internal states?

Long ago, René Descartes (1649) posited that non-human animals are akin to machines: they have no soul or mind and cannot feel pain. Since then, it has been considered "scientific" to assume that non-human animals actually are machines unless there is overwhelming evidence to the contrary. I propose that this is highly unscientific, and that it is an example of argumentum ad ignorantiam, the fallacy that absence of evidence is evidence of absence. That is, Descartes, and millions since, have found it convenient to presume that an absence of evidence about non-human animal emotion constitutes evidence that non-human animal emotion is absent.

Another approach to understanding non-human animal emotions, one universally taken by infants, is to assume that "all others are like me." Indeed, this is how humans learn about each other: in the absence of much knowledge, we assume that other humans (especially those who look like us) think and feel as we do. This is our Bayesian prior for interactions with other humans. Like Descartes's proclamation that non-humans are machines, it could also serve as a prior for understanding non-human animal emotion.

Is emotion likely to be a synapomorphy shared by a wide range of taxa? In my simplistic way of thinking, two lines of reasoning suggest that it is. First, emotion in humans appears to be regulated by a primitive part of the central nervous system whose structure is shared with a wide variety of taxa. If it looks like beer, smells like beer, and tastes like beer, it might be beer. Second, emotions are useful. Fear and pain are widely recognized for their fitness benefit, that is, for their adaptive value. I suggest that other emotions are likely to have similar fitness benefits: if a behavior generates joy, euphoria, or happiness, organisms would be inclined to continue that behavior. Emotions could thus provide mechanistic links between biochemistry and behavior.

Bayesian inference is useful in that it requires us to think hard and carefully about our prior beliefs. In that sense, it helps us become more scientific, and maybe even more moral.
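
To make the role of the prior concrete, here is a minimal sketch of a conjugate Beta-Binomial update in Python. The two priors and the counts (3 "successes" out of 4 observations of some proportion theta) are hypothetical numbers chosen only for illustration, and `beta_binomial_posterior` and `beta_mean` are illustrative helper names. With no data at all, each posterior is exactly its prior, so absence of evidence cannot move us anywhere the prior did not already put us; with only a few observations, the two priors still lead to very different conclusions.

```python
from fractions import Fraction

def beta_binomial_posterior(a, b, successes, trials):
    """Conjugate update: a Beta(a, b) prior plus `successes` out of `trials`
    Bernoulli observations gives a Beta(a + successes, b + failures) posterior."""
    return a + successes, b + (trials - successes)

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution, as an exact fraction."""
    return Fraction(a, a + b)

# Hypothetical priors on a proportion theta (illustrative numbers only).
priors = {
    "skeptical (near zero)": (1, 99),  # prior mean 1/100
    "uniform": (1, 1),                 # prior mean 1/2
}

for label, (a, b) in priors.items():
    no_data = beta_mean(*beta_binomial_posterior(a, b, 0, 0))    # nothing observed
    some_data = beta_mean(*beta_binomial_posterior(a, b, 3, 4))  # 3 of 4 observed
    print(f"{label}: no data -> {no_data}, 3 of 4 -> {some_data}")
```

The skeptical prior moves only from 1/100 to 1/26 after the same four observations that move the uniform prior from 1/2 to 2/3.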

To P or not to P, that is the question...

I think that at least part of the reason most of us have a hard time stating what a frequentist P-value is (and is not) is that we do not know the difference between statisticians' correct definition and our common but erroneous definitions. In addition (or maybe this is another way of looking at it), the ambiguity of words in general, as opposed to math, contributes substantially to the confusion.

Here I take a stab at it.

A frequentist P-value (say, of a two-sided t-test for a difference between means) is the probability that a difference as large as or larger than the observed difference (in absolute value) would occur if our two samples were drawn from the same distribution and the same experiment were conducted repeatedly ad infinitum (or ad nauseam). "A P value is often described as the probability of seeing results as or more extreme as those actually observed if the null hypothesis were true" (2).
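
The "repeated ad infinitum" part of this definition can be simulated directly. The sketch below is a minimal illustration, assuming normally distributed data and Student's pooled-variance t statistic (whose null distribution does not depend on the common mean or variance, so standard normal draws suffice); the helper names and the sample values are hypothetical. It approximates the two-sided P-value as the fraction of simulated null experiments whose |t| is at least as large as the observed |t|.

```python
import math
import random

def pooled_t(x, y):
    """Two-sample t statistic assuming equal variances (Student's t-test)."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    ssx = sum((v - mx) ** 2 for v in x)
    ssy = sum((v - my) ** 2 for v in y)
    sp2 = (ssx + ssy) / (nx + ny - 2)  # pooled variance estimate
    return (mx - my) / math.sqrt(sp2 * (1 / nx + 1 / ny))

def simulated_p_value(t_obs, nx, ny, reps=20000, seed=1):
    """Repeat the experiment under the null (both samples from the same
    normal distribution) and count how often |t| >= |t_obs|."""
    rng = random.Random(seed)
    extreme = 0
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(nx)]
        y = [rng.gauss(0, 1) for _ in range(ny)]
        if abs(pooled_t(x, y)) >= abs(t_obs):
            extreme += 1
    return extreme / reps

# Hypothetical observed samples.
x = [5.1, 4.8, 5.6, 5.3, 5.0]
y = [4.6, 4.9, 4.4, 4.7, 4.5]
t_obs = pooled_t(x, y)
print(t_obs, simulated_p_value(t_obs, len(x), len(y)))
```

With enough repetitions, the simulated value should agree, up to simulation error, with a t-table P-value for nx + ny - 2 degrees of freedom.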

A frequentist P-value of such a t-test is apparently not lots of things we wish it were (1):
  1. it is not the probability that our null hypothesis is true.
  2. it is not the probability that such a difference could occur by chance alone.
  3. it is not the probability of falsely rejecting the null hypothesis.
Each of these leaves information out. Important elements of a correct definition include
  • The probability of observing data as extreme or more extreme in ...
  • repeated identical experiments and analyses, given ...
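  • the assumption that the null hypothesis is true. (This is a conditioning assumption, not a Bayesian "prior": the P-value is computed as if the null were true and assigns no probability to the null itself.)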
Thus the incorrect definitions typically share something with a complete definition, and under most circumstances, quantities that fit the incorrect definitions (e.g., Bayesian posterior probabilities) will be correlated with frequentist P-values (see Bolker's comment below).
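
One way to see that a P-value is not the probability that the null hypothesis is true is to simulate many experiments in which the null is true in a known fraction of them. The sketch below is hypothetical throughout: it assumes a one-sample z-test with known sigma = 1, a 90% chance that the null (mean 0) is true in any given experiment, and a true mean of 0.5 otherwise. It reports how often the null was actually true among the results declared significant at P < 0.05.

```python
import math
import random
from statistics import NormalDist

def fraction_null_among_significant(prior_null=0.9, effect=0.5, n=20,
                                    alpha=0.05, experiments=20000, seed=2):
    """Among simulated experiments with P < alpha, return the fraction in
    which the null hypothesis was actually true (NaN if none were significant)."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    significant = 0
    significant_and_null = 0
    for _ in range(experiments):
        null_true = rng.random() < prior_null
        mu = 0.0 if null_true else effect
        # The mean of n draws from N(mu, 1) is distributed N(mu, 1/n).
        xbar = rng.gauss(mu, 1 / math.sqrt(n))
        z = xbar * math.sqrt(n)
        p = 2 * (1 - std_normal.cdf(abs(z)))  # two-sided P-value
        if p < alpha:
            significant += 1
            significant_and_null += null_true
    if significant == 0:
        return float("nan")
    return significant_and_null / significant

print(fraction_null_among_significant())
```

Under these made-up settings the answer comes out at roughly 40%, nowhere near 5%, and it shifts as `prior_null` or `effect` changes even though the significance threshold stays fixed. That dependence on the prior is exactly what a Bayesian analysis makes explicit.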

A frequentist confidence interval is a region calculated from our data and our chosen significance level, α, in a manner that would include the "true" population parameter (e.g., the "true" mean) 100(1-α)% of the time if the same experiment and analysis were conducted repeatedly ad infinitum. With α = 0.05, it is a region which, so calculated, would include the true population parameter in 95% of all hypothetically repeated identical experiments. Thus, the population parameter of interest is fixed (i.e., a "true" value exists), and the interval is random (because it is computed from the data of a randomized experiment).
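
The repeated-experiment interpretation can be checked by simulation as well. This minimal sketch assumes normally distributed data with a known standard deviation, so it uses the z-interval (sample mean ± z·sigma/√n) rather than a t-interval; the true mean, sigma, and n are arbitrary illustrative values. The true mean stays fixed; only the interval changes from one simulated experiment to the next.

```python
import math
import random
from statistics import NormalDist

def ci_coverage(true_mean=10.0, sigma=2.0, n=15, alpha=0.05, reps=10000, seed=3):
    """Fraction of repeated experiments whose known-sigma z-interval
    contains the fixed true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        if xbar - half_width <= true_mean <= xbar + half_width:
            hits += 1
    return hits / reps

print(ci_coverage())
```

The returned coverage should land close to 1 - α (about 0.95 with the defaults), up to simulation error.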

A frequentist confidence interval is not an interval which we are 100(1-α)% certain contains the true parameter. I don't even know what this statement (i.e., 95% certain) means -- what is "certainty"?

A 95% Bayesian credible interval (a.k.a. Bayesian confidence interval) is a contiguous subset (i.e., an interval) of the posterior probability distribution of a parameter of interest (e.g., a mean) that holds 95% of the posterior probability. A posterior probability distribution is the probability distribution that results from combining our prior beliefs about the parameter with the likelihood, that is, the conditional probability of our observed data given each possible value of the parameter. It assigns probability to all possible values of our parameter of interest (given our priors, our data, and our model). The credible interval is merely an interval that contains a most probable subset of those values. That is, we are not sure what the parameter was while the world turned and our data were collected, but the safest bet is inside the credible interval.
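
For a concrete case, consider a normal mean with known sigma and a normal prior on that mean. This is a conjugate pair, so the posterior is also normal and the credible interval has a closed form. The sketch below assumes exactly that setup; the function name, prior values, and data are hypothetical. The posterior precision is the sum of the prior precision and the data precision, and the posterior mean is the precision-weighted average of the prior mean and the sample mean.

```python
import math
from statistics import NormalDist

def normal_credible_interval(data, sigma, prior_mean, prior_sd, level=0.95):
    """Equal-tailed credible interval for a normal mean with known sigma
    and a normal prior (conjugate, so the posterior is also normal)."""
    n = len(data)
    xbar = sum(data) / n
    prior_precision = 1 / prior_sd ** 2
    data_precision = n / sigma ** 2
    post_var = 1 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean + data_precision * xbar)
    posterior = NormalDist(post_mean, math.sqrt(post_var))
    tail = (1 - level) / 2
    return posterior.inv_cdf(tail), posterior.inv_cdf(1 - tail)

data = [9.0, 11.0, 10.0, 12.0, 8.0]  # hypothetical observations
print(normal_credible_interval(data, sigma=2.0, prior_mean=0.0, prior_sd=1e6))  # vague prior
print(normal_credible_interval(data, sigma=2.0, prior_mean=0.0, prior_sd=0.1))  # tight prior
```

With a very vague prior (large `prior_sd`), the interval numerically matches the frequentist z-interval even though its interpretation differs; a tight prior pulls it strongly toward `prior_mean`.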

Key differences between frequentist and Bayesian statistics:
  • Parameter of interest (e.g., a mean): fixed (the Platonic archetype exists) vs. random (i.e., subject to the whims of the gods of stochasticity)
  • Prior beliefs: implicit and sometimes hard to discern vs. explicit and plainly stated.
  • Statements of probability: Pr(data|null) vs. Pr(h|data). That is, frequentist P-values are the probability of observing your data (or more extreme data) given that the null hypothesis is true, whereas Bayesian posterior probability distributions describe the probability of your scientific hypothesis (not the null), given the data you actually observed.
  • "P-values": routinely reported (and usually misinterpreted) vs. not typically presented. Fisher used the P-value as a "weight of evidence," whereas Neyman used it as a basis for making decisions (yes/no), not necessarily for judging truth (true/false); I do not understand how or why they can do all that. In the Bayesian context, it is not always clear what such a beast would be, because the underlying interpretations differ.
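
Bayes' theorem is what connects the two kinds of probability statement in the third bullet. Below it is written in the same Pr(·|·) notation, with h a hypothesis and h′ ranging over all hypotheses under consideration. Getting from Pr(data|h) to Pr(h|data) requires the prior Pr(h), which is precisely the ingredient a P-value leaves out. (Strictly, a P-value is a tail probability, Pr(data as or more extreme | null), rather than Pr(data|null) itself.)

```latex
\Pr(h \mid \text{data})
  = \frac{\Pr(\text{data} \mid h)\,\Pr(h)}{\Pr(\text{data})},
\qquad
\Pr(\text{data}) = \sum_{h'} \Pr(\text{data} \mid h')\,\Pr(h')
```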

My problem is that I do not have an intuition about how these things differ, or under what conditions they are likely to differ substantially. Therefore, I cannot keep them clearly differentiated in my head; all I can do is repeat them. It would help to explore the pathological cases where frequentist and Bayesian methods give strongly different results.
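
One classic pathological case is Lindley's paradox (also called the Jeffreys-Lindley paradox): hold the P-value fixed while the sample size grows, and the Bayesian posterior probability of a point null hypothesis can climb toward 1. The sketch below assumes a normal mean with known sigma = 1, a point null mu = 0 with prior probability 0.5, and, under the alternative, a N(0, tau²) prior on mu with tau = 1. These choices and the function name are illustrative assumptions, and different priors give different numbers. The data are summarized by a z-score of 2, i.e., a two-sided P-value of about 0.0455 at every sample size.

```python
import math
from statistics import NormalDist

def posterior_prob_null(z, n, sigma=1.0, tau=1.0, prior_null=0.5):
    """Posterior Pr(mu = 0 | data) for a point null versus mu ~ N(0, tau^2),
    given a sample mean of n draws from N(mu, sigma^2) with z-score z."""
    se = sigma / math.sqrt(n)
    xbar = z * se
    like_null = NormalDist(0.0, se).pdf(xbar)
    # Under the alternative, xbar is marginally N(0, tau^2 + se^2).
    like_alt = NormalDist(0.0, math.sqrt(tau ** 2 + se ** 2)).pdf(xbar)
    num = like_null * prior_null
    return num / (num + like_alt * (1 - prior_null))

p_value = 2 * (1 - NormalDist().cdf(2.0))  # identical for every n
for n in (10, 1_000, 1_000_000):
    print(n, round(p_value, 4), round(posterior_prob_null(2.0, n), 3))
```

In this setup the posterior probability of the null is below 0.5 for n = 10 but above 0.95 for n = 1,000,000, even though a frequentist would report the same P-value in every case.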