### Posterior distribution



In Bayesian statistics, the **posterior probability** of a random event or an uncertain proposition is the conditional probability that is assigned after the relevant evidence is taken into account. Similarly, the **posterior probability distribution** is the distribution of an unknown quantity, treated as a random variable, conditional on the evidence obtained from an experiment or survey.

## Definition

The posterior probability is the probability of the parameters $\theta$ given the evidence $X$: $p(\theta \mid X)$.

It contrasts with the likelihood function, which is the probability of the evidence given the parameters: $p(X \mid \theta)$.

The two are related as follows:

Given a prior belief that the parameters follow the probability distribution $p(\theta)$, and observations $X$ with likelihood $p(X \mid \theta)$, the posterior probability is defined as

- $p(\theta \mid X) = \frac{p(\theta)\, p(X \mid \theta)}{p(X)}.$ ^{[1]}

The posterior probability can be written in the memorable form

- $\text{Posterior probability} \propto \text{Prior probability} \times \text{Likelihood}$.
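
The proportionality holds because the denominator $p(X)$ does not depend on $\theta$; it only rescales the product so that the posterior sums or integrates to one. As a supplementary note, for a continuous parameter the denominator can be obtained by integrating the numerator over all parameter values (a sum replaces the integral in the discrete case):

- $p(X) = \int p(\theta)\, p(X \mid \theta)\, d\theta.$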

## Example

Suppose there is a mixed school having 60% boys and 40% girls as students. The girls wear trousers or skirts in equal numbers; the boys all wear trousers. An observer sees a (random) student from a distance; all the observer can see is that this student is wearing trousers. What is the probability this student is a girl? The correct answer can be computed using Bayes' theorem.

The event *G* is that the student observed is a girl, and the event *T* is that the student observed is wearing trousers. To compute P(*G*|*T*), we first need to know:

- P(*G*), or the probability that the student is a girl regardless of any other information. Since the observer sees a random student, meaning that all students have the same probability of being observed, and the percentage of girls among the students is 40%, this probability equals 0.4.
- P(*B*), or the probability that the student is not a girl (i.e., a boy) regardless of any other information (*B* is the complementary event to *G*). This is 60%, or 0.6.
- P(*T*|*G*), or the probability of the student wearing trousers given that the student is a girl. As they are as likely to wear skirts as trousers, this is 0.5.
- P(*T*|*B*), or the probability of the student wearing trousers given that the student is a boy. This is given as 1.
- P(*T*), or the probability of a (randomly selected) student wearing trousers regardless of any other information. Since P(*T*) = P(*T*|*G*)P(*G*) + P(*T*|*B*)P(*B*) (via the law of total probability), this is 0.5 × 0.4 + 1 × 0.6 = 0.8.

Given all this information, the probability of the observer having spotted a girl given that the observed student is wearing trousers can be computed by substituting these values in the formula:

- $P(G \mid T) = \frac{P(T \mid G)\, P(G)}{P(T)} = \frac{0.5 \times 0.4}{0.8} = 0.25.$
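
The same calculation can be reproduced in a few lines of Python. This is a minimal sketch; the dictionary names (`prior`, `likelihood_trousers`, `posterior`) are illustrative and not part of the original example.

```python
# Prior probabilities of each hypothesis, from the school's composition.
prior = {"girl": 0.4, "boy": 0.6}

# Likelihood of observing trousers under each hypothesis.
likelihood_trousers = {"girl": 0.5, "boy": 1.0}

# Law of total probability: P(T) = sum over hypotheses h of P(T|h) * P(h).
evidence = sum(likelihood_trousers[h] * prior[h] for h in prior)

# Bayes' theorem applied to every hypothesis at once.
posterior = {h: likelihood_trousers[h] * prior[h] / evidence for h in prior}

print(posterior)
```

Computing the posterior for both hypotheses at once makes it easy to check that the results sum to one: the probability that the student is a boy, given trousers, is the complement of the probability that the student is a girl.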

## Calculation

The posterior probability distribution of one random variable given the value of another can be calculated with Bayes' theorem by multiplying the prior probability distribution by the likelihood function, and then dividing by the normalizing constant. The formula

- $f_{X \mid Y=y}(x) = \frac{f_X(x)\, L_{X \mid Y=y}(x)}{\int_{-\infty}^{\infty} f_X(x)\, L_{X \mid Y=y}(x)\, dx}$

gives the posterior probability density function for a random variable *X* given the data *Y* = *y*, where

- $f_X(x)$ is the prior density of *X*,

- $L_{X \mid Y=y}(x) = f_{Y \mid X=x}(y)$ is the likelihood function as a function of *x*,

- $\int_{-\infty}^{\infty} f_X(x)\, L_{X \mid Y=y}(x)\, dx$ is the normalizing constant, and

- $f_{X \mid Y=y}(x)$ is the posterior density of *X* given the data *Y* = *y*.
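
When the normalizing integral has no convenient closed form, it can be approximated numerically. The sketch below evaluates prior times likelihood on a grid and divides by a midpoint-rule estimate of the integral. The function name `grid_posterior`, the coin-flip data (7 heads in 10 flips), and the uniform prior are all assumptions chosen for illustration; the integral is restricted to [0, 1] because the assumed prior is zero elsewhere.

```python
def grid_posterior(prior, likelihood, lo, hi, n=1000):
    """Approximate a posterior density on [lo, hi] using a midpoint-rule grid."""
    dx = (hi - lo) / n
    xs = [lo + (i + 0.5) * dx for i in range(n)]
    # Numerator of Bayes' theorem: prior density times likelihood.
    unnorm = [prior(x) * likelihood(x) for x in xs]
    # Normalizing constant: numerical integral of the numerator.
    z = sum(unnorm) * dx
    return xs, [u / z for u in unnorm], dx


# Hypothetical data: 7 heads in 10 flips; x is the unknown heads probability.
heads, tails = 7, 3
xs, post, dx = grid_posterior(
    prior=lambda x: 1.0,  # uniform prior density on [0, 1]
    # The binomial coefficient is omitted: it is constant in x and cancels.
    likelihood=lambda x: x**heads * (1 - x) ** tails,
    lo=0.0,
    hi=1.0,
)

total = sum(post) * dx  # integral of the posterior density
post_mean = sum(x * p for x, p in zip(xs, post)) * dx
print(total, post_mean)
```

With these assumptions the exact posterior is a Beta(8, 4) density with mean 2/3, so the grid estimate of the mean can be checked against a known value.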

## Classification

In classification, posterior probabilities reflect the uncertainty of assigning an observation to a particular class (see also class membership probabilities). While statistical classification methods by definition generate posterior probabilities, machine learning methods usually supply membership values that do not carry any probabilistic confidence. It is desirable to transform or rescale membership values into class membership probabilities, since probabilities are comparable and easier to use in post-processing.
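
As an illustration of such a rescaling, the sketch below applies the softmax function to a vector of raw membership values. Softmax is one possible choice, assumed here for illustration only; the text does not prescribe a particular transformation, and the scores are hypothetical.

```python
import math


def softmax(scores):
    """Map real-valued scores to positive values that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical membership values for three classes.
scores = [2.0, 1.0, -1.0]
probs = softmax(scores)
print(probs)
```

The output preserves the ranking of the classes and sums to one, but it is not guaranteed to be well calibrated; calibration methods usually fit the transformation to held-out data.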

## See also

- Prediction interval
- Bernstein–von Mises theorem
- Monty Hall problem
- Three Prisoners problem
- Bertrand's box paradox