
Conditional probability

Function

context $f : X \times Y \to \mathbb{R}$ … measurable in $X$
definition $p_X : X \times Y \to \mathbb{R}$
definition $p_X := \mathcal{N}_X\, f(x,y)$
This is really just the conditional probability when it comes from a joint “probability kernel”, i.e. a function $f(x,y)$.
The operator $\mathcal{N}_X$ maps $f$ to a function that is normalized w.r.t. $X$.
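
As a minimal numerical sketch (my own illustration, not part of the note; it assumes finite sets $X$ and $Y$, so that the normalization is a plain sum over a table), the operator $\mathcal{N}_X$ could look like this:

```python
import numpy as np

def normalize_over_x(f):
    """N_X: divide a joint table f[x, y] by its sum over x, for each fixed y.

    f is a 2D array indexed as f[x, y]; each column (fixed y) of the result
    sums to 1, i.e. it is a conditional probability table p(x | y).
    """
    return f / f.sum(axis=0, keepdims=True)

# toy joint kernel on a 3-element X and a 2-element Y (any nonnegative table works)
f = np.array([[1.0, 2.0],
              [3.0, 1.0],
              [2.0, 2.0]])

p_x_given_y = normalize_over_x(f)
print(p_x_given_y.sum(axis=0))   # -> [1. 1.], normalized w.r.t. X for each y
```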

Discussion

Notation

$p_X(x,y) = \mathcal{N}_X f(x,y)$ is usually written $p(x\,|\,y)$.

Moreover, $\mathcal{N}_Y f(x,y)$ is usually written $p(y\,|\,x)$.

By definition, those functions are normalized in the variable on the left of the bar $|$. Note that in the second case ($p(y\,|\,x)$) the positions of the arguments are switched, but of course, as arguments of the expression, $x\in X$ and $y\in Y$ is still intended.

Examples

The normalization is often given by an integral $\int \cdot\;\mathrm{d}x$.

Note

With

$$p_X(x,y) = \frac{f(x,y)}{\int f(x,y)\,\mathrm{d}x}$$

we have that

$$\int p_X(x,y_0)\,\mathrm{d}x = 1$$

for all values $y_0\in Y$.
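
A hedged numerical check of this normalization property (my own sketch; the kernel $f(x,y)=e^{-(x-y)^2/2}$, the grid, and the Riemann-sum approximation of the integral are arbitrary choices):

```python
import numpy as np

x = np.linspace(-8.0, 8.0, 801)              # grid for X (the variable being normalized)
y0_values = np.array([-1.0, 0.0, 0.5, 2.0])  # a few fixed values y_0 in Y
dx = x[1] - x[0]

X, Y = np.meshgrid(x, y0_values, indexing="ij")
f = np.exp(-0.5 * (X - Y) ** 2)              # an arbitrary positive joint kernel f(x, y)

# p_X(x, y) = f(x, y) / ∫ f(x, y) dx, with the integral approximated by a Riemann sum
p_X = f / (f.sum(axis=0, keepdims=True) * dx)

# ∫ p_X(x, y_0) dx should come out as 1 for every y_0
print(p_X.sum(axis=0) * dx)                  # -> [1. 1. 1. 1.]
```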

Theorem

Consider a pair of functions $S_L, S_R : (A\to\mathbb{R}) \to (A\to\mathbb{R})$, then

$$\frac{f}{S_L f} = \frac{f}{S_R f}\cdot\frac{S_R f}{S_L f} = \frac{f}{S_R f}\cdot\frac{\tfrac{1}{S_L S_R f}\,S_R f}{\tfrac{1}{S_L S_R f}\,S_L f}$$

If the pair of functions commute, we can write

$$\frac{f}{S_L f} = \frac{f}{S_R f}\cdot\frac{\tfrac{S_R f}{S_L(S_R f)}}{\tfrac{S_L f}{S_R(S_L f)}}$$
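
A small numerical sanity check of this identity (my own sketch, not from the note; here $S_L$ and $S_R$ are taken to be summation over the first and second index of a finite table, which do commute):

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.random((4, 3)) + 0.1        # a positive kernel f[x, y] on a finite X x Y

def S_L(g):                         # sum over the left component x
    return g.sum(axis=0, keepdims=True)

def S_R(g):                         # sum over the right component y
    return g.sum(axis=1, keepdims=True)

lhs = f / S_L(f)
rhs = (f / S_R(f)) * (S_R(f) / S_L(S_R(f))) / (S_L(f) / S_R(S_L(f)))

print(np.allclose(lhs, rhs))        # -> True
```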

Bayes' rule: For $S_L$ and $S_R$ the sum/integral over the left and right component, respectively, we have

$$S_L\frac{f}{S_L f} = \frac{S_L f}{S_L f} = 1$$

and the above relation reads

$$\mathcal{N}_X f(x,y) = \frac{\mathcal{N}_Y f(x,y)\;p(x)}{p(y)} \qquad\text{with}\quad p(x)=\frac{\int f(x,y)\,\mathrm{d}y}{\iint f(x,y)\,\mathrm{d}x\,\mathrm{d}y},\quad p(y)=\int \mathcal{N}_Y f(x,y)\,p(x)\,\mathrm{d}x.$$

This is Bayes' rule for a continuous case, assuming a joint probability kernel $f$. Here the numerator is the conditional probability for $y$ (also called the likelihood) times the “prior” probability for $x$.
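
For concreteness, a hedged numerical rendering of this continuous Bayes' rule (my own sketch; the Gaussian kernel, the grid, and the Riemann-sum integrals are arbitrary choices):

```python
import numpy as np

x = np.linspace(-6, 6, 301)                 # left component (e.g. a parameter)
y = np.linspace(-6, 6, 301)                 # right component (e.g. an observation)
dx, dy = x[1] - x[0], y[1] - y[0]
X, Y = np.meshgrid(x, y, indexing="ij")

f = np.exp(-0.5 * X**2) * np.exp(-0.5 * (Y - X)**2)   # some joint kernel f(x, y)

N_X_f = f / (f.sum(axis=0, keepdims=True) * dx)       # p(x | y)
N_Y_f = f / (f.sum(axis=1, keepdims=True) * dy)       # p(y | x), the likelihood

total = f.sum() * dx * dy
p_x = f.sum(axis=1) * dy / total                      # prior marginal p(x)
p_y = (N_Y_f * p_x[:, None]).sum(axis=0) * dx         # evidence p(y) = ∫ p(y|x) p(x) dx

bayes = N_Y_f * p_x[:, None] / p_y[None, :]           # likelihood × prior / evidence
print(np.allclose(N_X_f, bayes))                      # -> True
```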

Using the relation even if you have to make up the functions yourself: Bayes' theorem is derived in this way, but it is then stated more generally, without reference to the origin of the functions involved. Indeed, in statistical practice, providing the functions in the numerator amounts to specifying how to improve your best guess about which parameters of some mathematical model of interest should be chosen. As the left hand side must be a probability in $x$, the denominator on the right hand side is just the normalization.

Having chosen some function to be the conditional probability $p_{\text{model}}(y\,|\,x)$ (as well as a prior, which is less relevant), and given a sequence of inputs $y_i$ (observations), the Bayes algorithm moves from one conditional probability in $x$ (the normalized function $p(x\,|\,\{y_1,\dots,y_n\})$ on the left of the theorem) to the next ($\{y_1,\dots,y_n,y_{n+1}\}$) by successive multiplication with $p_{\text{model}}(y_{n+1}\,|\,x)$ and re-normalization (so as to actually get a probability in $x$ out of it). This is indeed just the repeated application of Bayes' theorem.
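
A minimal sketch of that updating loop (my own illustration, not from the note; the model $p_{\text{model}}(y\,|\,x)$ is assumed to be a Gaussian with unknown mean $x$, and the prior is taken uniform on a grid):

```python
import numpy as np

x = np.linspace(-5, 5, 1001)            # grid of candidate parameters x
dx = x[1] - x[0]

def p_model(y, x):
    """Assumed likelihood p_model(y | x): a Gaussian observation with mean x."""
    return np.exp(-0.5 * (y - x) ** 2) / np.sqrt(2 * np.pi)

posterior = np.full_like(x, 1.0 / (x[-1] - x[0]))   # uniform prior p(x)

observations = [0.3, -0.1, 0.7, 0.2]                # the inputs y_1, ..., y_n
for y_obs in observations:
    posterior = posterior * p_model(y_obs, x)       # multiply by the likelihood
    posterior = posterior / (posterior.sum() * dx)  # re-normalize to a probability in x

print(x[np.argmax(posterior)])  # posterior mode, close to the sample mean 0.275
```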

Philosophically speaking, this optimal updating of belief (when assuming a rigid model $p_{\text{model}}(y\,|\,x)$) is a major reason for the mathematical operation of multiplication as such. PS: Bayesian inference has the Cox axioms as a justification for being a relevant logic of belief.

Reference

Wikipedia: Conditional probability


Context

Probability space

Drastic measures, Bayes algorithm