Probability Theory

Conditional Probability

Discrete Form

Suppose a discrete random variable X is conditioned on an event A with P(A) > 0. Then we may define the conditional PMF of X as follows:

p_{X | A} (x) = P( X = x | A)=\frac{P(\{X = x\} \cap A)}{P(A)}.

Clearly, p_{X | A} \ge 0. Moreover,

\displaystyle\sum_{i = 1}^{N} p_{X | A}(x_i) = \displaystyle\sum_{i = 1}^{N} \frac{P(\{X = x_i\} \cap A)}{P(A)}.

Since the events \{X = x_i\} (i = 1, 2, 3, \cdots, N) form a partition of \Omega, we have

\displaystyle\sum_{i = 1}^N P(\{X = x_i\} \cap A) = P(A).

Thus,

\displaystyle\sum_{i = 1}^{N} p_{X | A}(x_i) = 1,

and p_{X|A}(x) is a legitimate PMF.
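As a quick numerical check of this definition, here is a minimal Python sketch. The fair six-sided die and the event A = \{X is even\} are hypothetical examples, not from the text:

```python
from fractions import Fraction

# PMF of a hypothetical fair six-sided die.
p_X = {x: Fraction(1, 6) for x in range(1, 7)}

# Condition on the event A = {X is even}.
A = {2, 4, 6}
P_A = sum(p_X[x] for x in A)

# p_{X|A}(x) = P({X = x} ∩ A) / P(A); zero for x outside A.
p_X_given_A = {x: (p_X[x] / P_A if x in A else Fraction(0)) for x in p_X}

print(p_X_given_A[2])             # 1/3
print(sum(p_X_given_A.values()))  # 1 -- a legitimate PMF
```

Exact rational arithmetic via `fractions.Fraction` avoids any floating-point doubt about the sum being exactly 1.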

PMF conditioned on another discrete r.v.

Now we consider the special case A = \{Y = y\}, here Y is another discrete random variable. We get

p_{X|Y}(x|y) = \frac{P(X = x, Y = y)}{P(Y = y)} = \frac{p_{X, Y}(x, y)}{p_Y(y)}.

It is easy to get

p_{X, Y}(x, y) = p_Y(y)p_{X|Y}(x|y) = p_X(x)p_{Y|X}(y|x).

When X and Y are independent, we have P(X = x, Y = y) = P(X = x) P(Y = y), and vice versa. Thus we conclude that X and Y are independent if and only if p_{X, Y}(x, y) = p_X(x) p_Y(y).
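The multiplication rule and the independence criterion above can be verified numerically. A minimal sketch, assuming a hypothetical joint PMF for two independent fair coin flips (the names `p_XY`, `marginal_X`, etc. are made up for illustration):

```python
from fractions import Fraction

# Hypothetical joint PMF of two independent fair coins X, Y ∈ {0, 1}.
p_XY = {(x, y): Fraction(1, 4) for x in (0, 1) for y in (0, 1)}

def marginal_X(x):
    return sum(p for (a, _), p in p_XY.items() if a == x)

def marginal_Y(y):
    return sum(p for (_, b), p in p_XY.items() if b == y)

# p_{X|Y}(x|y) = p_{X,Y}(x, y) / p_Y(y)
def p_X_given_Y(x, y):
    return p_XY[(x, y)] / marginal_Y(y)

# Multiplication rule: p_{X,Y}(x, y) = p_Y(y) p_{X|Y}(x|y).
assert all(p_XY[(x, y)] == marginal_Y(y) * p_X_given_Y(x, y)
           for (x, y) in p_XY)

# Independence ⇔ the joint PMF factorizes into the marginals.
assert all(p_XY[(x, y)] == marginal_X(x) * marginal_Y(y)
           for (x, y) in p_XY)
```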

Now let us examine p_{X|A}(x) using the total probability theorem. Let B_i (i = 1, 2, \cdots, N) be a partition of A with P(A \cap B_i) > 0; then,

A = \displaystyle\bigcup_{i = 1}^{N} (A \cap B_i);

intersecting both sides with \{X = x\} and using the distributive law, we get

\{X = x\} \cap A = \displaystyle\bigcup_{i = 1}^{N} \left(\{X = x\} \cap (A \cap B_i)\right).

Finally, we get

\begin{split} p_{X|A}(x) &= \frac{P(\{X = x\} \cap A)}{P(A)} \\ &= \frac{\displaystyle\sum_{i = 1}^{N} P(\{X = x\} \cap (A \cap B_i))} {P(A)} \\ &= \frac{\displaystyle\sum_{i = 1}^N P(\{X = x\} | (A \cap B_i)) P(A \cap B_i)}{P(A)} \\ &= \displaystyle\sum_{i = 1}^N P(B_i|A) P(\{X = x\}|A \cap B_i) \\ &= \displaystyle\sum_{i = 1}^N P(B_i|A) p_{X|(A \cap B_i)}(x). \end{split}

As a special case, let A = \Omega, in which case the B_i form a partition of \Omega; then

p_X(x) = \displaystyle\sum_{i = 1}^N P(B_i)p_{X|B_i}(x).
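This special case is easy to check in code. A sketch, using a hypothetical two-event partition \{B_1, B_2\} with made-up probabilities and conditional PMFs:

```python
from fractions import Fraction

# Hypothetical setup: B_1, B_2 partition Ω with P(B_1) = 1/3, P(B_2) = 2/3,
# and X | B_i has a known conditional PMF on {0, 1}.
P_B = {1: Fraction(1, 3), 2: Fraction(2, 3)}
p_X_given_B = {
    1: {0: Fraction(1, 2), 1: Fraction(1, 2)},
    2: {0: Fraction(1, 4), 1: Fraction(3, 4)},
}

# Total probability: p_X(x) = Σ_i P(B_i) p_{X|B_i}(x).
def p_X(x):
    return sum(P_B[i] * p_X_given_B[i][x] for i in P_B)

print(p_X(0), p_X(1))  # 1/3 2/3 -- sums to 1, as a PMF must
```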

Continuous Form

Suppose a continuous random variable X is conditioned on an event A with P(A) > 0. Then the probability that X \in B is

P(\{X \in B\} | A) = \frac{P(\{X \in B\} \cap A)}{P(A)}.

We define the conditional PDF of X, denoted f_{X|A}(x), so that

P(\{X \in B\} | A) = \int_B f_{X|A}(x)\,\mathrm{d}x.

So, \forall B \subseteq \mathbb{R}, the following equation holds:

\int_B f_{X|A}(x)\,\mathrm{d}x = \frac{P(\{X \in B\} \cap A)}{P(A)}.

Let B = (x, x + \delta) with \delta > 0; then

\lim_{\delta \rightarrow 0^+} \int_x^{x + \delta}f_{X|A}(x)\,\mathrm{d}x = \lim_{\delta \rightarrow 0^+} f_{X|A}(x) \delta = \lim_{\delta \rightarrow 0^+} P(\{x \le X \le x + \delta\} | A);

Then we get

\begin{split} f_{X|A}(x) &= \lim_{\delta \rightarrow 0^+} \frac{P(\{x \le X \le x + \delta\} | A)}{\delta}\\ &= \lim_{\delta \rightarrow 0^+} \frac{P(\{x \le X \le x + \delta\} \cap A)}{\delta P(A)}.\\ \end{split}

Since P(A) \gt 0, \delta \gt 0 and {P(\{x \le X \le x + \delta\} \cap A)} is a valid probability, we know that f_{X|A}(x) \ge 0.

Let B = (-\infty, +\infty); we see that

\int_{-\infty}^{+\infty} f_{X|A}(x) \,\mathrm{d}x = \frac{P(\{X \in (-\infty, +\infty)\} \cap A)}{P(A)} = \frac{P(\Omega \cap A)}{P(A)} = 1,

and thus f_{X|A}(x) is legitimate.
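As a numeric sanity check, here is a sketch assuming a hypothetical X \sim \mathrm{Exp}(1) conditioned on the event A = \{X > 1\} (so P(A) = e^{-1}); a Riemann sum confirms that f_{X|A} integrates to approximately 1:

```python
import math

# Hypothetical example: X ~ Exp(1), conditioned on A = {X > 1}.
f = lambda x: math.exp(-x)   # PDF of X (for x > 0)
P_A = math.exp(-1.0)         # P(X > 1) = e^{-1}

def f_X_given_A(x):
    # f_{X|A}(x) = f(x) / P(A) on A, and 0 off A.
    return f(x) / P_A if x > 1.0 else 0.0

# Left Riemann sum over (1, 30); the tail beyond 30 is negligible.
dx = 1e-4
steps = 290_000
total = sum(f_X_given_A(1.0 + k * dx) * dx for k in range(steps))
print(round(total, 3))  # ≈ 1.0
```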

PDF conditioned on another continuous r.v.

Consider the case A = \{Y = y\}, where Y is another continuous random variable. Unlike the corresponding PMF case, the tricky thing here is that P(Y = y) = 0 for continuous Y. So instead of conditioning directly on \{Y = y\}, we work with P(y \le Y \le y + \epsilon) (\epsilon > 0).

\begin{split} &\;P(\{x \le X \le x + \delta\}|\{y \le Y \le y + \epsilon\})\\ &= \frac{P(\{x \le X \le x + \delta\} \cap \{y \le Y \le y + \epsilon\})}{P(y\le Y \le y + \epsilon)}.\\ \end{split}

Now, we let \epsilon \rightarrow 0^+:

\begin{split} P(\{x \le X \le x + \delta\} | \{Y = y\}) &= \lim_{\epsilon \rightarrow 0^+} \frac{\left(\int_x^{x + \delta} f_{X, Y}(x, y)\,\mathrm{d}x\right) \epsilon}{f_{Y}(y)\, \epsilon}\\ &= \frac{\int_x^{x + \delta} f_{X, Y}(x, y)\,\mathrm{d}x}{f_Y(y)}. \end{split}

Note that we already know X’s PDF conditioned on A:

f_{X|A}(x) = \lim_{\delta \rightarrow 0^+} \frac{P(\{x \le X \le x + \delta\} | A)}{\delta}.

We write X’s PDF conditioned on Y as

f_{X|Y}(x|y) = \lim_{\delta \rightarrow 0^+} \frac{\int_x^{x + \delta} f_{X, Y}(x, y)\,\mathrm{d}x}{\delta f_Y(y)} = \frac{f_{X, Y}(x, y)}{f_Y(y)}.
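The formula f_{X|Y}(x|y) = f_{X, Y}(x, y) / f_Y(y) can be checked numerically. A sketch, assuming the hypothetical joint PDF f_{X, Y}(x, y) = x + y on the unit square (which integrates to 1 there):

```python
# Hypothetical joint PDF f_{X,Y}(x, y) = x + y on [0, 1]^2.
f_XY = lambda x, y: x + y

def f_Y(y):
    # f_Y(y) = ∫_0^1 (x + y) dx = 1/2 + y
    return 0.5 + y

def f_X_given_Y(x, y):
    # f_{X|Y}(x|y) = f_{X,Y}(x, y) / f_Y(y)
    return f_XY(x, y) / f_Y(y)

# For fixed y, f_{X|Y}(·|y) must integrate to 1 over x ∈ [0, 1].
# The midpoint rule is exact here because the integrand is linear in x.
dx = 1e-4
y = 0.3
total = sum(f_X_given_Y((k + 0.5) * dx, y) * dx for k in range(10_000))
print(round(total, 6))  # ≈ 1.0
```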

When X and Y are independent, we get

P(\{x \le X \le x + \delta\} | \{Y = y\}) = P(\{x \le X \le x + \delta\}) = \frac{\int_x^{x + \delta} f_{X, Y}(x, y) \,\mathrm{d}x}{f_Y(y)}.

Then we divide both sides by \delta and let \delta \rightarrow 0^+:

f_X(x) = \lim_{\delta \rightarrow 0^+} \frac{P(x \le X \le x + \delta)}{\delta} = \lim_{\delta \rightarrow 0^+} \frac{f_{X, Y}(x, y) \, \delta}{f_Y(y) \, \delta} = \frac{f_{X, Y}(x, y)}{f_{Y}(y)}.

That is, when X and Y are independent,

f_{X, Y}(x, y) = f_X(x) f_Y(y).

Conversely, it is easy to verify that if f_{X, Y}(x, y) = f_X(x) f_Y(y), then X and Y are independent: integrating both sides over the rectangle [x, x + \delta] \times [y, y + \epsilon] gives

\int_{x}^{x + \delta} \mathrm{d}x \int_{y}^{y + \epsilon} f_{X, Y}(x, y) \, \mathrm{d}y = \int_{x}^{x + \delta} f_X(x) \,\mathrm{d}x \int_{y}^{y + \epsilon} f_Y(y) \,\mathrm{d}y,

i.e., for any \delta, \epsilon \ge 0

P(\{X \in [x, x + \delta]\} \cap \{Y \in [y, y + \epsilon]\}) = P(X \in [x, x + \delta]) P(Y \in [y, y + \epsilon]).

Therefore, we can conclude that X and Y are independent if and only if f_{X, Y}(x, y) = f_X(x) f_Y(y).
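A Monte Carlo sketch of this equivalence, assuming hypothetical independent X, Y \sim U(0, 1): the probability of the rectangle [0, 0.5] \times [0, 0.2] should match the product of the marginal probabilities.

```python
import random

# Hypothetical example: X, Y independent Uniform(0, 1).
random.seed(0)
n = 200_000
hits_x = hits_y = hits_both = 0
for _ in range(n):
    x, y = random.random(), random.random()
    in_x, in_y = x <= 0.5, y <= 0.2
    hits_x += in_x
    hits_y += in_y
    hits_both += in_x and in_y

# Joint frequency vs. product of marginal frequencies.
print(hits_both / n)                # ≈ 0.10
print((hits_x / n) * (hits_y / n))  # ≈ 0.10
```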