# Conditional Probability

## Discrete Form

If a discrete random variable $X$ is conditioned on an event $A$ with $P(A) > 0$, we may define the conditional PMF of $X$ as follows:

$$p_{X | A} (x) = P( X = x | A)=\frac{P(\{X = x\} \cap A)}{P(A)}.$$
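For example, let $X$ be the outcome of a fair six-sided die and $A = \{X \text{ is even}\}$, so that $P(A) = 1/2$. Then

$$p_{X|A}(x) = \begin{cases} \dfrac{1/6}{1/2} = \dfrac{1}{3}, & x \in \{2, 4, 6\},\\ 0, & \text{otherwise}. \end{cases}$$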

Clearly, $p_{X | A}(x) \ge 0$. Moreover,

$$\displaystyle\sum_{i = 1}^{N} p_{X | A}(x_i) = \displaystyle\sum_{i = 1}^{N} \frac{P(\{X = x_i\} \cap A)}{P(A)}.$$

Since the events $\{X = x_i\}\ (i = 1, 2, \cdots, N)$ form a partition of $\Omega$, we have

$$\displaystyle\sum_{i = 1}^N P(\{X = x_i\} \cap A) = P(A).$$

Thus,

$$\displaystyle\sum_{i = 1}^{N} p_{X | A}(x_i) = 1,$$

and $p_{X|A}(x)$ is a legitimate PMF.

### PMF conditioned on another discrete *r.v.*

Now we consider the special case $A = \{Y = y\}$, where $Y$ is another discrete random variable. We get

$$p_{X|Y}(x|y) = \frac{P(X = x, Y = y)}{P(Y = y)} = \frac{p_{X, Y}(x, y)}{p_Y(y)}.$$

From this we immediately obtain the multiplication rule

$$p_{X, Y}(x, y) = p_Y(y)p_{X|Y}(x|y) = p_X(x)p_{Y|X}(y|x).$$

When $X$ and $Y$ are independent, we have $P(X = x, Y = y) = P(X = x) P(Y = y),$ and vice versa. Thus we conclude that $X$ and $Y$ are independent if and only if $p_{X, Y}(x, y) = p_X(x) p_Y(y).$
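As a quick concrete check, toss a fair coin twice, and let $Y$ be the number of heads on the first toss and $X$ the total number of heads. Then $p_Y(1) = 1/2$ and $p_{X|Y}(2|1) = 1/2$, so the multiplication rule gives

$$p_{X, Y}(2, 1) = p_Y(1)\, p_{X|Y}(2|1) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4},$$

which is indeed the probability of two heads. Since $p_X(2)\, p_Y(1) = \frac{1}{4} \cdot \frac{1}{2} = \frac{1}{8} \ne p_{X, Y}(2, 1)$, the criterion above confirms that $X$ and $Y$ are not independent.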

Now let us examine $p_{X|A}(x)$ using the total probability theorem. Let $B_1, B_2, \cdots, B_N$ be a partition of $A$; then

$$A = \displaystyle\bigcup_{i = 1}^{N} (A \cap B_i);$$

using the distributive law, we get

$$\{X = x\} \cap A = \displaystyle\bigcup_{i = 1}^{N} \{X = x\} \cap (A \cap B_i).$$


Finally, we get

$$\begin{split} p_{X|A}(x) &= \frac{P(\{X = x\} \cap A)}{P(A)} \\ &= \frac{\displaystyle\sum_{i = 1}^{N} P(\{X = x\} \cap (A \cap B_i))} {P(A)} \\ &= \frac{\displaystyle\sum_{i = 1}^N P(\{X = x\} | (A \cap B_i)) P(A \cap B_i)}{P(A)} \\ &= \displaystyle\sum_{i = 1}^N P(B_i|A) P(\{X = x\}|A \cap B_i) \\ &= \displaystyle\sum_{i = 1}^N P(B_i|A) p_{X|(A \cap B_i)}(x). \end{split}$$

As a special case, let $A = \Omega$, in which case the $B_i$ form a partition of $\Omega$; then

$$p_X(x) = \displaystyle\sum_{i = 1}^N P(B_i)\,p_{X|B_i}(x).$$
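For instance, suppose a part comes from machine $B_1$ with probability $P(B_1) = 0.6$ or from machine $B_2$ with probability $P(B_2) = 0.4$, and let $X$ indicate whether the part is defective, with $p_{X|B_1}(1) = 0.01$ and $p_{X|B_2}(1) = 0.05$. Then

$$p_X(1) = P(B_1)\, p_{X|B_1}(1) + P(B_2)\, p_{X|B_2}(1) = 0.6 \times 0.01 + 0.4 \times 0.05 = 0.026.$$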

## Continuous Form

If a continuous random variable $X$ is conditioned on an event $A$ with $P(A) > 0$, then for any set $B$ the probability that $X \in B$ is

$$P(\{X \in B\} | A) = \frac{P(\{X \in B\} \cap A)}{P(A)}.$$

We define the conditional PDF of $X$ given $A$, written $f_{X|A}(x)$, as the function satisfying

$$P(\{X \in B\} | A) = \int_B f_{X|A}(x)\,\mathrm{d}x.$$

So, for every subset $B$ of the real line, the following equation holds:

$$\int_B f_{X|A}(x)\,\mathrm{d}x = \frac{P(\{X \in B\} \cap A)}{P(A)}.$$
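For example, let $X$ be uniform on $[0, 2]$ and $A = \{X \le 1\}$, so that $P(A) = 1/2$. For any set $B$,

$$\int_B f_{X|A}(x)\,\mathrm{d}x = \frac{P(\{X \in B\} \cap \{X \le 1\})}{1/2} = \int_{B \cap [0, 1]} 1 \,\mathrm{d}x,$$

so $f_{X|A}(x) = 1$ for $x \in [0, 1]$ and $f_{X|A}(x) = 0$ otherwise.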

Returning to the general case, let $B = (x, x + \delta)$ with $\delta > 0$. For small $\delta$,

$$\int_x^{x + \delta}f_{X|A}(t)\,\mathrm{d}t \approx f_{X|A}(x)\,\delta, \qquad \text{and} \qquad \int_x^{x + \delta}f_{X|A}(t)\,\mathrm{d}t = P(\{x \le X \le x + \delta\} | A).$$

Then we get

$$\begin{split} f_{X|A}(x) &= \lim_{\delta \rightarrow 0^+} \frac{P(\{x \le X \le x + \delta\} | A)}{\delta}\\ &= \lim_{\delta \rightarrow 0^+} \frac{P(\{x \le X \le x + \delta\} \cap A)}{\delta P(A)}.\\ \end{split}$$

Since $P(A) \gt 0, \delta \gt 0$ and ${P(\{x \le X \le x + \delta\} \cap A)}$ is a valid probability, we know that $f_{X|A}(x) \ge 0.$

Letting $B = (-\infty, +\infty)$, we see that

$$\int_{-\infty}^{+\infty} f_{X|A}(x) \,\mathrm{d}x = \frac{P(\{X \in (-\infty, +\infty)\} \cap A)}{P(A)} = \frac{P(\Omega \cap A)}{P(A)} = 1,$$

and thus $f_{X|A}(x)$ is a legitimate PDF.

### PDF conditioned on another continuous *r.v.*

Consider the case $A = \{Y = y\}$, where $Y$ is another continuous random variable. Unlike the corresponding PMF case, the difficulty here is that $P(Y = y) = 0$ for continuous $Y$. So instead of conditioning on $\{Y = y\}$ directly, we first condition on $\{y \le Y \le y + \epsilon\}$ with $\epsilon > 0$:

$$P(\{x \le X \le x + \delta\} | \{y \le Y \le y + \epsilon\}) = \frac{P(\{x \le X \le x + \delta\} \cap \{y \le Y \le y + \epsilon\})}{P(\{y \le Y \le y + \epsilon\})}.$$

Now we let $\epsilon \rightarrow 0^+$. For small $\epsilon$, the numerator is approximately $\epsilon \int_x^{x + \delta} f_{X, Y}(x, y) \,\mathrm{d}x$ and the denominator is approximately $f_Y(y)\,\epsilon$, so

$$\begin{split} P(\{x \le X \le x + \delta\} | \{Y = y\}) &= \lim_{\epsilon \rightarrow 0^+} \frac{\epsilon \int_x^{x + \delta} f_{X, Y}(x, y) \,\mathrm{d}x}{f_{Y}(y)\, \epsilon} \\ &= \frac{\int_x^{x + \delta} f_{X, Y}(x, y) \,\mathrm{d}x}{f_Y(y)}. \end{split}$$

Note that we already know $X$'s PDF conditioned on $A$:

$$f_{X|A}(x) = \lim_{\delta \rightarrow 0^+} \frac{P(\{x \le X \le x + \delta\} | A)}{\delta}.$$

We write $X$ ’s PDF conditioned on $Y$ as

$$f_{X|Y}(x|y) = \lim_{\delta \rightarrow 0^+} \frac{\int_x^{x + \delta} f_{X, Y}(x, y) \,\mathrm{d}x}{\delta f_Y(y)} = \frac{f_{X, Y}(x, y)}{f_Y(y)}.$$
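As a concrete example, suppose $(X, Y)$ is uniform over the triangle $0 \le y \le x \le 1$, i.e., $f_{X, Y}(x, y) = 2$ there and $0$ elsewhere. Then $f_Y(y) = \int_y^1 2 \,\mathrm{d}x = 2(1 - y)$ for $0 \le y \le 1$, so

$$f_{X|Y}(x|y) = \frac{f_{X, Y}(x, y)}{f_Y(y)} = \frac{1}{1 - y}, \qquad y \le x \le 1;$$

that is, given $Y = y$, $X$ is uniform on $[y, 1]$.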

When $X$ and $Y$ are independent, we get

$$P(\{x \le X \le x + \delta\} | \{Y = y\}) = P(\{x \le X \le x + \delta\}) = \frac{\int_x^{x + \delta} f_{X, Y}(x, y) \,\mathrm{d}x}{f_Y(y)}.$$

Then we divide both sides by $\delta$ and let $\delta \rightarrow 0^+$:

$$f_X(x) = \lim_{\delta \rightarrow 0^+} \frac{P(x \le X \le x + \delta)}{\delta} = \lim_{\delta \rightarrow 0^+} \frac{f_{X, Y}(x, y) \, \delta}{f_Y(y) \, \delta} = \frac{f_{X, Y}(x, y)}{f_{Y}(y)}.$$

That is, when $X$ and $Y$ are independent,

$$f_{X, Y}(x, y) = f_X(x) f_Y(y).$$
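For example, if $X$ and $Y$ are independent standard normal random variables, then

$$f_{X, Y}(x, y) = f_X(x) f_Y(y) = \frac{1}{2\pi} e^{-(x^2 + y^2)/2}.$$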

Conversely, it is easy to verify that if $f_{X, Y}(x, y) = f_X(x) f_Y(y)$, then $X$ and $Y$ are independent: integrating both sides over the rectangle $[x, x + \delta] \times [y, y + \epsilon]$ gives

$$\int_{x}^{x + \delta} \mathrm{d}x \int_{y}^{y + \epsilon} f_{X, Y}(x, y) \, \mathrm{d}y = \int_{x}^{x + \delta} f_X(x) \,\mathrm{d}x \int_{y}^{y + \epsilon} f_Y(y) \,\mathrm{d}y,$$

i.e., for any $\delta, \epsilon \ge 0$

$$P(\{X \in [x, x + \delta]\} \cap \{Y \in [y, y + \epsilon]\}) = P(X \in [x, x + \delta]) P(Y \in [y, y + \epsilon]).$$

Therefore, we can conclude that $X$ and $Y$ are independent if and only if $f_{X, Y}(x, y) = f_X(x) f_Y(y).$
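As a numerical sanity check of $f_{X|Y}(x|y) = f_{X, Y}(x, y)/f_Y(y)$, here is a minimal Monte Carlo sketch in Python (assuming NumPy is available; the triangle density, slab width, and sample size are illustrative choices, not part of the text above). It conditions on a thin slab $\{y_0 \le Y \le y_0 + \epsilon\}$, mirroring the $\epsilon$-argument used above, and compares a histogram of $X$ within the slab against the closed-form conditional PDF.

```python
import numpy as np

rng = np.random.default_rng(0)

# Joint density assumed for illustration: f_{X,Y}(x, y) = 2 on the triangle
# 0 <= y <= x <= 1, so f_{X|Y}(x | y) = 1 / (1 - y) for y <= x <= 1.
# Sample it by drawing two independent Uniform(0, 1) variables and taking
# X as the larger and Y as the smaller of the two.
u = rng.random((1_000_000, 2))
x, y = u.max(axis=1), u.min(axis=1)

# Condition on a thin slab {y0 <= Y <= y0 + eps}, as in the epsilon argument,
# and estimate the conditional PDF of X by a normalized histogram.
y0, eps = 0.3, 0.01
mask = (y >= y0) & (y <= y0 + eps)
bins = np.linspace(0.0, 1.0, 21)
estimate, _ = np.histogram(x[mask], bins=bins, density=True)

# Closed-form conditional PDF evaluated at the bin centers.
centers = (bins[:-1] + bins[1:]) / 2
theory = np.where(centers >= y0, 1.0 / (1.0 - y0), 0.0)

print(np.round(estimate, 2))  # should be close to the line below
print(np.round(theory, 2))
```

The histogram of the conditioned samples should roughly match $1/(1 - y_0) \approx 1.43$ on $[y_0, 1]$ and $0$ below $y_0$, up to Monte Carlo noise and the finite slab width.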