Conditional Probability
Discrete Form
When a discrete random variable \(X\) is conditioned on an event \(A\) with \(P(A) > 0\), we may define the conditional PMF of \(X\) as follows:
\[p_{X | A} (x) = P( X = x | A)=\frac{P(\{X = x\} \cap A)}{P(A)}.\]
Clearly, \(p_{X | A}(x) \ge 0\). Moreover,
\[\displaystyle\sum_{i = 1}^{N} p_{X | A}(x_i) = \displaystyle\sum_{i = 1}^{N} \frac{P(\{X = x_i\} \cap A)}{P(A)}.\]
Since the events \(\{X = x_i\}\ (i = 1, 2, 3, \cdots, N)\) form a partition of \(\Omega\), we have
\[\displaystyle\sum_{i = 1}^N P(\{X = x_i\} \cap A) = P(A).\]
Thus,
\[\displaystyle\sum_{i = 1}^{N} p_{X | A}(x_i) = 1,\]
and \(p_{X|A}(x)\) is a legitimate PMF.
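As a quick sanity check of this legitimacy argument, here is a small Python sketch (using a hypothetical fair-die example, not one from the text) that builds \(p_{X|A}\) from the definition and verifies it sums to 1:

```python
from fractions import Fraction

# A fair six-sided die: p_X(x) = 1/6 for x in 1..6 (hypothetical example).
p_X = {x: Fraction(1, 6) for x in range(1, 7)}

# Condition on the event A = {X is even}.
A = {2, 4, 6}
P_A = sum(p_X[x] for x in A)  # P(A) = 1/2

# Conditional PMF: p_{X|A}(x) = P({X = x} ∩ A) / P(A).
p_X_given_A = {x: (p_X[x] / P_A if x in A else Fraction(0)) for x in p_X}

assert sum(p_X_given_A.values()) == 1   # a legitimate PMF
assert p_X_given_A[2] == Fraction(1, 3)  # mass spread evenly over A
assert p_X_given_A[3] == 0               # no mass outside A
```

Using exact `Fraction` arithmetic avoids floating-point noise, so the "sums to exactly 1" claim can be asserted literally.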
PMF conditioned on another discrete r.v.
Now we consider the special case \(A = \{Y = y\}\), where \(Y\) is another discrete random variable and \(p_Y(y) > 0\). We get
\[p_{X|Y}(x|y) = \frac{P(X = x, Y = y)}{P(Y = y)} = \frac{p_{X, Y}(x, y)}{p_Y(y)}.\]
It immediately follows that
\[p_{X, Y}(x, y) = p_Y(y)p_{X|Y}(x|y) = p_X(x)p_{Y|X}(y|x).\]
When \(X\) and \(Y\) are independent, we have \(P(X = x, Y = y) = P(X = x) P(Y = y)\), and conversely. Thus we conclude that \(X\) and \(Y\) are independent if and only if \(p_{X, Y}(x, y) = p_X(x) p_Y(y).\)
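The factorization criterion is easy to check mechanically. The sketch below (with a hypothetical pair of biased coins) verifies both the factorization \(p_{X, Y} = p_X p_Y\) and the equivalent statement \(p_{X|Y}(x|y) = p_X(x)\):

```python
from fractions import Fraction
from itertools import product

# Hypothetical marginals of two independent coin-like variables.
p_X = {0: Fraction(1, 4), 1: Fraction(3, 4)}
p_Y = {0: Fraction(1, 2), 1: Fraction(1, 2)}

# Construct the joint PMF as a product (so independence holds by design).
p_XY = {(x, y): p_X[x] * p_Y[y] for x, y in product(p_X, p_Y)}

# Independence holds iff the joint PMF factors at every (x, y).
independent = all(p_XY[x, y] == p_X[x] * p_Y[y] for x, y in p_XY)

# Equivalently, p_{X|Y}(x|y) = p_{X,Y}(x, y) / p_Y(y) = p_X(x) for p_Y(y) > 0.
cond_matches = all(p_XY[x, y] / p_Y[y] == p_X[x] for x, y in p_XY)

assert independent and cond_matches
```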
Now let us examine \(p_{X|A}(x)\) using the total probability theorem. Given a partition \(B_i\ (i = 1, 2, \cdots, N)\) of \(A\), we have
\[A = \displaystyle\bigcup_{i = 1}^{N} (A \cap B_i);\]
using the distributive law of intersection over union, we get
\[\{X = x\} \cap A = \displaystyle\bigcup_{i = 1}^{N} \{X = x\} \cap (A \cap B_i).\]
Finally, we get
\[ \begin{split} p_{X|A}(x) &= \frac{P(\{X = x\} \cap A)}{P(A)} \\ &= \frac{\displaystyle\sum_{i = 1}^{N} P(\{X = x\} \cap (A \cap B_i))} {P(A)} \\ &= \frac{\displaystyle\sum_{i = 1}^N P(\{X = x\} | (A \cap B_i)) P(A \cap B_i)}{P(A)} \\ &= \displaystyle\sum_{i = 1}^N P(B_i|A) P(\{X = x\}|A \cap B_i) \\ &= \displaystyle\sum_{i = 1}^N P(B_i|A) p_{X|(A \cap B_i)}(x). \end{split} \]
As a special case, let \(A = \Omega\), in which case the \(B_i\) form a partition of \(\Omega\); then
\[p_X(x) = \displaystyle\sum_{i = 1}^N P(B_i)p_{X|B_i}(x).\]
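This total probability formula can be illustrated with a standard two-urn setup (a hypothetical example, not from the text): the urn choice gives the partition \(B_1, B_2\) of \(\Omega\), and the ball colour is \(X\):

```python
from fractions import Fraction

# Partition of Ω: pick urn 1 with probability 1/3, urn 2 with probability 2/3.
P_B = [Fraction(1, 3), Fraction(2, 3)]

# Conditional PMF of X (ball colour: 0 = red, 1 = blue) given each urn.
p_X_given_B = [
    {0: Fraction(3, 4), 1: Fraction(1, 4)},  # urn 1
    {0: Fraction(1, 2), 1: Fraction(1, 2)},  # urn 2
]

# Total probability: p_X(x) = Σ_i P(B_i) p_{X|B_i}(x).
p_X = {x: sum(P_B[i] * p_X_given_B[i][x] for i in range(2)) for x in (0, 1)}

assert p_X[0] == Fraction(7, 12)   # 1/3 · 3/4 + 2/3 · 1/2
assert sum(p_X.values()) == 1      # the mixture is again a legitimate PMF
```

Note that mixing legitimate conditional PMFs with weights \(P(B_i)\) always yields a legitimate PMF, which is exactly what the last assertion checks.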
Continuous Form
When a continuous random variable \(X\) is conditioned on an event \(A\) with \(P(A) > 0\), the probability that \(X \in B\) is
\[P(\{X \in B\} | A) = \frac{P(\{X \in B\} \cap A)}{P(A)}.\]
We define the conditional PDF of \(X\) given \(A\) as the function \(f_{X|A}(x)\) satisfying
\[P(\{X \in B\} | A) = \int_B f_{X|A}(x)\,\mathrm{d}x.\]
So, for every \(B \subseteq \mathbb{R}\), the following equation holds:
\[\int_B f_{X|A}(x)\,\mathrm{d}x = \frac{P(\{X \in B\} \cap A)}{P(A)}.\]
Let \(B = (x, x + \delta)\) with \(\delta > 0\); then
\[\lim_{\delta \rightarrow 0^+} \int_x^{x + \delta}f_{X|A}(x)\,\mathrm{d}x = \lim_{\delta \rightarrow 0^+} f_{X|A}(x) \delta = \lim_{\delta \rightarrow 0^+} P(\{x \le X \le x + \delta\} | A);\]
Then we get
\[ \begin{split} f_{X|A}(x) &= \lim_{\delta \rightarrow 0^+} \frac{P(\{x \le X \le x + \delta\} | A)}{\delta}\\ &= \lim_{\delta \rightarrow 0^+} \frac{P(\{x \le X \le x + \delta\} \cap A)}{\delta P(A)}.\\ \end{split} \]
Since \(P(A) > 0\), \(\delta > 0\), and \(P(\{x \le X \le x + \delta\} \cap A)\) is a valid probability, we know that \(f_{X|A}(x) \ge 0.\)
Let \(B = (-\infty, +\infty)\); we see that
\[\int_{-\infty}^{+\infty} f_{X|A}(x) \,\mathrm{d}x = \frac{P(\{X \in (-\infty, +\infty)\} \cap A)}{P(A)} = \frac{P(\Omega \cap A)}{P(A)} = 1,\] and thus \(f_{X|A}(x)\) is legitimate.
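A concrete numeric sketch of these two properties: take a hypothetical \(X \sim \mathrm{Exponential}(1)\) conditioned on \(A = \{X > a\}\), so that \(f_{X|A}(x) = f_X(x)/P(A)\) on \(A\) and 0 elsewhere, and check that it integrates to 1:

```python
import math

# X ~ Exponential(1), conditioned on A = {X > a}; a = 1 is an arbitrary choice.
a = 1.0
f_X = lambda x: math.exp(-x)
P_A = math.exp(-a)  # P(X > a) = e^{-a} for the unit-rate exponential

# Conditional PDF: f_{X|A}(x) = f_X(x) / P(A) on A, and 0 outside A.
def f_X_given_A(x):
    return f_X(x) / P_A if x > a else 0.0

# Nonnegativity is immediate; check normalization with a midpoint Riemann sum
# over (a, a + 20] (the tail beyond that contributes less than e^-20).
dx = 1e-4
total = sum(f_X_given_A(a + (k + 0.5) * dx) * dx for k in range(200_000))
assert abs(total - 1.0) < 1e-3
```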
PDF conditioned on another continuous r.v.
Consider the case \(A = \{Y = y\}\). Unlike the corresponding PMF case, the tricky point here is that \(P(Y = y) = 0\) for continuous \(Y\). So, instead of conditioning on \(\{Y = y\}\) directly, we turn to \(P(y \le Y \le y + \epsilon)\) with \(\epsilon > 0\).
\[ \begin{split} &\;P(\{x \le X \le x + \delta\}|\{y \le Y \le y + \epsilon\})\\ &= \frac{P(\{x \le X \le x + \delta\} \cap \{y \le Y \le y + \epsilon\})}{P(y\le Y \le y + \epsilon)}.\\ \end{split} \]
Now, we let \(\epsilon \rightarrow 0^+\):
\[ \begin{split} P(\{x \le X \le x + \delta\} | \{Y = y\}) &= \lim_{\epsilon \rightarrow 0^+} \frac{\left(\int_x^{x + \delta} f_{X, Y}(x, y) \,\mathrm{d}x\right) \epsilon}{f_{Y}(y)\, \epsilon}\\ &= \frac{\int_x^{x + \delta} f_{X, Y}(x, y) \,\mathrm{d}x}{f_Y(y)}. \end{split} \]
Note that we already know \(X\)'s PDF conditioned on \(A\):
\[f_{X|A}(x) = \lim_{\delta \rightarrow 0^+} \frac{P(\{x \le X \le x + \delta\} | A)}{\delta}.\]
We write \(X\)’s PDF conditioned on \(Y\) as
\[f_{X|Y}(x|y) = \lim_{\delta \rightarrow 0^+} \frac{\int_x^{x + \delta} f_{X, Y}(x, y) \,\mathrm{d}x}{\delta f_Y(y)} = \frac{f_{X, Y}(x, y)}{f_Y(y)}.\]
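To see this formula in action, consider the standard textbook density \(f_{X, Y}(x, y) = x + y\) on the unit square (a hypothetical example, not from the text). Its marginal is \(f_Y(y) = \int_0^1 (x + y)\,\mathrm{d}x = \tfrac{1}{2} + y\), and the sketch below checks that the resulting \(f_{X|Y}(\cdot|y)\) integrates to 1 for each fixed \(y\):

```python
# Joint PDF f_{X,Y}(x, y) = x + y on [0, 1] × [0, 1].
f_XY = lambda x, y: x + y

# Marginal of Y: f_Y(y) = ∫_0^1 (x + y) dx = 1/2 + y.
f_Y = lambda y: 0.5 + y

# Conditional PDF: f_{X|Y}(x|y) = f_{X,Y}(x, y) / f_Y(y).
f_X_given_Y = lambda x, y: f_XY(x, y) / f_Y(y)

# For any fixed y, the conditional PDF integrates to 1 over x ∈ [0, 1].
# (The midpoint rule is exact here since the integrand is linear in x.)
dx = 1e-4
for y in (0.0, 0.3, 0.9):
    total = sum(f_X_given_Y((k + 0.5) * dx, y) * dx for k in range(10_000))
    assert abs(total - 1.0) < 1e-6
```

Note that \(f_{X|Y}(x|y)\) genuinely depends on \(y\) here, which already hints that \(X\) and \(Y\) are not independent under this joint density.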
When \(X\) and \(Y\) are independent, we get
\[P(\{x \le X \le x + \delta\} | \{Y = y\}) = P(\{x \le X \le x + \delta\}) = \frac{\int_x^{x + \delta} f_{X, Y}(x, y) \,\mathrm{d}x}{f_Y(y)}.\]
Then we divide both sides by \(\delta\) and let \(\delta \rightarrow 0^+\):
\[f_X(x) = \lim_{\delta \rightarrow 0^+} \frac{P(x \le X \le x + \delta)}{\delta} = \lim_{\delta \rightarrow 0^+} \frac{f_{X, Y}(x, y) \, \delta}{f_Y(y) \, \delta} = \frac{f_{X, Y}(x, y)}{f_{Y}(y)}.\]
That is, when \(X\) and \(Y\) are independent,
\[f_{X, Y}(x, y) = f_X(x) f_Y(y).\]
It is easy to verify the converse: when \(f_{X, Y}(x, y) = f_X(x) f_Y(y)\), \(X\) and \(Y\) are independent, since
\[\int_{x}^{x + \delta} \mathrm{d}x \int_{y}^{y + \epsilon} f_{X, Y}(x, y) \, \mathrm{d}y = \int_{x}^{x + \delta} f_X(x) \,\mathrm{d}x \int_{y}^{y + \epsilon} f_Y(y) \,\mathrm{d}y,\]
i.e., for any \(\delta, \epsilon \ge 0\)
\[P(\{X \in [x, x + \delta]\} \cap \{Y \in [y, y + \epsilon]\}) = P(X \in [x, x + \delta]) P(Y \in [y, y + \epsilon]).\]
Therefore, we can conclude that \(X\) and \(Y\) are independent if and only if \(f_{X, Y}(x, y) = f_X(x) f_Y(y).\)
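The iff criterion can be checked pointwise on concrete densities. Below, two hypothetical joint densities on the unit square are tested: \(x + y\) (does not factor, so \(X\) and \(Y\) are dependent) and \(4xy = (2x)(2y)\) (factors, so they are independent):

```python
# f_{X,Y}(x, y) = x + y on [0,1]² does NOT factor: X and Y are dependent.
f_XY = lambda x, y: x + y
f_X = lambda x: x + 0.5   # marginal: ∫_0^1 (x + y) dy = x + 1/2
f_Y = lambda y: y + 0.5   # marginal: ∫_0^1 (x + y) dx = y + 1/2
assert abs(f_XY(0.2, 0.7) - f_X(0.2) * f_Y(0.7)) > 1e-9  # 0.9 ≠ 0.84

# g_{X,Y}(x, y) = 4xy on [0,1]² factors as (2x)(2y): X and Y are independent.
g_XY = lambda x, y: 4 * x * y
g_X = lambda x: 2 * x
g_Y = lambda y: 2 * y
assert all(abs(g_XY(x, y) - g_X(x) * g_Y(y)) < 1e-12
           for x in (0.1, 0.5, 0.9) for y in (0.2, 0.6))
```

A single point where the joint density differs from the product of the marginals is enough to rule out independence, while establishing independence requires the factorization to hold (almost) everywhere, as the criterion states.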