Probability is the most basic subject among all the statistics courses. Since I have decided to review what I have learned over these years, it is natural to start with probability.
In this article I will only include the things I think are important.
1. Counting Methods
1.1 Some definitions and rules
Multiplication Rule
An experiment consists of $k$ stages such that stage $k$ has $n_k$ possible outcomes for each outcome of the first $(k-1)$ stages. Then there are $n_1 \times n_2 \times \dots \times n_k$ possible outcomes of this $k$-stage experiment. For example, rolling a die three times has $6 \times 6 \times 6 = 216$ possible outcomes.
Permutation
The number of ways to arrange $N$ distinct objects in order is $N! = N \times (N-1) \times \dots \times 1$.
Combination
The number of ways to choose $k$ out of $N$ distinct objects, ignoring order, is $C_{N}^{k} = \frac{N!}{k!(N-k)!}$.
1.2 Application summary
Permutation questions are usually not very difficult, so here I only summarise the combination problems. Basically there are four situations, and the two key factors that distinguish them are replacement and order.
Suppose the question is to choose $k$ numbers from $N$ distinct numbers. Here are the numbers of distinct results:
Condition | Order matters | Order does not matter |
---|---|---|
With replacement | $N^k$ | $C_{N+k-1}^{k}$ |
Without replacement | $P_{N}^{k}$ | $C_{N}^{k}$ |
The most confusing entry is the one where we do not care about order and replacement is allowed. Here we can think of the $k$ numbers as balls and add $N-1$ movable sticks: the sticks divide the row into $N$ parts, and the balls in each part count how many times the corresponding number is chosen. Counting the arrangements of $k$ balls and $N-1$ sticks gives $C_{N+k-1}^{k}$, which ignores order and at the same time allows numbers to repeat.
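To make the table concrete, here is a minimal Python sketch (standard library only; the values $N=5$, $k=3$ are just illustrative) that brute-force checks all four formulas:

```python
# Brute-force check of the four counting formulas for small N and k.
from itertools import product, permutations, combinations, combinations_with_replacement
from math import comb, perm

N, k = 5, 3
items = range(N)

# With replacement, order matters: N^k
assert len(list(product(items, repeat=k))) == N**k

# Without replacement, order matters: P(N, k)
assert len(list(permutations(items, k))) == perm(N, k)

# Without replacement, order ignored: C(N, k)
assert len(list(combinations(items, k))) == comb(N, k)

# With replacement, order ignored ("balls and sticks"): C(N+k-1, k)
assert len(list(combinations_with_replacement(items, k))) == comb(N + k - 1, k)

print("All four counting formulas check out for N=5, k=3.")
```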
2. Conditional Probability
2.1 Some definitions and rules
Total Probability
Let $F_1, F_2, \dots, F_n$ be a set of disjoint events whose union is the entire sample space. Then for any event $E$ we have:
$$P(E) = P(E|F_1)P(F_1)+\dots+P(E|F_n)P(F_n) $$
Normally, the event $E$ is the outcome, and each event $F_i$ is a possible reason that leads to that outcome. The intuition is that we sum, over all reasons, the probability that each reason leads to this outcome. (reason leads to outcome)
Bayes Formula
Think of $E$ as the outcome of an experiment and $F_1, F_2, \dots, F_n$ as the hypotheses. We want to update our beliefs given the outcome of the experiment. That is, we want to compute $$P(F_k|E) = \frac{P(F_k)P(E|F_k)}{\sum_{i=1}^{n}P(F_i)P(E|F_i)}$$
If you look at Bayes' formula carefully, you will find that it works in exactly the opposite direction of total probability. Why? Because with Bayes' formula we already know the outcome and want to find the probability that each reason led to this outcome. (outcome points back to reason/hypothesis)
2.2 Application
Example
Here is an example of applying Bayes' formula.
Suppose the probability that a person has a certain illness is 0.03. Because the testing technology is imperfect, the test results may contain mistakes. Here is the test information:
$$P(positive|ill) = 0.99 \qquad P(negative|ill) = 0.01$$
$$P(positive|not\ ill) = 0.05 \qquad P(negative|not\ ill) = 0.95$$
Here I want to calculate the probability that the person is ill (hypothesis) given that the test is positive (outcome): $$P(ill|positive) = \frac{P(ill)P(positive|ill)}{P(ill)P(positive|ill)+P(not\ ill)P(positive|not\ ill)} = \frac{0.03\times0.99}{0.03\times0.99+0.97\times0.05} \approx 0.38$$
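As a quick check, here is a small Python sketch of this calculation; note that the denominator is exactly the total probability of a positive test:

```python
# Bayes' formula for the illness example above.
p_ill = 0.03
p_pos_given_ill = 0.99
p_pos_given_healthy = 0.05

# Total probability: P(positive) = P(positive|ill)P(ill) + P(positive|not ill)P(not ill)
p_pos = p_pos_given_ill * p_ill + p_pos_given_healthy * (1 - p_ill)

# Bayes' formula: P(ill|positive)
p_ill_given_pos = p_pos_given_ill * p_ill / p_pos
print(f"P(ill|positive) = {p_ill_given_pos:.2f}")  # ~0.38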
Further application
Based on this idea, Bayes' formula is widely used in the machine learning field; the most famous application is the naive Bayes classifier.
For example, suppose we want to decide a person's gender based on two factors, $h$ (height) and $w$ (weight). What we need to do is compare these two probabilities: $$P(male|h,w)\qquad P(female|h,w)$$ Based on Bayes' formula, $$P(gender|h,w)=\frac{P(gender)P(h,w|gender)}{P(male)P(h,w|male)+P(female)P(h,w|female)}$$ Since the denominator is $P(h,w)$, which is the same constant no matter which gender we plug into the numerator, we can simply ignore it. Then the problem becomes: $$gender = \arg\max_{gender}P(h,w|gender)P(gender)$$
We further suppose that, given the gender, $h$ and $w$ are independent and each follows a normal distribution (this is the "naive" assumption). Then we have: $$gender = \arg\max_{gender}P(h|gender)P(w|gender)P(gender)$$ Estimating the parameters of these distributions lets us calculate the final result. This is how Bayes' formula is used in the ML field.
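Here is a minimal sketch of this Gaussian naive Bayes idea. The priors, means, and standard deviations are made-up illustrative numbers (not fitted from real data), and the helper names are my own:

```python
# A minimal Gaussian naive Bayes sketch for the gender example.
# All parameters below are hypothetical, purely for illustration.
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution N(mu, sigma^2) at x."""
    return exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

# Hypothetical class-conditional parameters: (mean, std) for height (cm) and weight (kg)
params = {
    "male":   {"prior": 0.5, "h": (175.0, 7.0), "w": (75.0, 10.0)},
    "female": {"prior": 0.5, "h": (162.0, 6.5), "w": (60.0, 9.0)},
}

def classify(h, w):
    """Return argmax over gender of P(h|gender) * P(w|gender) * P(gender)."""
    scores = {
        g: normal_pdf(h, *p["h"]) * normal_pdf(w, *p["w"]) * p["prior"]
        for g, p in params.items()
    }
    return max(scores, key=scores.get)

print(classify(180, 80))  # -> "male" under these made-up parameters
print(classify(160, 55))  # -> "female"
```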
3. Random Variables and Distributions
3.1 Summary Table
In this section I skip the basic definitions of continuous and discrete random variables.
Here is a summary table comparing continuous and discrete random variables.
Quantity | Continuous | Discrete |
---|---|---|
pdf/pmf | $f\ge0$ s.t. $P(X\in A)=\int_A f(x)dx$ | $0\le f\le 1$, $P(X\in A)=\sum_{x\in A}f(x)$ |
cdf | $F(x)=P(X\le x)$ | $F(x)=P(X\le x)$ |
pdf/pmf -> cdf | $F(x)=\int_{-\infty}^{x}f(y)dy$ | $F(x)=\sum_{y\le x}f(y)$ |
cdf -> pdf/pmf | $f(x) = \frac{dF(x)}{dx}$ | $f(x)=F(x)-\lim_{y\to x^{-}}F(y)$ |
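As a small numeric sanity check of these relations, here is a Python sketch using the standard normal as the continuous case and a Binomial(10, 0.3) as the discrete case (both choices are just illustrative):

```python
# Check pdf <-> cdf relations from the table numerically.
from math import comb, erf, exp, pi, sqrt

# Continuous: f(x) = dF(x)/dx, checked with a finite difference
def norm_pdf(x):
    return exp(-x * x / 2) / sqrt(2 * pi)

def norm_cdf(x):
    return 0.5 * (1 + erf(x / sqrt(2)))

x, eps = 0.7, 1e-6
print(norm_pdf(x), (norm_cdf(x + eps) - norm_cdf(x - eps)) / (2 * eps))  # nearly equal

# Discrete: F(x) = sum_{y <= x} f(y), and f(x) is the jump of F at x
n, p = 10, 0.3
def binom_pmf(k):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_cdf(x):
    return sum(binom_pmf(k) for k in range(int(x) + 1))

k = 4
print(binom_pmf(k), binom_cdf(k) - binom_cdf(k - 1))  # the jump of F at k equals f(k)
```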
I will end my probability review here. Let us talk about inference in the next post :D