• 欢迎光临~

Statistic Final Review

Covariance

Definition 4.2. The covariance of (X) and (Y) is

[operatorname{Cov}(X, Y)=mathbb{E}(X Y)-mathbb{E} X cdot mathbb{E} Y ]

If (X) and (Y) are independent then (operatorname{Cov}(X, Y)=0). But the converse is not true: we can have (X, Y) with (operatorname{Cov}(X, Y)=0) and (X) and (Y) not independent.
Covariance can be positive, negative, or (0 .)
Lemma 4.3.

[operatorname{Var}(X+Y)=operatorname{Var}(X)+operatorname{Var}(Y)+2 operatorname{Cov}(X, Y) . ]

The formula generalizes:
Lemma 4.4.

[operatorname{Var}left(X_{1}+cdots+X_{n}right)=sum_{i=1}^{n} operatorname{Var}left(X_{i}right)+2 sum_{i<j} operatorname{Cov}left(X_{i}, X_{j}right) ]

(Remember that in this formula (i<j) so we add (2 operatorname{Cov}left(X_{1}, X_{2}right)), for example, not 1 times or 4 times.)

Properties of Covariance

If (X, Y, W), and (V) are real-valued random variables and (a, b, c, d) are real-valued constants, then the following facts are a consequence of the definition of covariance:

[begin{aligned} operatorname{cov}(X, a) &=0 \ operatorname{cov}(X, X) &=operatorname{var}(X) \ operatorname{cov}(X, Y) &=operatorname{cov}(Y, X) \ operatorname{cov}(a X, b Y) &=a b operatorname{cov}(X, Y) \ operatorname{cov}(X+a, Y+b) &=operatorname{cov}(X, Y) \ operatorname{cov}(a X+b Y, c W+d V) &=a c operatorname{cov}(X, W)+a d operatorname{cov}(X, V)+b c operatorname{cov}(Y, W)+b d operatorname{cov}(Y, V) end{aligned} ]

For a sequence (X_{1}, ldots, X_{n}) of random variables in real-valued, and constants (a_{1}, ldots, a_{n}), we have

[operatorname{var}left(sum_{i=1}^{n} a_{i} X_{i}right)=sum_{i=1}^{n} a_{i}^{2} sigma^{2}left(X_{i}right)+2 sum_{i, j: i<j} a_{i} a_{j} operatorname{cov}left(X_{i}, X_{j}right)=sum_{i, j} a_{i} a_{j} operatorname{cov}left(X_{i}, X_{j}right) ]

Hoeffding's covariance identity

A useful identity to compute the covariance between two random variables (X, Y) is the Hoeffding's covariance identity: [7]

[operatorname{cov}(X, Y)=int_{mathbb{R}} int_{mathbb{R}}left(F_{(X, Y)}(x, y)-F_{X}(x) F_{Y}(y)right) d x d y ]

where (F_{(X, Y)}(x, y)) is the joint cumulative distribution function of the random vector ((X, Y)) and (F_{X}(x), F_{Y}(y)) are the marginals.

Covariance Matrix

Definition 4.5. The covariance matrix (Sigma) of a random vector (vec{X}) is an (m times m) matrix where

[Sigma_{i j}=operatorname{Cov}left(X_{i}, X_{j}right) . ]

This means the diagonal entries (Sigma_{i i}, i=1, ldots, m), are the variances of the components. (Sigma_{i i}=operatorname{Var}left(X_{i}right) .)
We can also write the covariance matrix in terms of vectors:

[Sigma=mathbb{E}left[(vec{X}-vec{mu})(vec{X}-vec{mu})^{T}right] ]

This is the expectation of the outer product of a vector with itself. (Here we assume (vec{X}) and (vec{mu}) are column vectors).

Properties of covariance matrices

Let (mathbb{E} vec{X}=vec{mu}) and (Sigma=mathbb{E}left[(vec{X}-vec{mu})(vec{X}-vec{mu})^{T}right]). Then

1. (Sigma=mathbb{E}left[X X^{T}right]-vec{mu} vec{mu}^{T}).
2. (Sigma) is a symmetric matrix. That is, (Sigma_{i j}=Sigma_{j i}).
3. (Sigma) is a positive semi-definite matrix. That is, for every vector (vec{y} in mathbb{R}^{m}),

[vec{y}^{T} Sigma vec{y} geq 0 ]

Sums of independent random variables

Suppose (X) and (Y) are independent, continuous random variables, and let (Z=X+Y). What is the distribution of (Z) ?
We can calculate

[begin{aligned} F_{Z}(t) &=P(Z leq t) \ &=int_{-infty}^{infty} int_{-infty}^{t-x} f_{X, Y}(x, y) d y d x \ &=int_{-infty}^{infty} int_{-infty}^{t-x} f_{X}(x) f_{Y}(y) d y d x end{aligned} ]

or

[f_{Z}(t)=int_{-infty}^{infty} f_{X}(x) f_{Y}(t-x) d x ]

We can do the same for discrete random variables. Suppose (X) and (Y) are independent, discrete random variables, and let (Z=X+Y). What is the distribution of (Z) ?

[F_{Z}(t)=sum_{x} sum_{y: x+y leq t} f_{X}(x) f_{Y}(y) . ]

or

[f_{Z}(t)=sum_{k} f_{X}(k) f_{Y}(t-k) . ]

The first and second moment methods

Markov's Inequality & first moment method

Theorem (5.1) (Markov's inequality). Suppose (X) is a non-negative random variable. Then

[P[X geq t] leq frac{mathbb{E} X}{t} ]

Say we have a counting (or any non-negative) random variable (X) and its expectation is small. Then Markov's inequality tells us that (X) cannot be too big too often.

The First moment method

Lemma (5.2) (The first moment method for counting random variables). Let (B_{1}(n), B_{2}(n)), (ldots, B_{m}(n)) be a sequence of ((b a d)) events that may depend on some hidden parameter (n). If

[lim _{n rightarrow infty} sum_{i=1}^{m} Pleft(B_{i}(n)right)=0 ]

then (lim _{n rightarrow infty} P[) At least one bad event occurs (]=0).
This is the first moment method.
Example: Let (M_{n}) be an (n times n) matrix in which each entry is 0 or 1 with probability (1 / 2) each, independently of all other entries. Let (B) be the event that there is a row of all 0 's. Show that (P(B) rightarrow 0) as (n rightarrow infty)

Chebyshev's Inequality & second moment method

Theorem (5.3) (Chebyshev's Inequality). Let (X) be any random variable with finite mean and variance. Then

[P[|X-mathbb{E} X| geq t] leq frac{operatorname{Var}(X)}{t^{2}} ]

In statistics we will be most interested in using Chebyshev's Inequality with a sequence of random variables and asking about the behavior in the limit.

The second moment method

Corollary 5.4. Suppose (X) is a non-negative, integer-valued random variable. Then

[P(X geq 1) geq 1-frac{operatorname{Var}(X)}{(mathbb{E} X)^{2}} ]

Or in other words, let (B_{1}(n), B_{2}(n), ldots, B_{m(n)}(n)) be a sequence of (bad) events that may depend on some hidden parameter (n). Let (X=sum_{i=1}^{m(n)} 1_{B_{i}}) be the number of these events that occur. If

[lim _{n rightarrow infty} frac{operatorname{Var}(X)}{(mathbb{E} X)^{2}}=0 ]

then (lim _{n rightarrow infty} P[) At least one bad event occurs (]=1).

Convergence of RV

3 types of convergence:

• Convergence in probability
• Convergence in distribution
• Almost sure convergence (^*) (non-examinbable)

[text{almost surly} Rightarrow text{Converge in probability } Rightarrow text{Converge in distribution} ]

Convergence in Probability

Definition 6.2. Let (X_{1}, X_{2}, ldots) be an infinite sequence of random variables, and let (X) be a single random variable, all defined on the same probability space.
Then we say (X_{n}) converges to (X) in probability if for every (epsilon>0),

[lim _{n rightarrow infty} Pleft[left|X_{n}-Xright|>epsilonright]=0 . ]

Notice that this is a statement about the limit of probabilities.

Convergence in distribution

Example 6.3. Let (X_{1}, ldots) be a sequence of random variables with (X_{n} sim operatorname{Bin}(n, lambda / n)). Then (lim _{n rightarrow infty} Pleft[X_{n}=kright]=P[) Pois ((lambda)=k])

Definition 6.4. We say a sequence of random variables (X_{1}, X_{2}, ldots) converges in distribution to a random variable (X) if

[lim _{n rightarrow infty} F_{X_{n}}(t)=F_{X}(t) ]

for every (t in mathbb{R}) at which (F_{X}(t)) is continuous (e.g.convergence holds for all continuity points of (F_{X}(t)). We write this with a double arrow:)

[X_{n} Rightarrow X ]

Lemma 6.6. Let (X_{1}, X_{2}, ldots) be a sequence of discrete random variables and (X) another discrete random variable, they are all non-negative integer valued. Then,

[X_{n} Rightarrow X ]

if and only if

[lim _{n rightarrow infty} f_{X_{n}}(t)=f_{X}(t) ]

for all (t in mathbb{R}).

The Law of Large Numbers

Weak Law of Large Numbers (WLLN)

Theorem 6.12 (Weak Law of Large Numbers). Let (X_{1}, X_{2}, ldots) be a sequence of (i . i . d). random variables with (mathbb{E} X_{j}=mu) and (operatorname{Var}left(X_{j}right)=sigma^{2}<infty). Let (bar{S}_{n}=frac{X_{1}+cdots+X_{n}}{n}). Then for every (epsilon>0),

[lim _{n rightarrow infty} Pleft[left|bar{S}_{n}-muright|>epsilonright]=0 ]

Strong Law of Large Numbers is non-examinable.

Characteristic functions

Definition 6.14. The characteristic function (phi_{X}(t)) of a random variable (X) is the function

[phi_{X}(t)=mathbb{E}left[e^{i t X}right] ]

(This is related to the Fourier transform of the distribution of (X) ).

Theorem (6.15) (Levy's Continuity Theorem). Let (X_{1}, X_{2}, ldots) be a sequence of random variables and (X) another random variable. Then the following are equivalent:

1. (X_{n} Rightarrow X) (convergence in distribution).
2. (lim _{n rightarrow infty} phi_{X_{n}}(t)=phi_{X}(t)) for all (t).

Limit theorems via characteristic functions

A proof of the WLLN

Here is the Weak Law of Large Numbers, assuming only the existence of a finite mean.
Theorem 6.16. Let (X_{1}, X_{2}, ldots) be i.i.d. random variables with (mathbb{E}_{2} X_{k}=mu .) Let (bar{S}_{n}=frac{X_{1}+cdots X_{n}}{n}). Then (bar{S}_{n} Rightarrow mu)
(Notice the conditions are less restrictive than those when we proved the WLLN using Chebyshev's inequality).

The Central Limit Theorem

Theorem 6.17. Let (X_{1}, X_{2}, ldots) be i.i.d. random variables with (mathbb{E} X_{i}=mu) and (operatorname{Var}left(X_{i}right)=sigma^{2}). Let

[tilde{S}_{n}=frac{X_{1}+cdots+X_{n}-n mu}{sigma sqrt{n}} ]

Then (tilde{S}_{n} Rightarrow N(0,1)).

Moment generating function

Definition 6.18. The moment generating function of a random variable (X) is the function

[M_{X}(t)=mathbb{E}left[e^{t X}right] ]

Beware: the function may not be defined for all values of (t), i.e. the expectation might be (infty .)

Probability generating function

Definition 6.19. The probability generating function of a counting random variable (X) is the function

[G_{X}(t)=sum_{k=0}^{infty} t^{k} P[X=k]=mathbb{E}left[t^{X}right] ]

Beware: the function may not be defined for all values of (t), ie. the sum might be (infty).

Random walks

The simple symmetric random walk

Definition 7.2. The simple symmetric random walk is the discrete-time, discrete-space stochastic process (S_{0}, S_{1}, S_{2}, ldots) with (S_{0}=0) and (S_{n}=X_{1}+cdots+X_{n}) where the (X_{j}) 's are i.i.d. (pm 1) with probability (1 / 2) each.
'Simple' refers to the fact that the increments are just (+1) and (-1). 'Symmetric' refers to the fact that the probability of (+1) is (1 / 2).

We can define other random walks. Given any distribution (X), let (X_{1}, X_{2}, ldots) be i.i.d. copies of (X). Then let (S_{0}=0) and (S_{n}=X_{1}+cdots+X_{n}) is a random walk.
Exercise: prove that any random walk is a Markov chain.
Basic properties of the Simple Symmetric Random Walk (SSRW)

• (mathbb{E} S_{n}=0)
• (operatorname{Var}left(S_{n}right)=n)
• (S_{n} / n rightarrow 0) in probability and almost surely. (Law of Large Numbers)
• (S_{n} / sqrt{n} Rightarrow N(0,1)). (Central Limit Theorem)