Law of large numbers. Central limit theorem

The law of large numbers discussed above establishes that the average of a large number of random variables approaches certain constants. But this does not exhaust the patterns that arise as a result of the combined action of random variables. It turns out that under some very general conditions the combined action of a large number of random variables leads to a quite definite limit law, namely the normal distribution law.

The central limit theorem is a group of theorems devoted to establishing the conditions under which the normal distribution law arises. Among these theorems, the most important place belongs to Lyapunov's theorem.

Lyapunov's theorem. If $X_1, X_2, \ldots, X_n$ are independent random variables, each of which has mathematical expectation $M(X_i) = a_i$, variance $D(X_i) = \sigma_i^2$ and finite absolute central moment of the third order $M|X_i - a_i|^3 = b_i$, and

$$\lim_{n \to \infty} \frac{\sum_{i=1}^{n} b_i}{\left( \sum_{i=1}^{n} \sigma_i^2 \right)^{3/2}} = 0, \qquad (6.20)$$

then the distribution law of the sum $Y_n = X_1 + X_2 + \cdots + X_n$ as $n \to \infty$ approaches the normal law without limit, with mathematical expectation $\sum_{i=1}^{n} a_i$ and variance $\sum_{i=1}^{n} \sigma_i^2$.

We accept the theorem without proof.

The unlimited approach of the distribution law of the sum $Y_n$ to the normal law as $n \to \infty$ means, in accordance with the properties of the normal law, that

$$\lim_{n \to \infty} P\left( \left| \frac{Y_n - \sum_{i=1}^{n} a_i}{\sqrt{\sum_{i=1}^{n} \sigma_i^2}} \right| \le t \right) = \Phi(t), \qquad (6.21)$$

where $\Phi(t)$ is the Laplace function (2.11).

The meaning of condition (6.20) is that the sum should not contain terms whose influence on the scattering of $Y_n$ is overwhelmingly large compared with the influence of all the others, nor should it contain a large number of random terms whose influence is very small compared with the total influence of the rest. Thus, the relative weight of each individual term should tend to zero as the number of terms increases.

For example, the monthly electricity consumption for household needs in each apartment of a large building can be represented as $n$ different random variables. If the electricity consumption in each apartment does not stand out sharply from the rest, then on the basis of Lyapunov's theorem we may assume that the electricity consumption of the whole building, i.e. the sum of $n$ independent random variables, will be a random variable with an approximately normal distribution law. If, however, one of the premises of the building houses, say, a computing center whose level of electricity consumption is incomparably higher than the household consumption of each apartment, then the conclusion about the approximately normal distribution of the building's total electricity consumption will be incorrect, since condition (6.20) is violated: the consumption of the computing center would play a predominant role in forming the total consumption.

Another example. With stable, well-adjusted operation of machines, homogeneity of the processed material, etc., the variation in product quality takes the form of the normal distribution law, because the production error is the result of the combined action of a large number of random variables: the error of the machine, of the tool, of the worker, etc.

Corollary. If $X_1, X_2, \ldots, X_n$ are independent random variables with equal mathematical expectations $M(X_i) = a$, equal variances $D(X_i) = \sigma^2$ and equal absolute central moments of the third order $M|X_i - a|^3 = b$, then the distribution law of the sum $Y_n = X_1 + X_2 + \cdots + X_n$ approaches the normal law without limit as $n \to \infty$.

The proof reduces to verifying condition (6.20):

$$\lim_{n \to \infty} \frac{\sum_{i=1}^{n} b_i}{\left( \sum_{i=1}^{n} \sigma_i^2 \right)^{3/2}} = \lim_{n \to \infty} \frac{nb}{\left( n\sigma^2 \right)^{3/2}} = \frac{b}{\sigma^3} \lim_{n \to \infty} \frac{1}{\sqrt{n}} = 0;$$

therefore, equality (6.21) also holds. ∎

In particular, if all the random variables $X_i$ are identically distributed, then the distribution law of their sum approaches the normal law without limit as $n \to \infty$.

Let us illustrate this statement with the example of summing independent random variables that have a uniform distribution on the interval (0, 1). The distribution curve of one such random variable is shown in Fig. 6.2, a. Fig. 6.2, b shows the probability density of the sum of two such random variables (see Example 5.9), and Fig. 6.2, c shows the probability density of the sum of three such random variables (its graph consists of three parabolic segments on the intervals (0; 1), (1; 2) and (2; 3) and already resembles a normal curve).

If you add up six such random variables, you get a random variable with a probability density that is practically no different from the normal one.
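This convergence is easy to observe numerically. Below is a minimal simulation sketch (Python with NumPy and SciPy, assuming these libraries are available) that sums six uniform random variables, standardizes the sums, and measures the distance of their empirical distribution from the standard normal law:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_terms, n_samples = 6, 100_000

# Sum of 6 independent U(0, 1) variables: mean = 6 * 1/2, variance = 6 * 1/12.
sums = rng.uniform(0, 1, size=(n_samples, n_terms)).sum(axis=1)
z = (sums - n_terms * 0.5) / np.sqrt(n_terms / 12)

# Kolmogorov-Smirnov distance between the standardized sums and N(0, 1):
# already quite small for just six terms.
print(stats.kstest(z, "norm").statistic)
```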

We now have the opportunity to prove the local and integral theorems of de Moivre–Laplace (see paragraph 2.3).

Consider the random variable $X = m$, the number of occurrences of an event in $n$ independent trials, in each of which it can occur with the same probability $p$. Then $X$ is a random variable with the binomial distribution law, for which the mathematical expectation is $M(X) = np$ and the variance is $D(X) = npq$. Introduce also the standardized random variable

$$Z = \frac{X - np}{\sqrt{npq}}.$$

The random variable $Z$, just like the random variable $X$, is, generally speaking, discrete, but for a large number $n$ of trials its values are located on the x-axis so closely that it can be considered continuous with probability density $\varphi(x)$.

Let us find the numerical characteristics of the random variable $Z$ using the properties of mathematical expectation and variance:

$$M(Z) = \frac{M(X) - np}{\sqrt{npq}} = \frac{np - np}{\sqrt{npq}} = 0, \qquad D(Z) = \frac{D(X)}{npq} = \frac{npq}{npq} = 1.$$

Due to the fact that the random variable $X$ is the sum of independent alternative random variables (see paragraph 4.1), the random variable $Z$ is also the sum of independent, identically distributed random variables and therefore, by the central limit theorem, for a large number $n$ of trials has a distribution close to the normal law with parameters $a = 0$, $\sigma^2 = 1$. Using property (4.32) of the normal law and taking into account equalities (4.33), we obtain

$$P(x_1 \le Z < x_2) = \frac{1}{2}\left[ \Phi(x_2) - \Phi(x_1) \right]. \qquad (6.22)$$

Setting $x_1 = \dfrac{a - np}{\sqrt{npq}}$ and $x_2 = \dfrac{b - np}{\sqrt{npq}}$, and taking into account that the double inequality in parentheses is equivalent to the inequality $a \le X < b$, from formula (6.22) we obtain the integral formula of de Moivre–Laplace (2.10):

$$P(a \le X < b) \approx \frac{1}{2}\left[ \Phi\!\left( \frac{b - np}{\sqrt{npq}} \right) - \Phi\!\left( \frac{a - np}{\sqrt{npq}} \right) \right].$$

The probability $P_{m,n}$ that the event $A$ occurs exactly $m$ times in $n$ independent trials can be written approximately as

$$P_{m,n} \approx P(m \le X < m + \Delta m). \qquad (6.23)$$

The smaller $\Delta m$, the more accurate the approximate equality. The minimum (integer) value is $\Delta m = 1$. Therefore, taking into account formulas (6.23) and (6.22), we can write

$$P_{m,n} \approx \frac{1}{2}\left[ \Phi(x + \Delta x) - \Phi(x) \right], \qquad (6.24)$$

where $x = \dfrac{m - np}{\sqrt{npq}}$ and $\Delta x = \dfrac{1}{\sqrt{npq}}$.

For small $\Delta x$ we have

$$\frac{1}{2}\left[ \Phi(x + \Delta x) - \Phi(x) \right] \approx \varphi(x)\, \Delta x, \qquad (6.25)$$

where $\varphi(x)$ is the density of a standard normally distributed random variable with parameters $a = 0$, $\sigma^2 = 1$, i.e.

$$\varphi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}.$$

Setting $\Delta x = \dfrac{1}{\sqrt{npq}}$ in formula (6.25) and taking into account equality (6.24), we obtain the local de Moivre–Laplace formula (2.7):

$$P_{m,n} \approx \frac{1}{\sqrt{npq}}\, \varphi\!\left( \frac{m - np}{\sqrt{npq}} \right).$$
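As a sanity check, the local formula can be compared with the exact binomial probability. A small sketch (the values $n = 100$, $p = 0.3$, $m = 30$ are chosen arbitrarily for illustration):

```python
import math
from scipy import stats

n, p, m = 100, 0.3, 30
q = 1 - p

# Exact binomial probability P_{m,n} = C(n, m) * p^m * q^(n - m).
exact = math.comb(n, m) * p**m * q**(n - m)

# Local de Moivre-Laplace approximation: phi(x) / sqrt(npq).
x = (m - n * p) / math.sqrt(n * p * q)
approx = stats.norm.pdf(x) / math.sqrt(n * p * q)

print(f"exact = {exact:.5f}, approx = {approx:.5f}")  # the two agree closely
```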

Remark. Some caution must be exercised when applying the central limit theorem in statistical research. Although the sum $Y_n$ as $n \to \infty$ always has a normal distribution law, the rate of convergence to it depends significantly on the type of distribution of its terms. For example, as noted above, when summing uniformly distributed random variables, 6-10 terms already give sufficient closeness to the normal law, whereas to achieve the same closeness when summing $\chi^2$-distributed terms, more than 100 terms will be needed.
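The difference in convergence rates noted in this remark can be observed directly. The sketch below standardizes sums of uniformly distributed and of $\chi^2(1)$-distributed terms and compares their Kolmogorov–Smirnov distances from $N(0,1)$; the strongly skewed $\chi^2$ terms converge noticeably more slowly:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def ks_to_normal(draws: np.ndarray, mean: float, var: float) -> float:
    """KS distance between standardized row sums and N(0, 1)."""
    n = draws.shape[1]
    z = (draws.sum(axis=1) - n * mean) / np.sqrt(n * var)
    return stats.kstest(z, "norm").statistic

for n in (6, 30, 100):
    u = rng.uniform(0, 1, size=(50_000, n))   # mean 1/2, variance 1/12
    c = rng.chisquare(1, size=(50_000, n))    # mean 1, variance 2, skewed
    print(n, ks_to_normal(u, 0.5, 1 / 12), ks_to_normal(c, 1.0, 2.0))
```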

Based on the central limit theorem, it can be argued that the random variables considered in Chap. 4 with the binomial, Poisson, hypergeometric, $\chi^2$ ("chi-square") and $t$ (Student) distribution laws are asymptotically normally distributed as $n \to \infty$.

What is the average weight of a person?

The basic idea of statistics is that you can learn something about a population as a whole by finding it out for a smaller group of people. Without this idea there would be no opinion polls or election forecasts, no way to test new drugs or study the safety of bridges, and so on. The central limit theorem is largely responsible for the fact that we can do all of this and reduce the uncertainty of forecasts.

To understand how the theorem works, imagine that you need to find the average weight of a UK resident. You go out and measure the weights of, say, a hundred randomly selected people, and find the average weight for that group; call it the sample mean. The sample mean should give a fairly accurate representation of the national average. But what if your sample happened to include only overweight people or, conversely, only very thin people?

To get an idea of how typical the resulting average is, you need to know how the average weight of a sample of 100 people varies from sample to sample: if you took very many groups of 100 people and found the average weight for each group, how much would the resulting numbers vary? And how closely would their average (the average of the averages) coincide with the true average weight of a person in the population?

For example, suppose that if you selected very many groups of 100 people and recorded the average weight of each group, you would get all values from 10 kg to 300 kg in equal quantities. Then your method of estimating the overall average from one sample of 100 people is not very good, because the spread of values is too large: you could get any of the possible values, so you cannot tell which one is closest to the true average weight in the population.

Examples of normal distributions with different means and variances.

So how can we say anything about the distribution of the average weights of 100 people, called the sampling distribution, when we know nothing about the distribution of the weights of the entire population? This is the central limit theorem: it states that for a large enough sample, the sampling distribution can be approximated by a normal distribution, the distribution with the famous bell shape. (A sample size of 30 is generally considered good enough.)

The mean of this normal distribution (the mean of the means, corresponding to the top of the bell) is the same as the mean of the entire population (the average weight in the population). The variance of this normal distribution, that is, how much the sample means deviate from the mean (reflected in the width of the bell), depends on the sample size: the larger the sample, the smaller the variance. There is an exact relationship: for a sample of size $n$ the variance of the sampling distribution equals $\sigma^2 / n$, where $\sigma^2$ is the population variance.

So if your sample size is large enough (100 would of course be fine, since it's larger than 30), then the relatively small variance of the normal sampling distribution means that the average weight you observe is close to the mean of that normal distribution (since the bell is quite narrow). And since the mean of this normal distribution is equal to the true mean weight in the entire population, the observed mean is a good approximation of the true mean.
All of this can be made precise: for example, you can say how confident you are that the true mean lies within a given distance of the sample mean, and you can also use the result to calculate how large a sample you need to obtain an estimate of a given accuracy. It is the central limit theorem that is responsible for the accuracy of statistical inference, and it is behind the widespread use of the normal distribution.
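The weight example can be mimicked end to end in a few lines. The gamma-distributed "population" below is purely hypothetical, invented only to have something to sample from, and the multiplier 1.96 is the standard normal quantile for a 95% confidence level:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical population of weights (in reality the distribution is unknown).
population = rng.gamma(shape=20, scale=3.5, size=1_000_000)  # mean around 70 kg

n = 100
sample = rng.choice(population, size=n, replace=False)
mean, sd = sample.mean(), sample.std(ddof=1)

# CLT-based 95% confidence interval: mean +/- 1.96 * sd / sqrt(n).
half = 1.96 * sd / np.sqrt(n)
print(f"sample mean = {mean:.1f} kg, 95% CI = [{mean - half:.1f}, {mean + half:.1f}]")
print(f"true population mean = {population.mean():.1f} kg")
```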
In fact, the central limit theorem is a little more general than presented here. Here is its exact wording.

Theorem. Let $X_1, X_2, \ldots$ be an infinite sequence of independent identically distributed random variables having finite mathematical expectation $\mu$ and variance $\sigma^2$. Let $S_n = X_1 + X_2 + \cdots + X_n$. Then

$$\frac{S_n - \mu n}{\sigma \sqrt{n}} \to N(0, 1)$$

in distribution as $n \to \infty$, where $N(0, 1)$ is the normal distribution with zero mathematical expectation and standard deviation equal to one.

Denoting the sample mean of the first $n$ values by $\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i$, we can rewrite the result of the central limit theorem as follows:

$$\sqrt{n}\, \frac{\bar{X}_n - \mu}{\sigma} \to N(0, 1)$$

in distribution as $n \to \infty$.

Many problems of probability theory involve studying the sum of independent random variables, which, under certain conditions, has a distribution close to normal. These conditions are expressed by the central limit theorem (CLT).

Let $\xi_1, \xi_2, \ldots, \xi_n, \ldots$ be a sequence of independent random variables. Denote

$$\eta_n = \xi_1 + \xi_2 + \cdots + \xi_n.$$

The CLT is said to be applicable to the sequence $\xi_1, \xi_2, \ldots, \xi_n, \ldots$ if, as $n \to \infty$, the distribution law of the standardized sum $\eta_n$ tends to normal:

$$P\left( \frac{\eta_n - M(\eta_n)}{\sqrt{D(\eta_n)}} < x \right) \to \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\, dt.$$

The essence of the CLT: with an unlimited increase in the number of random variables, the distribution law of their sum tends to normal.

Lyapunov's central limit theorem

The law of large numbers does not examine the form of the limiting distribution law of a sum of random variables. This question is considered in a group of theorems called the central limit theorem. These theorems assert that the distribution law of a sum of random variables, each of which may have a different distribution, approaches normal when the number of terms is sufficiently large. This explains the importance of the normal law for practical applications.

Characteristic functions.

To prove the central limit theorem, the method of characteristic functions is used.

Definition 14.1. The characteristic function of a random variable $X$ is the function

$$g(t) = M\left( e^{itX} \right). \qquad (14.1)$$

Thus, $g(t)$ is the mathematical expectation of the complex random variable $U = e^{itX}$ associated with the variable $X$. In particular, if $X$ is a discrete random variable specified by a distribution series, then

$$g(t) = \sum_k e^{itx_k} p_k. \qquad (14.2)$$

For a continuous random variable with distribution density $f(x)$,

$$g(t) = \int_{-\infty}^{\infty} e^{itx} f(x)\, dx. \qquad (14.3)$$

Example 1. Let $X$ be the number of sixes obtained in one throw of a die, so that $P(X = 0) = \frac{5}{6}$ and $P(X = 1) = \frac{1}{6}$. Then, according to formula (14.2),

$$g(t) = e^{it \cdot 0} \cdot \frac{5}{6} + e^{it \cdot 1} \cdot \frac{1}{6} = \frac{5}{6} + \frac{1}{6}\, e^{it}.$$

Example 2. Find the characteristic function of a standardized continuous random variable distributed according to the normal law, with density $f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$. According to formula (14.3),

$$g(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{itx}\, e^{-x^2/2}\, dx = e^{-t^2/2}$$

(we used the formula $\int_{-\infty}^{\infty} e^{-(x - it)^2/2}\, dx = \sqrt{2\pi}$ and the fact that $i^2 = -1$).
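The result of Example 2 is easy to verify by Monte Carlo: the empirical mean of $e^{itX}$ over draws of a standard normal $X$ should approach $e^{-t^2/2}$. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(200_000)

# Empirical characteristic function g(t) = M(e^{itX}) at a few points t,
# compared with the theoretical value e^{-t^2/2} from Example 2.
for t in (0.5, 1.0, 2.0):
    g = np.mean(np.exp(1j * t * x))
    print(f"t = {t}: |g(t)| = {abs(g):.4f}, e^(-t^2/2) = {np.exp(-t**2 / 2):.4f}")
```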

Properties of characteristic functions.

1. The density $f(x)$ can be recovered from a known characteristic function $g(t)$ by the formula

$$f(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itx} g(t)\, dt \qquad (14.4)$$

(transformation (14.3) is called the Fourier transform, and transformation (14.4) the inverse Fourier transform).

2. If the random variables $X$ and $Y$ are related by $Y = aX$, then their characteristic functions are related by

$$g_y(t) = g_x(at). \qquad (14.5)$$

3. The characteristic function of a sum of independent random variables equals the product of the characteristic functions of the terms: for $Y = \sum_{i=1}^{n} X_i$ with independent $X_i$,

$$g_y(t) = \prod_{i=1}^{n} g_{x_i}(t).$$
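Property 3 can also be checked empirically: for independent samples, the characteristic function of the sum should match the product of the characteristic functions of the terms up to Monte Carlo error. A short sketch with arbitrarily chosen exponential and uniform terms:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.exponential(1.0, size=100_000)
y = rng.uniform(0, 1, size=100_000)

def g(sample: np.ndarray, t: float) -> complex:
    """Empirical characteristic function M(e^{itX})."""
    return np.mean(np.exp(1j * t * sample))

t = 1.3
print(abs(g(x + y, t) - g(x, t) * g(y, t)))  # small (Monte Carlo error only)
```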

Theorem 14.1 (central limit theorem for identically distributed terms). If $X_1, X_2, \ldots, X_n, \ldots$ are independent random variables with the same distribution law, mathematical expectation $m$ and variance $\sigma^2$, then as $n$ increases without limit, the distribution law of the sum $Y_n = \sum_{i=1}^{n} X_i$ approaches the normal law without limit.


Proof.

Let us prove the theorem for continuous random variables $X_1, X_2, \ldots, X_n$ (the proof for discrete variables is similar). According to the conditions of the theorem, the characteristic functions of the terms are identical: $g_{x_1}(t) = g_{x_2}(t) = \cdots = g_x(t)$. Then, by property 3, the characteristic function of the sum $Y_n$ is $g_{y_n}(t) = \left[ g_x(t) \right]^n$. Expand the function $g_x(t)$ in a Maclaurin series:

$$g_x(t) = g_x(0) + g_x'(0)\, t + \left[ \frac{g_x''(0)}{2} + \alpha(t) \right] t^2, \quad \text{where } \alpha(t) \to 0 \text{ as } t \to 0.$$

Assuming that $m = 0$ (that is, moving the origin to the point $m$), we have $g_x(0) = 1$, $g_x'(0) = iM(X) = 0$, $g_x''(0) = i^2 M(X^2) = -\sigma^2$ (since $m = 0$). Substituting these results into the Maclaurin formula, we find that

$$g_x(t) = 1 - \left[ \frac{\sigma^2}{2} - \alpha(t) \right] t^2.$$

Consider a new random variable $Z_n = \dfrac{Y_n}{\sigma \sqrt{n}}$, which differs from $Y_n$ in that its variance equals 1 for any $n$. Since $Y_n$ and $Z_n$ are related by a linear relationship, it is enough to prove that $Z_n$ is distributed according to the normal law, or, what is the same, that its characteristic function approaches the characteristic function of the normal law (see Example 2). By property (14.5) of characteristic functions,

$$g_{z_n}(t) = g_{y_n}\!\left( \frac{t}{\sigma \sqrt{n}} \right) = \left[ g_x\!\left( \frac{t}{\sigma \sqrt{n}} \right) \right]^n.$$

Let us take the logarithm of the resulting expression:

$$\ln g_{z_n}(t) = n \ln(1 - k), \quad \text{where } k = \left[ \frac{\sigma^2}{2} - \alpha\!\left( \frac{t}{\sigma \sqrt{n}} \right) \right] \frac{t^2}{\sigma^2 n}.$$

Expanding the logarithm in a series as $n \to \infty$ and limiting ourselves to the first term of the expansion, $\ln(1 - k) \approx -k$, we obtain

$$\lim_{n \to \infty} \ln g_{z_n}(t) = -\lim_{n \to \infty} nk = -\frac{t^2}{2} + \lim_{n \to \infty} \frac{t^2}{\sigma^2}\, \alpha\!\left( \frac{t}{\sigma \sqrt{n}} \right) = -\frac{t^2}{2},$$

where the last limit is 0, since $\alpha(t) \to 0$ as $t \to 0$. Hence $\lim_{n \to \infty} g_{z_n}(t) = e^{-t^2/2}$, which is the characteristic function of the normal distribution. So, with an unlimited increase in the number of terms, the characteristic function of the quantity $Z_n$ approaches the characteristic function of the normal law without limit; therefore, the distribution law of $Z_n$ (and $Y_n$) approaches normal without limit. The theorem is proved.

A.M. Lyapunov proved the central limit theorem for conditions of a more general form:

Theorem 14.2 (Lyapunov's theorem). If the random variable $X$ is the sum of a very large number of mutually independent random variables $X_1, X_2, \ldots, X_n$ for which the following condition is satisfied:

$$\lim_{n \to \infty} \frac{\sum_{k=1}^{n} b_k}{\left( \sum_{k=1}^{n} D_k \right)^{3/2}} = 0,$$

where $b_k = M\left| X_k - M(X_k) \right|^3$ is the third absolute central moment of the quantity $X_k$ and $D_k$ is its variance, then $X$ has a distribution close to normal (Lyapunov's condition means that the influence of each term on the sum is negligible).

In practice, it is possible to use the central limit theorem with a sufficiently small number of terms, since probabilistic calculations require relatively low accuracy. Experience shows that for a sum of even ten or fewer terms, the law of their distribution can be replaced by a normal one.

Since many random variables in applications are formed under the combined influence of many weakly dependent random factors, their distribution is considered normal. In this case, the condition must be met that none of the factors is dominant. The central limit theorems in these cases justify the use of the normal distribution.



The rate of convergence can be estimated using the Berry–Esseen inequality.
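As an illustration, the sketch below evaluates the Berry–Esseen bound for Bernoulli terms and compares it with the observed sup-distance; the constant 0.4748 is an admissible value reported in the literature (the optimal constant is not known exactly):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, p = 50, 0.3

# For Bernoulli(p): mu = p, sigma^2 = pq, rho = E|X - mu|^3 = pq(p^2 + q^2).
q = 1 - p
sigma = np.sqrt(p * q)
rho = p * q * (p**2 + q**2)

# Berry-Esseen: sup_x |F_n(x) - Phi(x)| <= C * rho / (sigma^3 * sqrt(n)).
bound = 0.4748 * rho / (sigma**3 * np.sqrt(n))

# Empirical sup-distance for comparison.
z = (rng.binomial(n, p, size=200_000) - n * p) / (sigma * np.sqrt(n))
observed = stats.kstest(z, "norm").statistic
print(f"Berry-Esseen bound: {bound:.4f}, observed distance: {observed:.4f}")
```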

Notes

• Informally speaking, the classical central limit theorem states that the sum $S_n$ of $n$ independent identically distributed random variables has a distribution close to $N(n\mu, n\sigma^2)$. Equivalently, $\bar{X}_n$ has a distribution close to $N(\mu, \sigma^2/n)$.
• Since the distribution function of the standard normal distribution is continuous, convergence to this distribution is equivalent to pointwise convergence of distribution functions to the distribution function of the standard normal distribution. Putting $Z_n = \dfrac{S_n - \mu n}{\sigma \sqrt{n}}$, we get $F_{Z_n}(x) \to \Phi(x)$ for all $x \in \mathbb{R}$, where $\Phi(x)$ is the distribution function of the standard normal distribution.
• The central limit theorem in the classical formulation is proved by the method of characteristic functions (Lévy's continuity theorem).
• Generally speaking, convergence of distribution functions does not imply convergence of densities. Nevertheless, in this classical case it does hold.

Local CLT

Under the assumptions of the classical formulation, suppose in addition that the distribution of the random variables $\{X_i\}_{i=1}^{\infty}$ is absolutely continuous, that is, has a density. Then the distribution of $Z_n$ is also absolutely continuous, and moreover

$$f_{Z_n}(x) \to \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2} \quad \text{as } n \to \infty,$$

where $f_{Z_n}(x)$ is the density of the random variable $Z_n$, and on the right-hand side is the density of the standard normal distribution.

Generalizations

The result of the classical central limit theorem holds in situations much more general than complete independence and identical distribution.

Lindeberg CLT

Let the independent random variables $X_1, \ldots, X_n, \ldots$ be defined on the same probability space and have finite expectations and variances: $\mathbb{E}[X_i] = \mu_i$, $\mathrm{D}[X_i] = \sigma_i^2$.

Let $S_n = \sum_{i=1}^{n} X_i$. Then $\mathbb{E}[S_n] = m_n = \sum_{i=1}^{n} \mu_i$, $\mathrm{D}[S_n] = s_n^2 = \sum_{i=1}^{n} \sigma_i^2$.

And let the Lindeberg condition be satisfied:

$$\forall \varepsilon > 0, \quad \lim_{n \to \infty} \sum_{i=1}^{n} \mathbb{E}\left[ \frac{(X_i - \mu_i)^2}{s_n^2}\, \mathbf{1}_{\{|X_i - \mu_i| > \varepsilon s_n\}} \right] = 0,$$

where $\mathbf{1}_{\{|X_i - \mu_i| > \varepsilon s_n\}}$ is the indicator function. Then

$$\frac{S_n - m_n}{s_n} \to N(0, 1)$$

in distribution as $n \to \infty$.

Lyapunov CLT

Let the basic assumptions of the Lindeberg CLT be satisfied, and let the random variables $\{X_i\}$ have a finite third moment. Define the sequence

$$r_n^3 = \sum_{i=1}^{n} \mathbb{E}\left[ |X_i - \mu_i|^3 \right].$$

If the Lyapunov condition

$$\lim_{n \to \infty} \frac{r_n}{s_n} = 0$$

is satisfied, then

$$\frac{S_n - m_n}{s_n} \to N(0, 1)$$

in distribution as $n \to \infty$.
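The Lyapunov condition can be checked numerically for a concrete sequence. The sketch below takes $X_i \sim U(0, i)$ (an arbitrary illustrative choice), for which $\sigma_i^2 = i^2/12$ and $\mathbb{E}|X_i - \mu_i|^3 = i^3/32$, and shows that $r_n / s_n$ decreases toward zero:

```python
import numpy as np

# Lyapunov ratio r_n / s_n for X_i ~ U(0, i): it should tend to 0.
for n in (10, 100, 1_000, 10_000):
    i = np.arange(1, n + 1, dtype=float)
    s_n = np.sqrt(np.sum(i**2 / 12))   # s_n^2 = sum of variances
    r_n = np.cbrt(np.sum(i**3 / 32))   # r_n^3 = sum of third absolute moments
    print(n, r_n / s_n)                # decreases roughly like n^(-1/6)
```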

CLT for martingales

Let the process $(X_n)_{n \in \mathbb{N}}$ be a martingale with bounded increments. In particular, assume that

$$\mathbb{E}\left[ X_{n+1} - X_n \mid X_1, \ldots, X_n \right] = 0, \quad n \in \mathbb{N}, \quad X_0 \equiv 0,$$

and that the increments are uniformly bounded, that is,

$$\exists C > 0 \;\; \forall n \in \mathbb{N} : \; |X_{n+1} - X_n| \le C.$$

Introduce the conditional variances $\sigma_n^2 = \mathbb{E}\left[ (X_{n+1} - X_n)^2 \mid X_1, \ldots, X_n \right]$ and define

$$\tau_n = \min\left\{ k \;:\; \sum_{i=1}^{k} \sigma_i^2 \ge n \right\}.$$

Then

$$\frac{X_{\tau_n}}{\sqrt{n}} \to N(0, 1)$$

in distribution as $n \to \infty$.
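A simple random walk with fair ±1 steps satisfies all these assumptions: it is a martingale with increments bounded by 1 and conditional variances $\sigma_i^2 = 1$, so $\tau_n = n$ and the statement reduces to $X_n / \sqrt{n} \to N(0, 1)$. A quick check:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n, trials = 1_000, 50_000

# Fair +/-1 steps: a martingale with bounded increments, sigma_i^2 = 1.
steps = rng.choice([-1, 1], size=(trials, n))
z = steps.sum(axis=1) / np.sqrt(n)

print(stats.kstest(z, "norm").statistic)  # small KS distance from N(0, 1)
```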

Limit theorems of probability theory

Chebyshev's inequality

Let us consider a number of statements and theorems from the large group of so-called limit theorems of probability theory, which establish a connection between the theoretical and experimental characteristics of random variables over a large number of trials. They form the basis of mathematical statistics. Limit theorems are conventionally divided into two groups. The first group of theorems, called the law of large numbers, establishes the stability of average values: with a large number of trials, their average result ceases to be random and can be predicted with sufficient accuracy. The second group of theorems, called the central limit theorem, establishes the conditions under which the distribution law of the sum of a large number of random variables approaches normal without limit.

First, let us consider Chebyshev's inequality, which can be used: a) to roughly estimate the probabilities of events associated with random variables whose distribution is unknown; b) to prove a number of theorems of the law of large numbers.

Theorem 7.1. If the random variable $X$ has mathematical expectation $MX$ and variance $DX$, then for any $\varepsilon > 0$ Chebyshev's inequality holds:

$$P(|X - MX| \ge \varepsilon) \le \frac{DX}{\varepsilon^2}. \qquad (7.1)$$

Note that Chebyshev's inequality can also be written in another form:

$$P(|X - MX| < \varepsilon) \ge 1 - \frac{DX}{\varepsilon^2}. \qquad (7.2)$$

For the frequency $m$ of occurrence of an event in $n$ independent trials, in each of which it can occur with probability $p$ (with $q = 1 - p$), whose variance is $D(m) = npq$, Chebyshev's inequality has the form

$$P(|m - np| < \varepsilon) \ge 1 - \frac{npq}{\varepsilon^2}. \qquad (7.5)$$

Inequality (7.5) can be rewritten for the relative frequency $m/n$ as

$$P\left( \left| \frac{m}{n} - p \right| < \varepsilon \right) \ge 1 - \frac{pq}{n\varepsilon^2}. \qquad (7.6)$$

Example 7.1. Using Chebyshev's inequality, estimate the probability that the deviation of a random variable $X$ from its mathematical expectation is less than three standard deviations, i.e. less than $3\sigma$.

Solution:

Setting $\varepsilon = 3\sigma$ in formula (7.2), we obtain

$$P(|X - MX| < 3\sigma) \ge 1 - \frac{\sigma^2}{9\sigma^2} = 1 - \frac{1}{9} \approx 0.889.$$

This estimate is called the three sigma rule.
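It is instructive to compare this distribution-free bound with the exact value for a normal random variable, for which the three-sigma probability is about 0.9973:

```python
from scipy import stats

# Chebyshev's bound holds for ANY distribution with finite variance:
print(1 - 1 / 9)                               # ~0.889

# For a normal random variable the actual probability is much higher:
print(stats.norm.cdf(3) - stats.norm.cdf(-3))  # ~0.9973
```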

Chebyshev's theorem

The main statement of the law of large numbers is contained in Chebyshev's theorem. It and other theorems of the law of large numbers use the concept of "convergence of random variables in probability."

Random variables $X_1, X_2, \ldots, X_n, \ldots$ converge in probability to a value $A$ (random or non-random) if for any $\varepsilon > 0$ the probability of the event $|X_n - A| < \varepsilon$ tends to unity as $n \to \infty$, i.e.

$$\lim_{n \to \infty} P(|X_n - A| < \varepsilon) = 1$$

(or $P(|X_n - A| < \varepsilon) \to 1$ as $n \to \infty$). Convergence in probability is written symbolically as $X_n \xrightarrow{P} A$.

It should be noted that convergence in probability requires only that the inequality $|X_n - A| < \varepsilon$ be satisfied for the vast majority of members of the sequence, whereas ordinary convergence $x_n \to A$ (in mathematical analysis, for all $n > N$, where $N$ is a certain number) requires that almost all members of the sequence fall into the $\varepsilon$-neighborhood of $A$.

Theorem 7.3 (Law of large numbers in the form of P. L. Chebyshev). If the random variables $X_1, X_2, \ldots, X_n$ are independent and there exists a number $C > 0$ such that $DX_i \le C$ for all $i$, then for any $\varepsilon > 0$

$$\lim_{n \to \infty} P\left( \left| \frac{1}{n} \sum_{i=1}^{n} X_i - \frac{1}{n} \sum_{i=1}^{n} MX_i \right| < \varepsilon \right) = 1, \qquad (7.7)$$

i.e. the arithmetic mean of these random variables converges in probability to the arithmetic mean of their mathematical expectations:

$$\frac{1}{n} \sum_{i=1}^{n} X_i \xrightarrow{P} \frac{1}{n} \sum_{i=1}^{n} MX_i.$$

Proof. Since the random variables $X_i$ are independent and $DX_i \le C$, we have

$$D\left( \frac{1}{n} \sum_{i=1}^{n} X_i \right) = \frac{1}{n^2} \sum_{i=1}^{n} DX_i \le \frac{nC}{n^2} = \frac{C}{n}.$$

Then, applying Chebyshev's inequality (7.2) to the random variable $\frac{1}{n} \sum_{i=1}^{n} X_i$, we have

$$P\left( \left| \frac{1}{n} \sum_{i=1}^{n} X_i - \frac{1}{n} \sum_{i=1}^{n} MX_i \right| < \varepsilon \right) \ge 1 - \frac{C}{n\varepsilon^2} \to 1 \quad \text{as } n \to \infty,$$

which proves (7.7). ∎

Corollary. If the random variables $X_1, X_2, \ldots, X_n$ are independent, have the same mathematical expectation $MX_i = A$ and bounded variances, then for any $\varepsilon > 0$

$$\lim_{n \to \infty} P\left( \left| \frac{1}{n} \sum_{i=1}^{n} X_i - A \right| < \varepsilon \right) = 1, \qquad (7.9)$$

i.e. the arithmetic mean of the random variables converges in probability to the mathematical expectation $A$: $\frac{1}{n} \sum_{i=1}^{n} X_i \xrightarrow{P} A$.

Proof. Since $MX_i = A$, the arithmetic mean of the mathematical expectations equals $A$, and the variances of the random variables are bounded; therefore, applying Chebyshev's theorem (7.7), we obtain statement (7.9). ∎

The corollary of Chebyshev's theorem substantiates the "arithmetic mean" principle for random variables $X_i$, which is constantly used in practice. Thus, let $n$ independent measurements be made of some quantity whose true value is $A$ (it is unknown). The result of each measurement is a random variable $X_i$. According to the corollary, as an approximate value of the quantity $A$ one can take the arithmetic mean of the measurement results:

$$A \approx \frac{1}{n} \sum_{i=1}^{n} X_i.$$

The larger $n$, the more accurate this equality.
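A quick simulation of this measurement setting (the true value $A$ and the noise level below are hypothetical, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)

A = 9.81  # hypothetical true value of the measured quantity
for n in (10, 100, 10_000):
    measurements = A + rng.normal(0, 0.05, size=n)  # X_i = A + random error
    print(n, measurements.mean())  # the arithmetic mean approaches A as n grows
```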

Chebyshev's theorem also underlies the sampling method widely used in statistics, the essence of which is that the quality of a large amount of homogeneous material can be judged from a small sample of it.

Chebyshev's theorem confirms the connection between randomness and necessity: the average value of a large number of random variables is practically indistinguishable from a non-random one.

Bernoulli's theorem

Bernoulli's theorem is historically the first and simplest form of the law of large numbers. It theoretically substantiates the property of stability of the relative frequency.

Theorem 7.4 (Law of large numbers in J. Bernoulli's form). If the probability of occurrence of the event $A$ in one trial is $p$, and the number of occurrences of this event in $n$ independent trials is $m$, then for any number $\varepsilon > 0$ the equality holds

$$\lim_{n \to \infty} P\left( \left| \frac{m}{n} - p \right| < \varepsilon \right) = 1, \qquad (7.10)$$

i.e. the relative frequency $m/n$ of the event $A$ converges in probability to the probability $p$ of the event $A$: $\frac{m}{n} \xrightarrow{P} p$.

Proof. Let us introduce random variables $X_i$ as follows: $X_i = 1$ if the event $A$ occurred in the $i$-th trial, and $X_i = 0$ if it did not. Then the number $m$ (the number of successes) can be represented as

$$m = X_1 + X_2 + \cdots + X_n.$$

The mathematical expectation and variance of these random variables are $MX_i = p$ and $DX_i = pq$. The distribution law of the random variables $X_i$ has the form

$$\begin{array}{c|cc} X_i & 0 & 1 \\ \hline P & q & p \end{array}$$

for any $i$. Thus, the random variables $X_i$ are independent and their variances are bounded by the same number, since

$$DX_i = pq \le \frac{1}{4}.$$

Therefore, Chebyshev's theorem can be applied to these random variables:

$$\lim_{n \to \infty} P\left( \left| \frac{1}{n} \sum_{i=1}^{n} X_i - \frac{1}{n} \sum_{i=1}^{n} MX_i \right| < \varepsilon \right) = 1.$$

But

$$\frac{1}{n} \sum_{i=1}^{n} X_i = \frac{m}{n}, \qquad \frac{1}{n} \sum_{i=1}^{n} MX_i = p.$$

Hence

$$\lim_{n \to \infty} P\left( \left| \frac{m}{n} - p \right| < \varepsilon \right) = 1. \;\blacksquare$$

Bernoulli's theorem theoretically justifies the approximate calculation of the probability of an event from its relative frequency. For example, the probability of the birth of a girl can be taken equal to the relative frequency of this event, which, according to statistical data, is approximately 0.485.
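The convergence of the relative frequency to $p$ is easy to observe by simulation, using the 0.485 figure from the text:

```python
import numpy as np

rng = np.random.default_rng(8)

p = 0.485  # probability of the event (here: the birth of a girl)
for n in (100, 10_000, 1_000_000):
    m = rng.binomial(n, p)  # number of occurrences in n trials
    print(n, m / n)         # the relative frequency m/n approaches p
```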

Chebyshev's inequality (7.2) for the random variables $X_i$ introduced above takes the form

$$P\left( \left| \frac{m}{n} - p \right| < \varepsilon \right) \ge 1 - \frac{pq}{n\varepsilon^2}, \qquad (7.11)$$

where $p$ is the probability of the event $A$ in each individual trial and $q = 1 - p$.

Example 7.2. The probability of a typo on one page of a manuscript is 0.2. Estimate the probability that in a manuscript containing 400 pages the relative frequency of typos differs from the corresponding probability by less than 0.05 in absolute value.

Solution:

Let us use formula (7.11). In this case $p = 0.2$, $q = 0.8$, $n = 400$, $\varepsilon = 0.05$. We have

$$P\left( \left| \frac{m}{400} - 0.2 \right| < 0.05 \right) \ge 1 - \frac{0.2 \cdot 0.8}{400 \cdot 0.05^2} = 1 - 0.16 = 0.84,$$

i.e. the required probability is not less than 0.84.
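Chebyshev's estimate is deliberately crude, since it holds for any distribution. A short sketch comparing it with the exact binomial probability for this example:

```python
from scipy import stats

n, p, eps = 400, 0.2, 0.05

# Chebyshev's lower bound from formula (7.11).
chebyshev = 1 - p * (1 - p) / (n * eps**2)

# Exact probability: |m/n - p| < eps  <=>  60 < m < 100, i.e. 61 <= m <= 99.
exact = stats.binom.cdf(99, n, p) - stats.binom.cdf(60, n, p)

print(f"Chebyshev bound: {chebyshev:.2f}, exact probability: {exact:.4f}")
```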

Central limit theorem

The central limit theorem represents the second group of limit theorems, which establish a connection between the distribution law of a sum of random variables and its limiting form, the normal distribution law.

Let us formulate the central limit theorem for the case when the terms of the sum have the same distribution. This version of the theorem is used most often in practice. In mathematical statistics, sample random variables have identical distributions, since they are obtained from the same population.

Theorem 7.5. Let the random variables $X_1, X_2, \ldots, X_n, \ldots$ be independent, identically distributed, and have finite mathematical expectation $MX_i = a$ and variance $DX_i = \sigma^2$. Then the distribution function of the centered and normalized sum of these random variables,

$$Z_n = \frac{\sum_{i=1}^{n} X_i - na}{\sigma \sqrt{n}},$$

tends to the distribution function of the standard normal random variable:

$$\lim_{n \to \infty} P(Z_n < x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\, dt.$$