Binomial distribution of a discrete random variable. Probability distribution

22.09.2019

Section 6. Typical distribution laws and numerical characteristics of random variables

The form of the functions F(x), p(x), or the enumeration of p(x_i) is called the distribution law of a random variable. Although we can imagine an infinite variety of random variables, there are far fewer distribution laws. Firstly, different random variables can have exactly the same distribution law. For example: let y take only the two values 1 and −1, each with probability 0.5; the variable z = −y has exactly the same distribution law.
Secondly, random variables very often have similar distribution laws, i.e., their p(x), for example, is expressed by formulas of the same form, differing only in one or several constants. These constants are called the parameters of the distribution.

Although in principle a great variety of distribution laws is possible, several of the most typical ones are considered here. It is important to pay attention to the conditions under which they arise, and to the parameters and properties of these distributions.

1. Uniform distribution
This is the name given to the distribution of a random variable that can take any value in the interval (a, b), such that the probability of it falling into any segment inside (a, b) is proportional to the length of the segment and does not depend on its position, and the probability of values outside (a, b) is equal to 0.


Fig 6.1 Uniform distribution function and density

Distribution parameters: a, b

2. Normal distribution
A distribution whose density is described by the formula

p(x) = (1 / (σ√(2π))) · exp(−(x − a)² / (2σ²))   (6.1)

is called normal.
Distribution parameters: a, σ


Figure 6.2 Typical density and normal distribution function

3. Bernoulli distribution
If a series of independent trials is carried out, in each of which an event A can appear with the same probability p, then the number of occurrences of the event is a random variable distributed according to Bernoulli's law, or the binomial law (another name for this distribution):

P_n(m) = C_n^m · p^m · q^(n−m)

Here n is the number of trials in the series, m is the random variable (the number of occurrences of event A), P_n(m) is the probability that A occurs exactly m times, and q = 1 − p (the probability that A does not appear in a single trial).

Example 1: A die is rolled 5 times; what is the probability that a 6 is rolled exactly twice?
n = 5, m = 2, p = 1/6, q = 5/6, so P_5(2) = C_5^2 · (1/6)² · (5/6)³ ≈ 0.16
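A minimal sketch of this calculation in Python (standard library only; the helper name binomial_pmf is ours for illustration):

```python
from math import comb

def binomial_pmf(m, n, p):
    """Probability of exactly m successes in n independent trials."""
    return comb(n, m) * p**m * (1 - p)**(n - m)

# Example 1: a die is rolled 5 times; probability of exactly two sixes
print(binomial_pmf(2, n=5, p=1/6))  # ~0.1608
```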

Distribution parameters: n, p

4. Poisson distribution
The Poisson distribution is obtained as a limiting case of the Bernoulli (binomial) distribution when p tends to zero and n tends to infinity in such a way that their product remains constant: np = a. Formally, this passage to the limit leads to the formula

P(m) = (a^m / m!) · e^(−a)

Distribution parameter: a

Many random variables found in science and practical life are subject to the Poisson distribution.
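A small sketch (Python standard library) illustrating how the binomial probabilities approach the Poisson formula as n grows with np = a held fixed:

```python
from math import comb, exp, factorial

a = 3.0          # fixed product a = n*p
m = 2            # number of occurrences of interest

poisson = a**m * exp(-a) / factorial(m)
for n in (10, 100, 1000, 10000):
    p = a / n
    binom = comb(n, m) * p**m * (1 - p)**(n - m)
    print(n, round(binom, 6), "-> Poisson:", round(poisson, 6))
# The binomial probabilities approach the Poisson value as n grows.
```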

Example 2: number of calls received at an ambulance station within an hour.
Let us divide the time interval T (1 hour) into small intervals dt such that the probability of receiving two or more calls during dt is negligible, and the probability of one call, p, is proportional to dt: p = μdt;
we consider the observation during the intervals dt as independent trials; the number of such trials during the time T is n = T/dt;
if we assume that the probabilities of call arrivals do not change during the hour, then the total number of calls obeys Bernoulli's law with parameters n = T/dt, p = μdt. Letting dt tend to zero, we find that n tends to infinity while the product np remains constant: a = np = μT.

Example 3: the number of molecules of an ideal gas in some fixed volume V.
Let us divide the volume V into small volumes dV such that the probability of finding two or more molecules in dV is negligible, and the probability of finding one molecule is proportional to dV: p = μdV; we consider the observation of each volume dV as an independent trial; the number of such trials is n = V/dV; if we assume that the probabilities of finding a molecule anywhere inside V are the same, then the total number of molecules in the volume V obeys Bernoulli's law with parameters n = V/dV, p = μdV. Letting dV tend to zero, we find that n tends to infinity while the product np remains constant: a = np = μV.

Numerical characteristics of random variables

1. Mathematical expectation (average value)

Definition:
For a discrete random variable, the mathematical expectation is

Mx = Σ_i x_i · p(x_i)   (6.4)

The sum is taken over all the values that the random variable takes. The series must be absolutely convergent (otherwise the random variable is said to have no mathematical expectation).

For a continuous random variable,

Mx = ∫ x · p(x) dx   (6.5)

The integral must be absolutely convergent (otherwise the random variable is said to have no mathematical expectation).


Properties of mathematical expectation:

a. If C is a constant, then MC = C
b. MCx = C·Mx
c. The mathematical expectation of a sum of random variables is always equal to the sum of their mathematical expectations: M(x + y) = Mx + My
d. The concept of conditional mathematical expectation is introduced. If a random variable takes its values x_i with different probabilities p(x_i | H_j) under different conditions H_j, then the conditional mathematical expectation is determined

as M(x | H_j) = Σ_i x_i · p(x_i | H_j) or M(x | H_j) = ∫ x · p(x | H_j) dx;   (6.6)

If the probabilities of the events H_j are known, the total mathematical expectation is

Mx = Σ_j M(x | H_j) · p(H_j);   (6.7)

Example 4: On average, how many times must a coin be tossed before heads appears for the first time? This problem can be solved head-on:

x_i:    1    2    3    …    k    …
p(x_i): 1/2  1/4  1/8  …  1/2^k  …,

but this sum still needs to be calculated. It is simpler to use the concepts of conditional and total mathematical expectation. Consider the hypotheses H_1 — heads appeared on the first toss, H_2 — heads did not appear on the first toss. Obviously, p(H_1) = p(H_2) = 1/2; M(x | H_1) = 1;
M(x | H_2) is 1 more than the desired total expectation, because after the first toss of the coin the situation has not changed, but the coin has already been tossed once. Using the formula for the total mathematical expectation, we have Mx = M(x | H_1)·p(H_1) + M(x | H_2)·p(H_2) = 1 · 0.5 + (Mx + 1) · 0.5; solving this equation for Mx, we immediately obtain Mx = 2.
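A quick simulation, as a sketch using Python's random module, that checks Mx = 2 empirically:

```python
import random

def tosses_until_heads():
    """Number of tosses of a fair coin until heads appears for the first time."""
    count = 0
    while True:
        count += 1
        if random.random() < 0.5:   # heads
            return count

n_runs = 100_000
average = sum(tosses_until_heads() for _ in range(n_runs)) / n_runs
print(average)  # close to the theoretical value Mx = 2
```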

e. If f(x) is a function of a random variable x, then the concept of mathematical expectation of a function of a random variable is defined:

For a discrete random variable: Mf(x) = Σ_i f(x_i) · p(x_i);   (6.8)

The sum is taken over all the values ​​that the random variable takes. The series must be absolutely convergent.

For a continuous random variable: Mf(x) = ∫ f(x) · p(x) dx;   (6.9)

The integral must be absolutely convergent.

2. Variance of a random variable
Definition:
The variance of a random variable x is the mathematical expectation of the squared deviation of the variable from its mathematical expectation: Dx = M(x − Mx)²

For a discrete random variable: Dx = Σ_i (x_i − Mx)² · p(x_i);   (6.10)

The sum is taken over all the values ​​that the random variable takes. The series must be convergent (otherwise the random variable is said to have no variance)

For a continuous random variable: Dx = ∫ (x − Mx)² · p(x) dx;   (6.11)

The integral must be convergent (otherwise the random variable is said to have no variance)

Properties of variance:
a. If C is a constant value, then DC = 0
b. DCx = C²·Dx
c. The variance of a sum of random variables equals the sum of their variances only if these variables are independent (see the definition of independent variables)
d. To calculate the variance it is convenient to use the formula (a quick numerical check is sketched below):

Dx = Mx² − (Mx)²   (6.12)
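A minimal numerical check of formula (6.12); the value table below is made up for illustration:

```python
xs = [0, 1, 2, 3]
ps = [0.125, 0.375, 0.375, 0.125]   # hypothetical distribution (three coin tosses)

Mx  = sum(x * p for x, p in zip(xs, ps))
Mx2 = sum(x * x * p for x, p in zip(xs, ps))

D_definition = sum((x - Mx) ** 2 * p for x, p in zip(xs, ps))  # M(x - Mx)^2
D_shortcut   = Mx2 - Mx ** 2                                   # Mx^2 - (Mx)^2
print(Mx, D_definition, D_shortcut)   # 1.5 0.75 0.75
```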

Relationship between numerical characteristics
and parameters of typical distributions

Distribution   Parameters   Mx           Dx
Uniform        a, b         (a + b)/2    (b − a)²/12
Normal         a, σ         a            σ²
Bernoulli      n, p         np           npq
Poisson        a            a            a
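These entries can be cross-checked numerically; a sketch assuming scipy is available (scipy.stats exposes mean() and var() for these standard distributions):

```python
from scipy import stats

a, b = 2.0, 5.0
print(stats.uniform(loc=a, scale=b - a).mean(), (a + b) / 2)        # uniform: Mx = (a+b)/2
print(stats.uniform(loc=a, scale=b - a).var(), (b - a) ** 2 / 12)   # Dx = (b-a)^2/12

n, p = 10, 0.3
print(stats.binom(n, p).mean(), n * p)             # binomial: Mx = np
print(stats.binom(n, p).var(), n * p * (1 - p))    # Dx = npq

lam = 4.0
print(stats.poisson(lam).mean(), stats.poisson(lam).var())  # Poisson: both equal a
```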

Probability distribution is a probability measure on a measurable space.

Let Ω be a non-empty set of arbitrary nature and Ƒ a σ-algebra on Ω, that is, a collection of subsets of Ω containing Ω itself and the empty set ∅, and closed under at most countably many set-theoretic operations (this means that for any A ∈ Ƒ the set Ā = Ω \ A again belongs to Ƒ, and if A_1, A_2, … ∈ Ƒ, then ∪_i A_i ∈ Ƒ and ∩_i A_i ∈ Ƒ). The pair (Ω, Ƒ) is called a measurable space. A non-negative function P(A), defined for every A ∈ Ƒ, is called a probability measure, a probability distribution, or simply a probability, if P(Ω) = 1 and P is countably additive, that is, for any sequence A_1, A_2, … ∈ Ƒ such that A_i ∩ A_j = ∅ for all i ≠ j, the equality P(∪_i A_i) = Σ_i P(A_i) holds. The triple (Ω, Ƒ, P) is called a probability space. The probability space is the basic concept of the axiomatic probability theory proposed by A.N. Kolmogorov in the early 1930s.

On every probability space one can consider (real-valued) measurable functions X = X(ω), ω ∈ Ω, that is, functions such that {ω: X(ω) ∈ B} ∈ Ƒ for every Borel subset B of the real line R. Measurability of a function X is equivalent to {ω: X(ω) < x} ∈ Ƒ for every real x. Measurable functions are called random variables. Every random variable X defined on the probability space (Ω, Ƒ, P) generates a probability distribution

P_X(B) = P(X ∈ B) = P({ω: X(ω) ∈ B}), B ∈ Ɓ,

on the measurable space (R, Ɓ), where Ɓ is the σ-algebra of Borel subsets of R, and the distribution function

F_X(x) = P(X < x) = P({ω: X(ω) < x}), −∞ < x < ∞,

which are called the probability distribution and the distribution function of the random variable X.

The distribution function F of any random variable has the following properties:

1. F(x) is non-decreasing,

2. F(−∞) = 0, F(+∞) = 1,

3. F(x) is left-continuous at every point x.

Sometimes in the definition of the distribution function the inequality < is replaced by ≤; in that case the distribution function is right-continuous. In the substantive statements of probability theory it does not matter whether the distribution function is left- or right-continuous; what matters are only the positions of its points of discontinuity x (if any) and the sizes of the jumps F(x + 0) − F(x − 0) at these points; if F = F_X, then this jump equals P(X = x).

Any function F having properties 1–3 is called a distribution function. The correspondence between distributions on (R, Ɓ) and distribution functions is one-to-one. For any distribution P on (R, Ɓ) its distribution function is determined by the equality F(x) = P((−∞, x)), −∞ < x < ∞, and for any distribution function F the corresponding distribution P is first defined on the algebra of sets consisting of unions of a finite number of disjoint intervals and is then extended to the whole σ-algebra Ɓ. A classical example of a singular distribution function (the Cantor function) is constructed as follows. On the segment [0, 1] the function F_1(x) increases linearly from 0 to 1. To construct the function F_2(x), the segment [0, 1] is divided into the segment [0, 1/3], the interval (1/3, 2/3), and the segment [2/3, 1]. The function F_2(x) equals 1/2 on the interval (1/3, 2/3) and increases linearly from 0 to 1/2 and from 1/2 to 1 on the segments [0, 1/3] and [2/3, 1], respectively. The process continues, and the function F_{n+1} is obtained from F_n, n ≥ 2, by the following transformation. On the intervals where F_n(x) is constant, F_{n+1}(x) coincides with F_n(x). Each segment [a, b] on which F_n(x) increases linearly from F_n(a) to F_n(b) is divided into the segment [a, a + (b − a)/3], the interval (a + (b − a)/3, a + 2(b − a)/3), and the segment [a + 2(b − a)/3, b]. On the indicated interval F_{n+1}(x) equals (F_n(a) + F_n(b))/2, and on the indicated segments F_{n+1}(x) increases linearly from F_n(a) to (F_n(a) + F_n(b))/2 and from (F_n(a) + F_n(b))/2 to F_n(b), respectively. For each 0 ≤ x ≤ 1 the sequence F_n(x), n = 1, 2, …, converges to some number F(x). The sequence of distribution functions F_n, n = 1, 2, …, is equicontinuous, therefore the limit distribution function F(x) is continuous. This function is constant on a countable set of intervals (the values of the function on different intervals are different), which contain no growth points, and the total length of these intervals is 1. Therefore the Lebesgue measure of the set supp F is equal to zero, that is, F is singular.

Each distribution function can be represented as

F(x) = p_ac·F_ac(x) + p_d·F_d(x) + p_s·F_s(x),

where F_ac, F_d, and F_s are absolutely continuous, discrete, and singular distribution functions, and the sum of the non-negative numbers p_ac, p_d, and p_s equals one. This representation is called the Lebesgue decomposition, and the functions F_ac, F_d, and F_s are the components of the decomposition.

A distribution function is called symmetric if F(−x) = 1 − F(x + 0) for
x > 0. If a symmetric distribution function is absolutely continuous, then its density is an even function. If a random variable X has a symmetric distribution, then the random variables X and −X are identically distributed. If a symmetric distribution function F(x) is continuous at zero, then F(0) = 1/2.

Among the absolutely continuous distributions often used in probability theory are the uniform, normal (Gaussian), exponential, and Cauchy distributions.

A distribution is called uniform on the interval (a, b) (or on the segment [a, b], or on the half-open intervals [a, b) and (a, b]) if its density is constant (and equal to 1/(b − a)) on (a, b) and equal to zero outside (a, b). Most often the uniform distribution on (0, 1) is used; its distribution function F(x) equals zero for x ≤ 0, equals one for x > 1, and F(x) = x for 0 < x ≤ 1. A random variable uniform on (0, 1) is, for example, X(ω) = ω on the probability space consisting of the interval (0, 1), the set of Borel subsets of this interval, and the Lebesgue measure. This probability space corresponds to the experiment "throwing a point ω at random onto the interval (0, 1)", where the word "at random" means equal possibility of all points of (0, 1). If on the probability space (Ω, Ƒ, P) there is a random variable X with the uniform distribution on (0, 1), then for any distribution function F there exists on it a random variable Y whose distribution function F_Y coincides with F. For example, the distribution function of the random variable Y = F⁻¹(X) coincides with F. Here F⁻¹(y) = inf{x: F(x) > y}, 0 < y < 1; if the function F(x) is continuous and strictly monotone on the whole real line, then F⁻¹ is the inverse function of F.
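The statement about Y = F⁻¹(X) underlies inverse-transform sampling; a short sketch (assuming numpy) generating exponential samples from uniform ones:

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.uniform(0.0, 1.0, size=100_000)      # X uniform on (0, 1)

lam = 2.0
# For the exponential law F(y) = 1 - exp(-lam*y), the inverse is F^{-1}(u) = -ln(1-u)/lam
y = -np.log(1.0 - u) / lam

print(y.mean())   # close to 1/lam = 0.5, as expected for the exponential distribution
```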

The normal distribution with parameters (a, σ²), −∞ < a < ∞, σ² > 0, is the distribution with density p(x) = (1/(σ√(2π))) · exp(−(x − a)²/(2σ²)), −∞ < x < ∞. Most often the normal distribution with parameters a = 0 and σ² = 1 is used; it is called the standard normal distribution, and its distribution function Φ(x) is not expressible through superpositions of elementary functions, so one has to use its integral representation Φ(x) = (1/√(2π)) ∫_{−∞}^{x} e^(−t²/2) dt, −∞ < x < ∞. For the distribution function Φ(x) detailed tables were compiled, which were necessary before modern computing technology appeared (values of the function Φ(x) can also be obtained from tables of the special function erf(x)); values of Φ(x) for x > 0 can be obtained as the sum of the series

Φ(x) = 1/2 + (1/√(2π)) · Σ_{k≥0} (−1)^k x^(2k+1) / (2^k · k! · (2k+1)),

and for x < 0 one can use the symmetry of Φ(x). Values of the normal distribution function with parameters a and σ² can be obtained using the fact that it coincides with Φ((x − a)/σ). If X_1 and X_2 are independent normally distributed random variables with parameters a_1, σ_1² and a_2, σ_2², then the distribution of their sum X_1 + X_2 is again normal, with parameters a = a_1 + a_2 and σ² = σ_1² + σ_2². A statement that is, in a sense, the converse is also true: if a random variable X is normally distributed with parameters a and σ², and
X = X_1 + X_2, where X_1 and X_2 are independent random variables other than constants, then X_1 and X_2 have normal distributions (Cramér's theorem). The parameters a_1, σ_1² and a_2, σ_2² of the distributions of the normal random variables X_1 and X_2 are related to a and σ² by the equalities given above. The standard normal distribution appears as the limit in the central limit theorem.

The exponential distribution is the distribution with density p(x) = 0 for x < 0 and p(x) = λe^(−λx) for x ≥ 0, where λ > 0 is a parameter; its distribution function is F(x) = 0 for x ≤ 0 and F(x) = 1 − e^(−λx) for x > 0 (sometimes exponential distributions are used that differ from the one indicated by a shift along the real axis). This distribution has a property called the lack of memory (absence of aftereffect): if X is a random variable with an exponential distribution, then for any positive x and t

P(X > x + t | X > x) = P(X > t).

If X is the operating time of some device before failure, then the lack of memory means that the probability that the device, switched on at time 0, will not fail before time x + t, given that it has not failed before time x, does not depend on x. This property is interpreted as the absence of "aging". The lack of memory is a characteristic property of the exponential distribution: in the class of absolutely continuous distributions the above equality holds only for the exponential distribution (with some parameter λ > 0). The exponential distribution also appears as the limiting distribution in the scheme of minima. Let X_1, X_2, … be non-negative independent identically distributed random variables for whose common distribution function F the point 0 is a growth point. Then as n → ∞ the distributions of the random variables Y_n = min(X_1, …, X_n) converge weakly to the degenerate distribution with the single growth point 0 (this is an analogue of the law of large numbers). If we additionally assume that for some ε > 0 the distribution function F(x) on the interval (0, ε) admits the representation F(x) = ∫_0^x p(u) du with p(u) → λ as u ↓ 0, then the distribution functions of the random variables Z_n = n·min(X_1, …, X_n) converge as n → ∞, uniformly in −∞ < x < ∞, to the exponential distribution function with parameter λ (this is an analogue of the central limit theorem).
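An empirical check of the lack-of-memory property, sketched with numpy; the parameter values are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
lam = 0.5
samples = rng.exponential(scale=1.0 / lam, size=1_000_000)

x, t = 1.0, 2.0
conditional = np.mean(samples[samples > x] > x + t)   # P(X > x+t | X > x)
unconditional = np.mean(samples > t)                  # P(X > t)
print(conditional, unconditional)   # both close to exp(-lam*t) ≈ 0.3679
```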

The Cauchy distribution is the distribution with density p(x) = 1/(π(1 + x²)), −∞ < x < ∞; its distribution function is F(x) = (arctan x + π/2)/π. This distribution appeared in the work of S. Poisson in 1832 in connection with the solution of the following problem: are there independent identically distributed random variables X_1, X_2, … such that their arithmetic means (X_1 + … + X_n)/n have, for every n, the same distribution as each of the random variables X_1, X_2, …? S. Poisson discovered that random variables with the indicated density have this property. For these random variables the statement of the law of large numbers, according to which the arithmetic means (X_1 + … + X_n)/n degenerate as n grows, does not hold. This does not contradict the law of large numbers, however, since the latter imposes restrictions on the distributions of the original random variables that are not satisfied for the indicated distribution (for this distribution absolute moments of all positive orders less than one exist, but the mathematical expectation does not). In the works of Cauchy the distribution bearing his name appeared in 1853. The Cauchy distribution is the distribution of the ratio X/Y of independent random variables X and Y having the standard normal distribution.

Among the discrete distributions often used in probability theory are the Bernoulli, binomial, and Poisson distributions.

The Bernoulli distribution is any distribution with two growth points. Most commonly used is the distribution of a random variable X taking the values 0 and 1 with probabilities
q = 1 − p and p respectively, where 0 < p < 1 is a parameter. The first forms of the law of large numbers and of the central limit theorem were obtained for random variables having a Bernoulli distribution. If on the probability space (Ω, Ƒ, P) there is a sequence X_1, X_2, … of independent random variables taking the values 0 and 1 with probability 1/2 each, then on this probability space there exists a random variable with the uniform distribution on (0, 1); in particular, the random variable Σ_n X_n/2^n has the uniform distribution on (0, 1).

The binomial distribution with parameters n and p, where n is a natural number and 0 < p < 1, is the distribution with growth points 0, 1, …, n, at which the probabilities C_n^k p^k q^(n−k), k = 0, 1, …, n,
q = 1 − p, are concentrated. It is the distribution of the sum of n independent random variables having the Bernoulli distribution with growth points 0 and 1, at which the probabilities q and p are concentrated. The study of this distribution led J. Bernoulli to the discovery of the law of large numbers and A. de Moivre to the discovery of the central limit theorem.

The Poisson distribution is the distribution whose support is the sequence of points 0, 1, …, at which the probabilities λ^k e^(−λ)/k!, k = 0, 1, …, are concentrated, where λ > 0 is a parameter. The sum of two independent random variables having Poisson distributions with parameters λ and μ again has a Poisson distribution, with parameter λ + μ. The Poisson distribution is the limit of binomial distributions with parameters n and p = p(n) as n → ∞ if n and p are related so that np → λ as n → ∞ (Poisson's theorem). If 0 < T_1 < T_2 < T_3 < … is a sequence of moments of time at which certain events occur (a so-called stream of events) and the variables T_1, T_2 − T_1, T_3 − T_2, … are independent identically distributed random variables whose common distribution is exponential with parameter λ > 0, then the random variable X_t, equal to the number of events occurring in the interval (0, t), has a Poisson distribution with parameter λt (such a stream is called a Poisson stream).

The concept of a probability distribution has numerous generalizations; in particular, it extends to the multidimensional case and to algebraic structures.

Despite their exotic names, common distributions relate to each other in intuitive and interesting ways that make them easy to remember and reason about with confidence. Some follow naturally, for example, from the Bernoulli distribution. Time to show a map of these connections.

Each distribution is illustrated by an example of its probability density function (PDF). This article deals only with distributions whose outcomes are single numbers. Therefore, the horizontal axis of each graph is the set of possible outcome numbers, and the vertical axis is the probability of each outcome. Some distributions are discrete: their outcomes must be integers, such as 0 or 5. These are indicated by sparse lines, one for each outcome, with a height equal to the probability of that outcome. Some are continuous: their outcomes can take any numerical value, such as −1.32 or 0.005. These are shown as dense curves, with the areas under sections of the curve giving the probabilities. The sum of the heights of the lines and the areas under the curves is always 1.

Print it, cut along the dotted line, and carry it with you in your wallet. This is your guide to the land of distributions and their relatives.

Bernoulli and uniform

You have already encountered the Bernoulli distribution above, with two outcomes - heads or tails. Imagine it now as a distribution over 0 and 1, 0 is heads, 1 is tails. As is already clear, both outcomes are equally likely, and this is reflected in the diagram. The Bernoulli PDF contains two lines of equal height, representing 2 equally probable outcomes: 0 and 1, respectively.

The Bernoulli distribution can also represent unequally probable outcomes, such as flipping an unfair coin. Then the probability of heads is not 0.5 but some other value p, and the probability of tails is 1 − p. Like many other distributions, this is actually a whole family of distributions given by certain parameters, like p above. When you think "Bernoulli", think of "tossing a (possibly unfair) coin".

From here it is a very small step to a distribution over several equally probable outcomes: the uniform distribution, characterized by a flat PDF. Imagine an ordinary die. Its outcomes 1–6 are equally probable. It can be defined for any number of outcomes n, and even as a continuous distribution.

Think of the uniform distribution as "rolling a fair die."

Binomial and hypergeometric

The binomial distribution can be thought of as the sum of the outcomes of those things that follow the Bernoulli distribution.

Toss a fair coin twice: how many times will it come up heads? This is a number that follows the binomial distribution. Its parameters are n, the number of trials, and p, the probability of "success" (in our case, heads, or 1). Each toss is a Bernoulli-distributed outcome, or trial. Reach for the binomial distribution when counting the number of successes in things like coin tosses, where each toss is independent of the others and has the same probability of success.

Or imagine an urn with the same number of white and black balls. Close your eyes, take out the ball, write down its color and put it back. Repeat. How many times is the black ball drawn? This number also follows the binomial distribution.

We presented this strange situation to make it easier to understand the meaning of the hypergeometric distribution. It is the distribution of the same number, but in the situation where we did not return the balls. It is certainly a cousin of the binomial distribution, but not the same, since the probability of success changes with every ball drawn. If the number of balls is large enough compared to the number of draws, the two distributions are almost identical, since the chance of success changes only very slightly with each draw.
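A sketch (assuming scipy.stats is available) comparing the two distributions when the urn is large relative to the number of draws:

```python
from scipy.stats import binom, hypergeom

N_white, N_black = 500, 500          # large urn, equal numbers of white and black balls
N = N_white + N_black
draws = 10

for k in range(draws + 1):
    with_return = binom.pmf(k, draws, N_black / N)        # balls are returned
    without_return = hypergeom.pmf(k, N, N_black, draws)  # balls are not returned
    print(k, round(with_return, 4), round(without_return, 4))
# The two columns are almost identical because 10 draws barely change the urn.
```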

When someone talks about pulling balls out of urns without returning them, it is almost always safe to say "yes, hypergeometric distribution", because I have never met anyone in my life who actually filled urns with balls, then pulled them out and returned them, or vice versa. I don't even know anyone who owns an urn. More broadly, this distribution should come to mind when choosing a significant subset of some population as a sample.

Note translation

It may not be very clear here, but since the tutorial is an express course for beginners, it should be clarified. The population is something that we want to statistically evaluate. To estimate, we select a certain part (subset) and make the required estimate on it (then this subset is called a sample), assuming that the estimate for the entire population will be similar. But for this to be true, additional restrictions are often required on the definition of a subset of the sample (or, conversely, from a known sample we need to evaluate whether it sufficiently accurately describes the population).

A practical example: we need to select representatives from a company of 100 people to travel to E3. It is known that 10 people already traveled there last year (but no one admits it). What is the minimum number we need to take so that the group contains at least one experienced comrade with high probability? In this case the population is 100, of which 10 are the "successes", the sample is the group we select, and the requirement on the sample is that it contain at least one person who has already been to E3.

Wikipedia has a less funny, but more practical example about defective parts in a batch.

Poisson

What about the number of customers calling a technical support hotline every minute? This is an outcome whose distribution looks binomial at first, if we count each second as a Bernoulli trial during which the customer either does not call (0) or calls (1). But power supply companies know very well: when the electricity goes out, two people can call in the same second, or even hundreds. Viewing it as 60,000 millisecond trials does not help either: there are more trials and the probability of a call per millisecond is smaller, even if you don't count two or more calls at once, but technically this is still not a Bernoulli trial. However, the logical reasoning works if we pass to the limit. Let n tend to infinity and p to 0 so that np stays constant. This is like dividing time into smaller and smaller fractions with an ever smaller probability of a call in each. In the limit we get the Poisson distribution.

Just like the binomial, the Poisson distribution is a distribution of counts: of the number of times something will happen. It is parameterized not by the probability p and the number of trials n, but by the average intensity λ, which, by analogy with the binomial, is simply the constant value of np. The Poisson distribution is what you must remember when talking about counting events over a given time at a constant, given intensity.

When there is something, such as packets arriving at a router, or customers appearing in a store, or something waiting in line, think “Poisson”.

Geometric and negative binomial

A different distribution emerges from simple Bernoulli trials. How many times will a coin land tails before it first lands heads? The number of tails follows a geometric distribution. Like the Bernoulli distribution, it is parameterized by the probability of a successful outcome, p. It is not parameterized by n, a number of throws or trials, because the number of unsuccessful trials is precisely the outcome itself.

If the binomial distribution is “how many successes,” then the geometric distribution is “How many failures before success?”

The negative binomial distribution is a simple generalization of the previous one. This is the number of failures before there are r, not 1, successes. Therefore, it is further parameterized by this r. It is sometimes described as the number of successes to r failures. But as my life coach says: “You decide what is success and what is failure,” so it’s the same thing, as long as you remember that the probability p should also be the correct probability of success or failure, respectively.

If you need a joke to relieve tension, you can mention that the binomial and hypergeometric distributions are an obvious pair, but the geometric and negative binomial are also quite similar, and then say, “Well, who calls them all that, huh?”

Exponential and Weibull

Back to calls to technical support: how long will it take until the next call? The distribution of this waiting time seems geometric, because every second during which no one calls is like a failure, up until the second in which the call finally happens. The number of failures is like the number of seconds during which no one called, and that is practically the waiting time until the next call, but "practically" is not enough for us. The point is that this count is always a whole number of seconds, so it does not capture the wait within the second in which the call itself occurs.

Well, as before, we pass to the limit in the geometric distribution with respect to ever smaller fractions of time, and voilà: we obtain the exponential distribution, which accurately describes the time until the call. This is a continuous distribution, the first one here, because the outcome is not necessarily a whole number of seconds. Like the Poisson distribution, it is parameterized by the intensity λ.

Reiterating the connection between the binomial and the geometric, Poisson's “how many events in time?” is related to the exponential “how long until the event?” If there are events whose number per unit time obeys the Poisson distribution, then the time between them obeys the exponential distribution with the same parameter λ. This correspondence between the two distributions must be noted when either of them is discussed.
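A small simulation of this correspondence, sketched with numpy: events with exponential inter-arrival times of intensity λ are counted per unit interval, and the counts behave like Poisson(λ):

```python
import numpy as np

rng = np.random.default_rng(2)
lam = 3.0                      # intensity: on average 3 events per unit of time
horizon = 100_000.0

gaps = rng.exponential(scale=1.0 / lam, size=int(lam * horizon * 2))
times = np.cumsum(gaps)
times = times[times < horizon]

counts = np.bincount(times.astype(int), minlength=int(horizon))  # events per unit interval
print(counts.mean(), counts.var())   # both close to lam, as for a Poisson distribution
```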

The exponential distribution should come to mind when thinking about "time until an event", perhaps "time until failure". In fact, this is so important that more general distributions exist to describe time to failure, such as the Weibull distribution. While the exponential distribution is appropriate when the wear-out, or failure, rate is constant, the Weibull distribution can model failure rates that increase (or decrease) over time. The exponential distribution is, in general, just a special case of it.

Think "Weibull" when talking about MTBF.

Normal, lognormal, Student's t and chi-square

The normal, or Gaussian, distribution is probably one of the most important. Its bell shape is instantly recognizable. It is a particularly curious entity that shows up everywhere, even from the most seemingly simple sources. Take a set of values that follow the same distribution, any one at all, and add them up. The distribution of their sum follows (approximately) the normal distribution. The more things are summed, the closer their sum corresponds to the normal distribution (the catch: the distribution of the terms must be well-behaved and the terms independent, and the sum only tends toward normal). That this is true regardless of the original distribution is amazing.

Note translation

I was surprised that the author does not write about the need for a comparable scale of summed distributions: if one significantly dominates the others, convergence will be extremely bad. And, in general, absolute mutual independence is not necessary; weak dependence is sufficient.

Well, probably good for parties, as he wrote.


This is called the “central limit theorem”, and you need to know what it is, why it is called that, and what it means, otherwise you will instantly be laughed at.
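A sketch (assuming numpy) illustrating the effect: sums of uniform values, which look nothing like a bell on their own, quickly become approximately normal:

```python
import numpy as np

rng = np.random.default_rng(3)
n_terms, n_sums = 30, 100_000

sums = rng.uniform(0.0, 1.0, size=(n_sums, n_terms)).sum(axis=1)

# Normal approximation predicted by the CLT: mean n/2, variance n/12
print(sums.mean(), n_terms / 2)
print(sums.std(), (n_terms / 12) ** 0.5)
hist, _ = np.histogram(sums, bins=20)
print(hist)   # the histogram has the familiar bell shape
```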

In this context the normal distribution is related to all the distributions, though mainly it is related to distributions of sums of all kinds. The sum of Bernoulli trials follows a binomial distribution, and as the number of trials grows this binomial distribution becomes ever closer to the normal distribution. Likewise for its cousin, the hypergeometric distribution. The Poisson distribution, the limiting form of the binomial, also approaches the normal distribution as the intensity parameter grows.

Outcomes that follow a lognormal distribution produce values ​​whose logarithm is normally distributed. Or in other words: the exponent of a normally distributed value is lognormally distributed. If the sums are normally distributed, then remember that the products are lognormally distributed.

The Student's t distribution is the basis of the t-test, which many non-statisticians study in other fields. It is used to reason about the mean of a normal distribution, and it also tends to the normal distribution as its parameter increases. A distinctive feature of the t-distribution is its tails, which are fatter than those of the normal distribution.

If the fat-tails joke didn't impress your neighbor enough, move on to a rather amusing tale about beer. More than 100 years ago, Guinness used statistics to improve its stout. There, William Sealy Gosset invented a completely new statistical theory just for growing better barley. Gosset convinced his boss that other brewers would not figure out how to use his ideas, and obtained permission to publish, but under the pseudonym "Student". Gosset's most famous achievement is precisely this t-distribution, which, one might say, is named after him.

Finally, the chi-square distribution is the distribution of sums of squares of normally distributed values. It is the basis of the chi-square test, which itself is built on the sum of squares of differences that are supposed to be normally distributed.

Gamma and beta

At this point, if you have already started talking about something chi-square, the conversation is getting serious. You may already be talking to real statisticians, and you should probably bow out, because things like the gamma distribution may come up. It is a generalization of both the exponential and the chi-square distributions. Like the exponential distribution, it is used for sophisticated models of waiting times: for example, the gamma distribution appears when the time until the next n events is modeled. It appears in machine learning as the "conjugate prior" to a couple of other distributions.

Don't talk about these conjugate distributions, but if you have to, don't forget to talk about the beta distribution, because it is the conjugate prior to most of the distributions mentioned here. Data scientists are sure that this is exactly what it was made for. Mention this casually and go to the door.

The beginning of wisdom

Probability distributions are something you can't know too much about. The truly interested can refer to a super-detailed map of all probability distributions.

Purpose of the service. The online calculator is used to construct the distribution table of the random variable X (the number of occurrences of an event in a series of experiments) and to calculate all the characteristics of the series: the mathematical expectation, the variance, and the standard deviation. The report with the solution is produced in Word format.
Example 1. An urn contains white and black balls. Balls are drawn at random from the urn without replacement until a white ball appears. As soon as this happens, the process stops.
This type of task relates to the problem of constructing a geometric distribution.

Example 2. Two (or three) shooters each fire one shot at the target. The probability of the first shooter hitting it is …, of the second — …. Draw up the distribution law for the random variable X — the number of hits on the target.

Example 2a. The shooter fires two (three, four) shots. The probability of hitting with each corresponding shot is equal to …. If a miss occurs, the shooter does not take part in further shots. Draw up the distribution law for the random variable X — the number of hits on the target.

Example 3. A batch of … parts contains … defective (standard) ones. The inspector draws … parts at random. Draw up the distribution law for the random variable X — the number of defective (good) parts in the sample.

A similar task: a basket contains m red and n blue balls; k balls are drawn at random. Draw up the distribution law of the discrete random variable X — the number of blue balls drawn.

See other example solutions. Example 4. The probability of an event occurring in one trial is equal to …. … trials are performed. Draw up the distribution law of the random variable X — the number of occurrences of the event.

Similar tasks for this type of distribution:
1. Draw up the distribution law for the random variable X — the number of hits in four shots, if the probability of hitting the target with one shot is 0.8.

Example No. 1. Three coins are tossed. The probability of getting heads (the emblem) in one toss is 0.5. Draw up the distribution law for the random variable X — the number of emblems that appear.
Solution.
The probability that no emblem appears: P(0) = 0.5·0.5·0.5 = 0.125
The probability of exactly one emblem: P(1) = 0.5·0.5·0.5 + 0.5·0.5·0.5 + 0.5·0.5·0.5 = 3·0.125 = 0.375
The probability of exactly two emblems: P(2) = 0.5·0.5·0.5 + 0.5·0.5·0.5 + 0.5·0.5·0.5 = 3·0.125 = 0.375
The probability of three emblems: P(3) = 0.5·0.5·0.5 = 0.125

X: 0      1      2      3
P: 0.125  0.375  0.375  0.125
Check: P = P(0) + P(1) + P(2) + P(3) = 0.125 + 0.375 + 0.375 + 0.125 = 1
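The same table can be obtained directly from the binomial formula; a sketch using the Python standard library:

```python
from math import comb

n, p = 3, 0.5
table = {m: comb(n, m) * p**m * (1 - p)**(n - m) for m in range(n + 1)}
print(table)                # {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}
print(sum(table.values()))  # 1.0 — the check from the example
```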

Example No. 2. The probability of hitting the target with one shot is 0.8 for the first shooter and 0.85 for the second. The shooters each fired one shot at the target. Considering hits on the target by the individual shooters as independent events, find the probability of the event A — at least one hit on the target.
Consider the event A — at least one hit on the target. The possible ways this event can occur are as follows:

  1. The first shooter hit, the second missed: p1·(1 − p2) = 0.8·(1 − 0.85) = 0.12
  2. The first shooter missed, the second hit: (1 − p1)·p2 = (1 − 0.8)·0.85 = 0.17
  3. Both shooters hit the target, independently of each other: p1·p2 = 0.8·0.85 = 0.68
Then the probability of the event A — at least one hit on the target — is P(A) = 0.12 + 0.17 + 0.68 = 0.97

Probability theory is a branch of mathematics that studies the patterns of random phenomena: random events, random variables, their properties and operations on them.

For a long time, probability theory did not have a clear definition; one was formulated only in 1929. The emergence of probability theory as a science dates back to the Middle Ages and the first attempts at the mathematical analysis of gambling (coin tossing, dice, roulette). In the 17th century, the French mathematicians Blaise Pascal and Pierre Fermat, studying the prediction of winnings in gambling, discovered the first probabilistic regularities that arise when throwing dice.

Probability theory arose as a science from the belief that mass random events are based on certain patterns. Probability theory studies these patterns.

Probability theory deals with the study of events whose occurrence is not known with certainty. It allows you to judge the degree of probability of the occurrence of some events compared to others.

For example: it is impossible to determine unambiguously whether "heads" or "tails" will result from a toss of a coin, but with repeated tossing approximately equal numbers of "heads" and "tails" appear, which means that the probability of getting "heads" or "tails" is equal to 50%.

A trial in this case is the realization of a certain set of conditions, that is, in this case, the toss of the coin. A trial can be carried out an unlimited number of times. The set of conditions includes random factors.

The result of a trial is an event. An event can be:

  1. Reliable (always occurs as a result of testing).
  2. Impossible (never happens).
  3. Random (may or may not occur as a result of the test).

For example, when a coin is tossed, an impossible event is that the coin lands on its edge, and a random event is the appearance of "heads" or "tails". A specific trial result is called an elementary event. The set of all possible, distinct, specific outcomes of a trial is called the space of elementary events.

Basic concepts of the theory

Probability is the degree of possibility of the occurrence of an event. When the reasons for some possible event to actually occur outweigh the opposite reasons, the event is called probable, otherwise unlikely or improbable.

A random variable is a quantity that, as a result of a trial, can take one or another value, and it is not known in advance which one. For example: the number of calls to a fire station per day, the number of hits in 10 shots, etc.

Random variables can be divided into two categories.

  1. Discrete random variable is a quantity that, as a result of testing, can take on certain values ​​with a certain probability, forming a countable set (a set whose elements can be numbered). This set can be either finite or infinite. For example, the number of shots before the first hit on the target is a discrete random variable, because this quantity can take on an infinite, albeit countable, number of values.
  2. Continuous random variable is a quantity that can take any value from some finite or infinite interval. Obviously, the number of possible values ​​of a continuous random variable is infinite.

Probability space- concept introduced by A.N. Kolmogorov in the 30s of the 20th century to formalize the concept of probability, which gave rise to the rapid development of probability theory as a strict mathematical discipline.

A probability space is a triple (Ω, Ƒ, P) (sometimes written in angle brackets: ⟨Ω, Ƒ, P⟩), where

Ω is an arbitrary set whose elements are called elementary events, outcomes, or points;
Ƒ is a sigma-algebra of subsets of Ω, called (random) events;
P is a probability measure, or probability, i.e. a sigma-additive finite measure such that P(Ω) = 1.

De Moivre-Laplace theorem- one of the limit theorems of probability theory, established by Laplace in 1812. It states that the number of successes when repeating the same random experiment over and over again with two possible outcomes is approximately normally distributed. It allows you to find an approximate probability value.

If, in each of n independent trials, the probability of occurrence of some random event is equal to p (0 < p < 1) and m is the number of trials in which it actually occurs, then the probability of the inequality a < (m − np)/√(npq) < b being true is close (for large n) to the value of the Laplace integral.
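A numerical illustration of this normal approximation, sketched with scipy; the numbers are chosen arbitrarily:

```python
from scipy.stats import binom, norm

n, p = 1000, 0.3
q = 1 - p
mu, sigma = n * p, (n * p * q) ** 0.5

# Probability that the number of successes m falls between 280 and 320
exact = binom.cdf(320, n, p) - binom.cdf(279, n, p)
approx = norm.cdf((320.5 - mu) / sigma) - norm.cdf((279.5 - mu) / sigma)  # continuity correction
print(exact, approx)   # the two values are close for large n
```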

The distribution function in probability theory is a function characterizing the distribution of a random variable or a random vector: the probability that the random variable X takes a value less than or equal to x, where x is an arbitrary real number. Subject to known conditions, it completely determines the random variable.

The expected value is the mean value of a random variable (with the averaging taken over its probability distribution, as considered in probability theory). In the English-language literature it is denoted by E[X], in the Russian literature by M[X]. In statistics the notation μ is often used.

Let a probability space (Ω, Ƒ, P) and a random variable X defined on it be given; that is, by definition, X: Ω → R is a measurable function. Then, if the Lebesgue integral of X over the space Ω exists, it is called the mathematical expectation, or mean value, and is denoted M[X].

The variance of a random variable is a measure of the spread of the random variable, i.e. of its deviation from the mathematical expectation. In the Russian literature it is denoted D[X] and in the foreign literature Var(X). In statistics the notation σ² or σ²_X is often used. The square root of the variance is called the standard deviation (standard spread).

Let X be a random variable defined on some probability space. Then

D[X] = M[(X − M[X])²],

where the symbol M denotes the mathematical expectation.

In probability theory, two random events are called independent if the occurrence of one of them does not change the probability of occurrence of the other. Similarly, two random variables are called dependent if the value of one of them affects the probabilities of the values of the other.

The simplest form of the law of large numbers is Bernoulli's theorem, which states that if the probability of an event is the same in all trials, then as the number of trials increases, the frequency of the event tends to the probability of the event and ceases to be random.

The law of large numbers in probability theory states that the arithmetic mean of a finite sample from a fixed distribution is close to the theoretical mean of that distribution. Depending on the type of convergence, one distinguishes the weak law of large numbers, when convergence is in probability, and the strong law of large numbers, when the convergence is almost sure.

The general meaning of the law of large numbers is that the joint action of a large number of identical and independent random factors leads to a result that, in the limit, does not depend on chance.

Methods for estimating probability based on finite sample analysis are based on this property. A clear example is a forecast of election results based on a survey of a sample of voters.
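A minimal simulation of the law of large numbers for coin tossing, sketched with numpy:

```python
import numpy as np

rng = np.random.default_rng(4)
p = 0.5
for n in (10, 100, 10_000, 1_000_000):
    frequency = rng.binomial(1, p, size=n).mean()
    print(n, frequency)   # the frequency approaches the probability p as n grows
```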

Central limit theorems are a class of theorems in probability theory stating that the sum of a sufficiently large number of weakly dependent random variables of approximately the same scale (no single term dominates or makes a decisive contribution to the sum) has a distribution close to normal.

Since many random variables in applications are formed under the influence of several weakly dependent random factors, their distribution is considered normal. In this case, the condition must be met that none of the factors is dominant. Central limit theorems in these cases justify the use of the normal distribution.