Variation series: definition, types, main characteristics. Methods of calculating the mode, median, and arithmetic mean in medical and statistical research (illustrated with a conditional example).
A variation series is a series of numerical values of the characteristic being studied that differ from each other in magnitude and are arranged in a certain sequence (ascending or descending). Each numerical value of a series is called a variant (V), and the numbers showing how often a particular variant occurs in the series are called frequencies (p).
The total number of observations that make up the variation series is denoted by the letter n. Differences in the values of the characteristic being studied are called variation. If a varying characteristic has no quantitative measure, the variation is called qualitative, and the distribution series is called attributive (for example, distribution by disease outcome, health status, etc.).
If a varying characteristic is expressed quantitatively, the variation is called quantitative, and the distribution series is called a variation series.
Variation series are divided into discontinuous and continuous, according to the nature of the quantitative characteristic, and into simple and weighted, according to how often the variants occur.
In a simple variation series each variant occurs only once (p = 1); in a weighted series the same variant occurs several times (p > 1). Examples of such series are given later in the text. If the quantitative characteristic is continuous, i.e. there are intermediate fractional values between whole numbers, the variation series is called continuous.
For example: 10.0 – 11.9
14.0 – 15.9, etc.
If the quantitative characteristic is discontinuous, i.e. its individual values (variants) differ from each other by whole numbers and have no intermediate fractional values, the variation series is called discontinuous, or discrete.
Using the heart rate data for 21 students from the previous example, we construct a variation series (Table 1).
Table 1
Distribution of medical students by heart rate (bpm)
Thus, constructing a variation series means systematizing the available numerical values (variants), i.e. arranging them in a certain sequence (ascending or descending) together with their frequencies. In the example under consideration, the variants are arranged in ascending order and are whole (discrete) numbers, and variants occur several times, i.e. this is a weighted, discontinuous (discrete) variation series.
As a rule, if the number of observations in the statistical population does not exceed 30, it is enough to arrange all values of the characteristic being studied in ascending order, as in Table 1, or in descending order.
With a large number of observations (n > 30), the number of distinct variants can be very large. In this case an interval (grouped) variation series is compiled, in which the variants are combined into groups to simplify subsequent processing and clarify the nature of the distribution.
Typically the number of groups ranges from 8 to 15.
There should be at least 5 of them; otherwise the grouping is too crude, and such excessive enlargement distorts the overall picture of variation and greatly reduces the accuracy of the averages. When there are more than 20–25 groups, the accuracy of the averages increases, but the pattern of variation of the characteristic is significantly distorted and the calculations become more cumbersome.
When compiling a grouped series, the following must be taken into account:
− the groups of variants must be arranged in a certain order (ascending or descending);
− the intervals of the groups must be equal;
− the boundaries of adjacent intervals must not coincide; otherwise it is unclear to which group a variant lying on a boundary belongs;
− the qualitative features of the collected material must be considered when setting interval limits (for example, when studying the weight of adults an interval of 3–4 kg is acceptable, whereas for infants in the first months of life it should not exceed 100 g).
Let us construct a grouped (interval) series from the pulse rate (beats per minute) of 55 medical students before an exam: 64, 66, 60, 62, 64, 68, 70, 66, 70, 68, 62, 68, 70, 72, 60, 70, 74, 62, 70, 72, 72, 64, 70, 72, 76, 76, 68, 70, 58, 76, 74, 76, 76, 82, 76, 72, 76, 74, 79, 78, 74, 78, 74, 78, 74, 74, 78, 76, 78, 76, 80, 80, 80, 78, 78.
To build a grouped series you need:
1. Determine the size of the interval;
2. Determine the middle, beginning and end of the groups of the variation series.
● The size of the interval (i) depends on the intended number of groups (r), which is chosen according to the number of observations (n) using a special table.
Number of groups depending on the number of observations:
In our case, for 55 students, you can create from 8 to 10 groups.
The interval (i) is determined by the following formula:
$$i = \frac{V_{max} - V_{min}}{r}$$
In our example, the interval is i = (82 − 58)/8 = 24/8 = 3.
If the interval value is a fraction, the result should be rounded to the nearest whole number.
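The grouping procedure above can be sketched in Python. This is a minimal illustration, not the author's method: it assumes whole-number pulse values, rounds the interval half-up, starts the first group at V_min, and keeps group boundaries from coinciding (58–60, 61–63, ...); the helper names `interval_size` and `build_grouped_series` are hypothetical. With i = 3, eight such groups reach only 81, so the sketch keeps adding groups until V_max = 82 is covered, which yields nine groups, still within the 8 to 10 suggested above.

```python
import math

# Pulse rate (beats per minute) of 55 medical students before the exam
PULSE = [64, 66, 60, 62, 64, 68, 70, 66, 70, 68, 62, 68, 70, 72, 60, 70, 74, 62,
         70, 72, 72, 64, 70, 72, 76, 76, 68, 70, 58, 76, 74, 76, 76, 82, 76, 72,
         76, 74, 79, 78, 74, 78, 74, 78, 74, 74, 78, 76, 78, 76, 80, 80, 80, 78, 78]


def interval_size(values, groups):
    """i = (V_max - V_min) / r, rounded half-up to a whole number."""
    raw = (max(values) - min(values)) / groups
    return math.floor(raw + 0.5)


def build_grouped_series(values, width):
    """Return [((lower, upper), frequency), ...] for whole-number data.

    Groups start at V_min, adjacent groups never share a boundary,
    and groups are added until V_max is covered.
    """
    series = []
    lower = min(values)
    while lower <= max(values):
        upper = lower + width - 1
        frequency = sum(1 for v in values if lower <= v <= upper)
        series.append(((lower, upper), frequency))
        lower += width
    return series


i = interval_size(PULSE, 8)  # (82 - 58) / 8 = 3
for (lo, hi), p in build_grouped_series(PULSE, i):
    print(f"{lo}-{hi}: {p}")
```

Run as is, it prints nine groups with frequencies 3, 3, 5, 4, 12, 7, 16, 4 and 1, which sum to 55.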
There are several types of averages:
● harmonic mean,
● root mean square,
● progressive mean,
● median.
In medical statistics, arithmetic means are used most often.
The arithmetic mean (M) is a generalizing value that characterizes what is typical of the entire population. The main methods of calculating M are the direct arithmetic mean method and the method of moments (conditional deviations).
The arithmetic mean method is used to calculate both the simple and the weighted arithmetic mean; which of them applies depends on the type of variation series. In a simple variation series, in which each variant occurs only once, the simple arithmetic mean is calculated by the formula:
$$M = \frac{\sum V}{n}$$
where: M is the arithmetic mean;
V is the value of the varying characteristic (variant);
Σ is the summation sign;
n is the total number of observations.
An example of calculating the simple arithmetic mean. The respiratory rate (breaths per minute) in 9 men aged 35 years was: 20, 22, 19, 15, 16, 21, 17, 23, 18.
To determine the average respiratory rate in men aged 35 years:
1. Construct a variation series by arranging all variants in ascending (or descending) order: 15, 16, 17, 18, 19, 20, 21, 22, 23. This is a simple variation series, because each value occurs only once.
2. Calculate the simple arithmetic mean:
M = ∑V/n = 171/9 = 19 breaths per minute
Conclusion. The respiratory rate in men aged 35 years is on average 19 respiratory movements per minute.
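The same calculation as a short Python sketch (the variable names are illustrative): sorting produces the simple variation series, and M = ΣV/n is computed directly.

```python
# Respiratory rate (breaths per minute) in 9 men aged 35 years
RESP = [20, 22, 19, 15, 16, 21, 17, 23, 18]

series = sorted(RESP)          # simple variation series: each variant occurs once
M = sum(series) / len(series)  # M = ΣV / n

print(series)                                    # [15, 16, 17, 18, 19, 20, 21, 22, 23]
print(f"M = {sum(series)}/{len(series)} = {M:g}")  # M = 171/9 = 19
```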
If individual values of the variant repeat, there is no need to write out every repetition; it is enough to list the distinct values of the variant (V) and, next to each, the number of its repetitions (p). Such a series, in which the variants are, as it were, weighted by their frequencies, is called a weighted variation series, and the average calculated from it is the weighted arithmetic mean.
The weighted arithmetic mean is calculated by the formula: M = ∑Vp/n,
where n is the number of observations, equal to the sum of the frequencies, Σp.
An example of calculating the weighted arithmetic mean.
The duration of disability (in days) in 35 patients with acute respiratory infections (ARI) treated by a district doctor during the first quarter of the current year was: 6, 7, 5, 3, 9, 8, 7, 5, 6, 4, 9, 8, 7, 6, 6, 9, 6, 5, 10, 8, 7, 11, 13, 5, 6, 7, 12, 4, 3, 5, 2, 5, 6, 6, 7 days.
The average duration of disability in patients with ARI is determined as follows:
1. Construct a weighted variation series, since individual values of the variant are repeated several times. To do this, arrange the distinct variants in ascending or descending order together with their frequencies.
In our case, the variants are arranged in ascending order.
2. Calculate the weighted arithmetic mean using the formula: M = ∑Vp/n = 233/35 ≈ 6.7 days
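Distribution of patients with ARI by duration of disability:
Duration of disability, days (V) | Number of patients (p) | Vp |
2 | 1 | 2 |
3 | 2 | 6 |
4 | 2 | 8 |
5 | 6 | 30 |
6 | 8 | 48 |
7 | 6 | 42 |
8 | 3 | 24 |
9 | 3 | 27 |
10 | 1 | 10 |
11 | 1 | 11 |
12 | 1 | 12 |
13 | 1 | 13 |
Total | ∑p = n = 35 | ∑Vp = 233 |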
Conclusion. The duration of disability in patients with acute respiratory diseases averaged 6.7 days.
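A Python sketch of the weighted calculation, assuming the raw ARI data above; `collections.Counter` builds the V → p table, and the printed rows reproduce the V, p and Vp columns of the table.

```python
from collections import Counter

# Duration of disability (days) in 35 patients with ARI
DAYS = [6, 7, 5, 3, 9, 8, 7, 5, 6, 4, 9, 8, 7, 6, 6, 9, 6, 5, 10, 8,
        7, 11, 13, 5, 6, 7, 12, 4, 3, 5, 2, 5, 6, 6, 7]

freq = Counter(DAYS)                           # V -> p
n = sum(freq.values())                         # n = Σp
sum_vp = sum(v * p for v, p in freq.items())   # ΣVp
M = sum_vp / n                                 # weighted arithmetic mean

for v in sorted(freq):
    print(v, freq[v], v * freq[v])
print(f"M = {sum_vp}/{n} = {M:.1f} days")      # M = 233/35 = 6.7 days
```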
Mode (Mo) is the most frequent variant in a variation series. For the distribution presented in the table, the mode is the variant equal to 10, which occurs more often than any other (6 times).
Distribution of patients by length of hospital stay (in days)
V |
p |
Sometimes the mode is hard to determine exactly, because the data may contain several equally frequent ("most common") values.
Median (Me) is a nonparametric indicator that divides the variation series into two equal halves: the same number of variants is located on both sides of the median.
For example, for the distribution shown in the table the median is 10, because 14 variants lie on each side of this value (2 + 3 + 4 + 5 = 14), i.e. the value 10 occupies the central position in the series and is its median.
Since the number of observations in this example is even (n = 34), the position of the median can be found as follows:
n/2 = (2 + 3 + 4 + 5 + 6 + 5 + 4 + 3 + 2)/2 = 34/2 = 17
This means that the middle of the series falls between the 17th and 18th variants; both equal 10 (variants 15 to 20 all equal 10), so the median is 10. For the distribution presented in the table, the arithmetic mean is:
M = ∑Vp/n = 343/34 ≈ 10.1
So, for the 34 observations in Table 8 we obtained Mo = 10, Me = 10, and M = 10.1. In this example all three indicators turned out to be equal or close to each other, although they are conceptually different measures.
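Mode and median can be found from any weighted series {V: p} with the hypothetical helpers below. Because the V values of the hospital-stay table are not preserved here, the sketch uses the ARI data from the weighted-mean example and the heart-rate series discussed later in the text. For an even n it takes the mean of the (n/2)-th and (n/2 + 1)-th variants, which is one common convention; `modes` returns every variant sharing the highest frequency, to cover the multimodal case mentioned above.

```python
from collections import Counter


def modes(series):
    """All variants with the highest frequency (several if the series is multimodal)."""
    top = max(series.values())
    return [v for v in sorted(series) if series[v] == top]


def kth_variant(series, k):
    """k-th variant (1-based) of the ascending series, found via cumulative frequencies."""
    cumulative = 0
    for v in sorted(series):
        cumulative += series[v]
        if cumulative >= k:
            return v
    raise IndexError(k)


def median(series):
    n = sum(series.values())
    if n % 2 == 1:
        return kth_variant(series, (n + 1) // 2)
    # even n: the median lies between the (n/2)-th and (n/2 + 1)-th variants
    return (kth_variant(series, n // 2) + kth_variant(series, n // 2 + 1)) / 2


ari = Counter([6, 7, 5, 3, 9, 8, 7, 5, 6, 4, 9, 8, 7, 6, 6, 9, 6, 5, 10, 8,
               7, 11, 13, 5, 6, 7, 12, 4, 3, 5, 2, 5, 6, 6, 7])
heart = Counter([80, 84, 84, 86, 86, 86, 90, 94])

print(modes(ari), median(ari))      # [6] 6
print(modes(heart), median(heart))  # [86] 86.0
```

For the ARI data this gives Mo = 6 and Me = 6 days, close to the mean of 6.7 days.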
The arithmetic mean reflects the combined effect of all influences: every variant without exception takes part in its formation, including extreme ones, which are often atypical for the phenomenon or population in question.
The mode and median, unlike the arithmetic mean, do not depend on every individual value of the varying characteristic (the values of the extreme variants and the degree of scatter of the series). The arithmetic mean characterizes the entire mass of observations, whereas the mode and median characterize its bulk.
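To illustrate the point about extreme variants, the sketch below uses the respiratory-rate series from the simple-mean example and a hypothetical copy in which the largest value, 23, is replaced by an atypical 40; it relies only on Python's standard `statistics` module.

```python
from statistics import mean, median

resp = [15, 16, 17, 18, 19, 20, 21, 22, 23]
# Hypothetical: the largest variant replaced by an atypical extreme value
resp_extreme = [15, 16, 17, 18, 19, 20, 21, 22, 40]

print(mean(resp), median(resp))                            # both 19
print(round(mean(resp_extreme), 1), median(resp_extreme))  # mean rises to about 20.9, median stays 19
```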
Practical lesson 1
VARIATION SERIES OF DISTRIBUTION
A variation series, or distribution series, is an ordered distribution of the units of a population by increasing (more often) or decreasing (less often) values of a characteristic, with a count of the units having each particular value.
There are three kinds of distribution series:
1) a ranked series is a list of the individual units of the population in ascending order of the characteristic being studied. If the population is large, the ranked series becomes cumbersome; in such cases the distribution series is built by grouping the units by the values of the characteristic (if the characteristic takes few values, a discrete series is constructed; otherwise, an interval series);
2) a discrete series is a table of two columns (rows): the specific values of the varying characteristic, X_i, and the number of units with each value, f_i (frequencies); the number of groups in a discrete series equals the number of distinct values the characteristic actually takes;
3) an interval series is a table of two columns (rows): the intervals of the varying characteristic, X_i, and the number of units falling into each interval (frequencies), or that number as a proportion of the population size (relative frequencies).
Numbers showing how many times individual variants occur in a given population are called frequencies, or weights, of the variants and are denoted by the lowercase Latin letter f. The sum of the frequencies of a variation series equals the size of the population:
$$\sum_{i=1}^{k} f_i = n,$$
where k is the number of groups and n is the total number of observations, or the size of the population.
Frequencies (weights) can be expressed not only in absolute numbers but also in relative ones: as fractions of a unit or as percentages of the total number of variants in the population. In that case they are called relative frequencies. The relative frequencies sum to one:
$$\sum_{i=1}^{k} \frac{f_i}{n} = 1,$$
or to 100% if they are expressed as percentages of the total number of observations. Replacing absolute frequencies with relative ones is not required, but it is useful, and sometimes necessary, when comparing variation series that differ greatly in size.
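Relative frequencies f_i/n can be computed as follows; this sketch reuses the ARI data from the weighted-mean example and is illustrative only.

```python
from collections import Counter

# Duration of disability (days) in 35 patients with ARI
DAYS = [6, 7, 5, 3, 9, 8, 7, 5, 6, 4, 9, 8, 7, 6, 6, 9, 6, 5, 10, 8,
        7, 11, 13, 5, 6, 7, 12, 4, 3, 5, 2, 5, 6, 6, 7]

freq = Counter(DAYS)
n = sum(freq.values())
relative = {v: freq[v] / n for v in sorted(freq)}  # f_i / n

for v, share in relative.items():
    print(f"V={v:>2}  f={freq[v]}  f/n={share:.3f}  ({share:.1%})")

total = sum(relative.values())
print(round(total, 10))  # 1.0, up to floating-point rounding
```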
Depending on how the characteristic varies (discretely or continuously, over a wide or narrow range), the statistical population is arranged into a non-interval or an interval variation series. In the first case, the frequencies refer directly to the ranked values of the characteristic, which serve as the separate groups or classes of the series; in the second, the frequencies are counted for class intervals (from ... to) into which the whole range of variation, from the minimum to the maximum variant, is divided. Class intervals may be equal or unequal in width, so equal-interval and unequal-interval variation series are distinguished. In unequal-interval series, the apparent shape of the frequency distribution changes with the width of the class intervals. Unequal-interval grouping is used relatively rarely in biology. As a rule, biometric data are arranged into equal-interval series, which not only reveals patterns of variation but also simplifies calculating the summary numerical characteristics of the series and comparing distribution series with each other.
When constructing an equal-interval variation series, it is important to choose the width of the class interval correctly. Rough grouping (very wide class intervals) distorts the typical features of variation and reduces the accuracy of the numerical characteristics of the series. With excessively narrow intervals, the accuracy of the summary characteristics increases, but the series becomes too stretched out and does not give a clear picture of variation.
To obtain a clear variation series and ensure sufficient accuracy of the numerical characteristics calculated from it, the range of variation (from the minimum to the maximum variant) should be divided into a number of groups, or classes, that satisfies both requirements. The interval width is found by dividing the range of variation by the number of groups chosen for the series:
$$h = \frac{X_{max} - X_{min}}{k},$$
where h is the interval width; X_max and X_min are the maximum and minimum values in the population; k is the number of groups.
When constructing an interval distribution series, it is necessary to choose the optimal number of groups (intervals of the characteristic) and set the length (width) of the interval. Since frequencies in different intervals are compared when a distribution series is analyzed, the intervals should be of constant length. If an interval series has unequal intervals, then for comparability the frequency or relative frequency must be reduced to a unit of interval length; the resulting value is called the density, ρ:
$$\rho_i = \frac{f_i}{h_i},$$
where h_i is the width of the i-th interval.
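A sketch of the density calculation ρ_i = f_i/h_i for unequal intervals. The intervals and frequencies below are hypothetical, chosen only to show that two intervals with the same frequency (20) can have different densities; `densities` is an illustrative helper name.

```python
def densities(intervals):
    """intervals: list of (lower, upper, frequency); returns frequency per unit of interval width."""
    return [f / (upper - lower) for lower, upper, f in intervals]


# Hypothetical unequal intervals, for illustration only
example = [(0, 10, 20), (10, 30, 20), (30, 35, 5)]
print(densities(example))  # [2.0, 1.0, 1.0]
```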
The optimal number of groups is chosen so that the diversity of attribute values in the aggregate is sufficiently reflected and, at the same time, the distribution pattern is not distorted by random frequency fluctuations. If there are too few groups, the pattern of variation will not appear; if there are too many groups, random frequency jumps will distort the shape of the distribution.
Most often, the number of groups in a distribution series is determined using the Sturges formula:
$$k = 1 + 3.322\,\lg n,$$
where n is the population size.
A graphical representation is a great help in analyzing a distribution series and its properties. An interval series is depicted as a bar chart in which the bases of the bars along the abscissa are the intervals of the varying characteristic and the heights of the bars are the corresponding frequencies on the ordinate scale. Such a diagram is called a histogram.
For a discrete distribution series, or when the midpoints of intervals are used, the graph is called a polygon; it is obtained by joining the points with coordinates (X_i, f_i) by straight line segments.
If the class values are plotted on the abscissa and the accumulated frequencies on the ordinate, and the points are joined by straight lines, the resulting graph is called a cumulate. The accumulated frequencies are found by successive summation (cumulation) of frequencies from the first class to the end of the variation series.
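Accumulated frequencies and the points for a polygon and a cumulate can be computed with `itertools.accumulate`. This sketch uses the discrete ARI series from the weighted-mean example; for an interval series, the interval midpoints would take the place of the variants on the abscissa. Plotting itself is left out.

```python
from collections import Counter
from itertools import accumulate

# Duration of disability (days) in 35 patients with ARI
DAYS = [6, 7, 5, 3, 9, 8, 7, 5, 6, 4, 9, 8, 7, 6, 6, 9, 6, 5, 10, 8,
        7, 11, 13, 5, 6, 7, 12, 4, 3, 5, 2, 5, 6, 6, 7]

freq = Counter(DAYS)
variants = sorted(freq)                   # X_i, plotted on the abscissa
freqs = [freq[v] for v in variants]       # f_i
cumulative = list(accumulate(freqs))      # summed from the first class to the last

polygon = list(zip(variants, freqs))      # points joined by straight lines give the polygon
cumulate = list(zip(variants, cumulative))  # points joined by straight lines give the cumulate
print(polygon)
print(cumulate)
```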
Example. Data are available on the annual egg production of 50 laying hens kept on a poultry farm (Table 1.1).
Table 1.1
Egg production of laying hens
Laying hen no. | Egg production, pcs. |
It is required to construct an interval distribution series and display it graphically in the form of a histogram, polygon and cumulate.
The table shows that annual egg production per hen ranges from 212 to 245 eggs.
In our example, we determine the number of groups using the Sturges formula:
k = 1 + 3.322 lg 50 ≈ 6.644 ≈ 7.
Let us calculate the length (width) of the interval using the formula:
$$h = \frac{X_{max} - X_{min}}{k} = \frac{245 - 212}{7} \approx 4.7 \approx 5$$
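The two calculations above (Sturges' formula and the interval width) in Python, as a sketch; n = 50, X_min = 212 and X_max = 245 come from the text, and Python's built-in `round` stands in for rounding to a whole number. The Excel formula `=1+3.322*LOG10(50)` used later gives the same k.

```python
import math


def sturges(n):
    """Number of groups k = 1 + 3.322 lg n, before rounding."""
    return 1 + 3.322 * math.log10(n)


n, x_min, x_max = 50, 212, 245
k_raw = sturges(n)
k = round(k_raw)                 # 6.644 -> 7 groups
h_raw = (x_max - x_min) / k
h = round(h_raw)                 # 4.71 -> interval of 5 eggs
print(f"k = {k_raw:.3f} -> {k}, h = {h_raw:.2f} -> {h}")
```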
Let us build an interval series with 7 groups and an interval of 5 eggs (Table 1.2). For plotting the graphs, the table also gives the midpoints of the intervals and the accumulated frequencies.
Table 1.2
Interval series of egg production distribution
Group of laying hens by egg production, X_i | Number of laying hens, f_i | Interval midpoint, X_i' | Cumulative frequency, f_i' |
Let's build a histogram of egg production distribution (Fig. 1.1).
Fig. 1.1. Histogram of egg production distribution
The histogram shows a distribution shape typical of many characteristics: values in the middle intervals occur more often, while extreme (small and large) values occur less often. This shape is close to the normal distribution law, which arises when a variable is influenced by a large number of factors, none of which is predominant.
The polygon and the cumulate of the egg production distribution are shown in Figs. 1.2 and 1.3.
Fig. 1.2. Polygon of egg production distribution
Fig. 1.3. Cumulate of egg production distribution
The procedure for solving the problem in the Microsoft Excel spreadsheet is as follows.
1. Enter the initial data in accordance with Fig. 1.4.
2. Rank the series.
2.1. Select cells A2:A51.
2.2. Click the <Sort Ascending> button on the toolbar.
3. Determine the size of the interval for constructing the interval distribution series.
3.1. Copy cell A2 to cell E53.
3.2. Copy cell A51 to cell E54.
3.3. Calculate the range of variation. To do this, enter the formula in cell E55 =E54-E53.
3.4. Calculate the number of variation groups. To do this, enter the formula in cell E56 =1+3.322*LOG10(50).
3.5. Enter the rounded number of groups in cell E57.
3.6. Calculate the length of the interval. To do this, enter the formula in cell E58 =E55/E57.
3.7. Enter the rounded interval length in cell E59.
4. Construct an interval series.
4.1. Copy cell E53 to cell B64.
4.2. Enter the formula in cell B65 =B64+$E$59.
4.3. Copy cell B65 to cells B66:B70.
4.4. Enter the formula in cell C64 =B65.
4.5. Enter the formula in cell C65 =C64+$E$59.
4.6. Copy cell C65 to cells C66:C70.
The result is shown in Fig. 1.5.
5. Calculate the interval frequency.
5.1. Run the command Tools > Data Analysis.
5.2. In the Data Analysis dialog box, select the <Histogram> analysis tool (Fig. 1.6).
5.3. Click <OK>.
5.4. In the Histogram dialog box, set the parameters as shown in Fig. 1.7.
5.5. Click <OK>.
The result is shown in Fig. 1.8.
6. Fill out the table “Interval distribution series”.
6.1. Copy cells B74:B80 to cells D64:D70.
6.2. Calculate the sum of the frequencies: select cells D64:D70 and click the <AutoSum> button on the toolbar.
6.3. Calculate the midpoint of the intervals. To do this, enter the formula in cell E64 =(B64+C64)/2 and copy to cells E65:E70.
6.4. Calculate accumulated frequencies. To do this, copy cell D64 to cell F64. In cell F65, enter the formula =F64+D65 and copy it to cells F66:F70.
The result is shown in Fig. 1.9.
7. Edit the histogram.
7.1. Right-click the "Bin" label on the chart and, in the menu that appears, click <Clear>.
7.2. Right-click the chart and, in the menu that appears, click <Source Data>.
7.3. In the Source Data dialog box, change the X-axis labels: select cells B64:C70 (Fig. 1.10).
7.5. Press the key
The result is shown in Fig. 1.11.
8. Construct a polygon for distribution of egg production.
8.1. Click the <Chart Wizard> button on the toolbar.
8.2. In the Chart Wizard (Step 1 of 4) dialog box, select Standard Types: <Line> (Fig. 1.12).
8.3. Click <Next>.
8.4. In the Chart Wizard (Step 2 of 4) dialog box, set the parameters as shown in Fig. 1.13.
8.5. Click <Next>.
8.6. In the Chart Wizard (Step 3 of 4) dialog box, enter the chart title and the Y-axis title (Fig. 1.14).
8.7. Click <Next>.
8.8. In the Chart Wizard (Step 4 of 4) dialog box, set the parameters as shown in Fig. 1.15.
8.9. Click <Finish>.
The result is shown in Fig. 1.16.
9. Add X-axis labels to the graph.
9.1. Right-click the chart and, in the menu that appears, click <Source Data>.
9.2. In the Source Data dialog box, change the X-axis labels: select cells E64:E70 (Fig. 1.17).
9.3. Press the key
The result is shown in Fig. 1.18.
The distribution cumulate is constructed similarly to the distribution polygon based on accumulated frequencies.
A variation series is a series in which the variants, arranged in increasing or decreasing order, are listed together with their corresponding frequencies.
Variants are the individual quantitative values of a characteristic, denoted by the Latin letter V. In the classical sense, each distinct value of a characteristic is called a variant, regardless of how many times it is repeated.
For example, in the variation series of systolic blood pressure measured in ten patients:
110, 120, 120, 130, 130, 130, 140, 140, 160, 170;
there are only 6 variants:
110, 120, 130, 140, 160, 170.
Frequency is a number indicating how many times a variant is repeated. It is denoted by the Latin letter P. The sum of all frequencies (which, of course, equals the number of subjects studied) is denoted by n.
Types of variation series:
Variation series are used to describe large arrays of numbers; most data collected in medical research are presented in this form. To characterize a variation series, special indicators are calculated, including averages, indicators of variability (dispersion), and indicators of the representativeness of sample data.
1) The arithmetic mean is a generalizing indicator that characterizes the size of the characteristic being studied. It is denoted by M and is the most common type of average. The arithmetic mean is calculated as the ratio of the sum of the values of all observation units to the number of units studied. The calculation differs for simple and weighted variation series.
Formula for the simple arithmetic mean:
$$M = \frac{\sum V}{n}$$
Formula for the weighted arithmetic mean:
M = Σ(V × P)/n
2) The mode is another average of the variation series: the variant that is repeated most often, in other words, the variant with the highest frequency. It is denoted Mo. The mode is calculated only for weighted series, since in a simple series no variant is repeated and all frequencies equal one.
For example, in the variation series of heart rate values:
80, 84, 84, 86, 86, 86, 90, 94;
the mode is 86, because this variant occurs 3 times, which is the highest frequency.
3) The median is the value of the variant that divides the variation series in half, with an equal number of variants on each side of it. Like the arithmetic mean and the mode, the median is an average value. It is denoted Me.
4) The standard deviation (synonyms: root-mean-square deviation, sigma) is a measure of the variability of a variation series. It is an integral indicator that combines all deviations from the mean; in effect, it answers the question of how far and how often the variants scatter around the arithmetic mean. It is denoted by the Greek letter σ (sigma).
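If the population size is more than 30 units, the standard deviation is calculated by the formula:
$$\sigma = \sqrt{\frac{\sum d^2 p}{n}}$$
For small populations (30 observation units or fewer), a different formula is used:
$$\sigma = \sqrt{\frac{\sum d^2 p}{n - 1}}$$
where d = V − M is the deviation of each variant from the arithmetic mean and p is its frequency.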
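A sketch of both standard-deviation formulas for a weighted series {V: p}, assuming d = V − M as above; `standard_deviation` is a hypothetical helper name. The divisor switches from n to n − 1 when n ≤ 30, which matches Python's `statistics.pstdev` and `statistics.stdev` respectively, so the sketch prints both for comparison. The data are the respiratory-rate (n = 9) and ARI (n = 35) examples from earlier in the text.

```python
import math
from collections import Counter
from statistics import pstdev, stdev


def standard_deviation(series):
    """series: {V: p}. Uses n in the divisor if n > 30, otherwise n - 1; d = V - M."""
    n = sum(series.values())
    M = sum(v * p for v, p in series.items()) / n
    sum_d2p = sum((v - M) ** 2 * p for v, p in series.items())
    divisor = n if n > 30 else n - 1
    return math.sqrt(sum_d2p / divisor)


resp = [20, 22, 19, 15, 16, 21, 17, 23, 18]            # n = 9, small sample
days = [6, 7, 5, 3, 9, 8, 7, 5, 6, 4, 9, 8, 7, 6, 6, 9, 6, 5, 10, 8,
        7, 11, 13, 5, 6, 7, 12, 4, 3, 5, 2, 5, 6, 6, 7]  # n = 35, large sample

sigma_resp = standard_deviation(Counter(resp))
sigma_days = standard_deviation(Counter(days))
print(round(sigma_resp, 2), round(stdev(resp), 2))    # divisor n - 1
print(round(sigma_days, 2), round(pstdev(days), 2))   # divisor n
```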
The levels of a dynamic (time) series are the absolute values that make up the series; they reflect the phenomenon at a certain moment or over a certain time interval.
Absolute increase represents the difference between the next and previous levels.
Growth rate is the ratio of the next level to the previous one, multiplied by 100%.
Rate of increase is the ratio of the absolute increase (decrease) to the previous level, multiplied by 100%.
Value of 1% increase is the ratio of the absolute increase to the rate of increase (in %); it equals one hundredth of the previous level.
Visibility index: the ratio of each level of the series to one of them, usually the initial one, taken as 100%.
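The dynamic-series indicators can be sketched as one hypothetical helper, `dynamic_indicators`; the levels below are invented for illustration and are not taken from the source. The value of 1% increase is computed as absolute increase divided by rate of increase, which equals one hundredth of the previous level.

```python
def dynamic_indicators(levels):
    """Chain indicators for each pair of consecutive levels, plus the visibility index."""
    rows = []
    for prev, cur in zip(levels, levels[1:]):
        absolute_increase = cur - prev
        growth_rate = cur / prev * 100                     # %
        rate_of_increase = absolute_increase / prev * 100  # %
        # value of 1% increase (= prev / 100); undefined when the level did not change
        one_percent = absolute_increase / rate_of_increase if rate_of_increase else None
        rows.append((cur, absolute_increase, growth_rate, rate_of_increase, one_percent))
    visibility = [level / levels[0] * 100 for level in levels]  # initial level = 100%
    return rows, visibility


levels = [200, 220, 242, 231]  # hypothetical levels, e.g. cases per year
rows, visibility = dynamic_indicators(levels)
for cur, inc, growth, rate, one in rows:
    print(f"{cur}: increase {inc:+}, growth {growth:.1f}%, rate {rate:+.1f}%, 1% = {one:.2f}")
print([round(v, 1) for v in visibility])
```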
A variation series is a series of homogeneous statistical values characterizing the same quantitative characteristic, differing from one another in magnitude and arranged in a certain order (decreasing or increasing).
Elements of the variation series:
a) variant (V): the numerical value of the quantitative characteristic being studied;
b) frequency (p or f): how many times a variant is repeated in the series, i.e. how often a given variant occurs in it;
c) total number of observations (n): the sum of all frequencies, n = Σp. If the total number of observations is more than 30, the sample is considered large; if n is less than or equal to 30, it is small.
Variation series are:
depending on the frequency of occurrence of the variants:
a) simple: each variant occurs once, i.e. all frequencies equal one;
b) ordinary (weighted): a series in which variants occur more than once;
c) grouped: a series in which the variants are combined into groups by size within a certain interval, with the total frequency of all variants in each group indicated.
A grouped variation series is used when there is a large number of observations and a large range of extreme values.
Processing a variation series consists of obtaining its parameters (the mean, the standard deviation, and the standard error of the mean).
3. depending on the number of observations:
a) even and odd
b) large (if the number of observations is more than 30) and small (if the number of observations is less than or equal to 30)
Averages give a generalizing characteristic of a statistical population with respect to a certain varying quantitative characteristic. An average characterizes the entire series of observations with a single number expressing the general level of the characteristic being studied. It smooths out random deviations of individual observations and gives a typical value of the quantitative characteristic.
Requirements for average values:
1) qualitative homogeneity of the population for which the average value is calculated - only then will it objectively reflect the characteristic features of the phenomenon being studied.
2) the average value should be based on a mass generalization of the characteristic being studied, because only then does it express the typical dimensions of the trait
Average values are obtained from distribution series (variation series).
Types of averages:
a) Mode (Mo) is the value of the characteristic that occurs most often in the population. The mode is the variant with the highest frequency in the variation series.
b) Median (Me) is the value of the characteristic that occupies the middle position in the variation series; it divides the series into two equal parts.
The mode and median are not affected by the numerical values of the extreme variants in the variation series. They do not always characterize a variation series accurately and are used relatively rarely in medical statistics; the arithmetic mean characterizes the series more accurately.
c) Arithmetic mean (M, or x̄) is calculated from all numerical values of the characteristic being studied.
Other averages are used less often: the geometric mean (when processing the results of titration of antibodies, toxins, and vaccines), the root mean square (when determining the mean diameter of cells in a section or evaluating skin immunological tests), the cubic mean (to determine the mean volume of tumors), and others.
In a simple variation series, where each variant occurs only once, the simple arithmetic mean is calculated by the formula:
$$M = \frac{\sum V}{n},$$
where V are the numerical values of the variants and n is the number of observations.
In an ordinary (weighted) variation series, the weighted arithmetic mean is calculated by the formula:
$$M = \frac{\sum V p}{n},$$
where V are the numerical values of the variants, p is the frequency of each variant, and n is the number of observations.
Identical averages can be obtained from series with different degrees of scatter; therefore, to characterize a variation series, another characteristic is needed in addition to the average, one that allows its variability to be assessed.
Simple indicators of the diversity of a characteristic in the population studied are:
a) limits: the minimum and maximum values of the quantitative characteristic;
b) amplitude: the difference between the largest and smallest variants.
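A sketch of the limits and amplitude, using the systolic blood pressure series given earlier, plus two hypothetical series with the same mean (19) but different scatter, which illustrates why the average alone is not enough. `limits_and_amplitude` is an illustrative helper name.

```python
def limits_and_amplitude(values):
    """Return (minimum, maximum, amplitude) of a variation series."""
    lowest, highest = min(values), max(values)
    return lowest, highest, highest - lowest


bp = [110, 120, 120, 130, 130, 130, 140, 140, 160, 170]
print(limits_and_amplitude(bp))  # (110, 170, 60)

# Hypothetical series with the same mean (19) but different scatter
narrow = [18, 19, 20]
wide = [10, 19, 28]
for s in (narrow, wide):
    print(sum(s) / len(s), limits_and_amplitude(s))
```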
Application of average values:
a) to characterize physical development (height, weight, chest circumference, dynamometry)
b) to assess the state of human health by analyzing the physiological, biochemical parameters of the body (blood pressure, heart rate, body temperature)
c) to analyze the activities of medical organizations (average number of days a bed is in use per year, etc.)
d) to evaluate the work of doctors (average number of visits per doctor, average number of surgical operations, average hourly workload of a doctor at a clinic appointment)
A variation series is a statistical series showing the distribution of the phenomenon under study by the value of some quantitative characteristic, for example, patients by age or duration of treatment, newborns by weight, etc.
Variant: an individual value of the characteristic by which the grouping is carried out (denoted V).
Frequency: a number showing how often a particular variant occurs (denoted P). The sum of all frequencies gives the total number of observations, denoted n. The difference between the largest and smallest variants of a variation series is called the range, or amplitude.
There are the following types of variation series:
1. Continuous and discontinuous.
A series is considered continuous if the grouping characteristic can take fractional values (weight, height, etc.). It is discontinuous if the grouping characteristic is expressed only in integers (days of disability, number of pulse beats, etc.).
2. Simple and weighted.
In a simple variation series, each quantitative value of the varying characteristic occurs once. In a weighted variation series, the quantitative values of the varying characteristic are repeated with a certain frequency.
3. Grouped (interval) and ungrouped.
In a grouped series, the variants are combined into groups. Each group unites the values that fall within a certain interval. In an ungrouped series, each individual variant has its own frequency.
4. Even and odd.
In an even variation series, the sum of the frequencies (the total number of observations) is an even number. In an odd series, it is an odd number.
5. Symmetrical and asymmetrical.
In a symmetrical variation series, all types of average values (mode, median, arithmetic mean) coincide or are very close. In an asymmetrical series, they diverge.
The choice of average depends on the nature of the phenomena being studied, the specific tasks and goals of the statistical research, and the content of the source material. The following types of averages are used in sanitary statistics:
structural means (mode, median);
arithmetic mean;
harmonic mean;
geometric mean;
progressive average.
Mode (Mo): the value of the varying characteristic that occurs most often in the population being studied, i.e. the variant with the highest frequency. It is found directly from the structure of the variation series, without any calculation. In a symmetrical or nearly symmetrical series, it is very close to the arithmetic mean. The mode is very convenient in practical work.
Median (Me): the variant that divides a ranked variation series into two equal halves. In a ranked series, the variants are arranged in ascending or descending order. The median is found using cumulative frequencies, which are obtained by successively summing the frequencies. In an odd series, the median is the variant at position (n + 1)/2. If the sum of the frequencies is an even number, the median is conventionally taken as the arithmetic mean of the two middle values.
The mode and median are used for open series, i.e. when the largest or smallest variants have no exact quantitative value (for example, "under 15 years", "50 and older", etc.). In this case, the arithmetic mean (a parametric characteristic) cannot be calculated.
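The following minimal sketch finds the mode and median of a ranked weighted series using the cumulative frequencies. It assumes the variants are listed in ascending order. The function names and the pulse-rate data are hypothetical and serve only as an illustration.

```python
from itertools import accumulate


def mode(variants, freqs):
    """Variant with the highest frequency (the first one if several tie)."""
    return max(zip(variants, freqs), key=lambda vp: vp[1])[0]


def value_at(rank, variants, freqs):
    """Variant at 1-based position `rank` of the ranked series,
    located by scanning the cumulative frequencies."""
    for v, cum in zip(variants, accumulate(freqs)):
        if rank <= cum:
            return v
    raise ValueError("rank exceeds the number of observations")


def median(variants, freqs):
    """Median of a ranked weighted series (variants in ascending order)."""
    n = sum(freqs)
    if n % 2 == 1:
        # Odd series: the single middle variant, at position (n + 1) / 2.
        return value_at((n + 1) // 2, variants, freqs)
    # Even series: arithmetic mean of the two middle variants.
    return (value_at(n // 2, variants, freqs) + value_at(n // 2 + 1, variants, freqs)) / 2


# Hypothetical pulse rates (beats per minute) and their frequencies, n = 21.
pulse = [64, 66, 68, 70, 72, 74]
counts = [1, 3, 5, 6, 4, 2]
print(mode(pulse, counts))    # 70
print(median(pulse, counts))  # 70
```

Here the cumulative frequencies are 1, 4, 9, 15, 19, 21, so the 11th value falls in the group with variant 70. For an even series, such as variants 1, 2, 3, 4 with frequencies 2, 3, 3, 2 (n = 10), the 5th and 6th values are 2 and 3. The median is therefore 2.5.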
Arithmetic mean: the most commonly used average value. The arithmetic mean is often denoted by M.
There are simple and weighted arithmetic averages.
The simple arithmetic mean is calculated:
- when the population is represented by a simple list of the values of the characteristic for each unit;
- when the number of repetitions of each variant cannot be determined;
- when the variants are repeated a similar number of times.
The simple arithmetic mean is calculated using the formula:
$M = \frac{\sum V}{n}$,
where V are the individual values of the characteristic, n is the number of individual values, and $\sum$ is the summation sign.
Thus, the simple average is the ratio of the sum of the variants to the number of observations.
Example: determine the average length of stay (in bed-days) for 10 patients with pneumonia:
16 days - 1 patient; 17–1; 18–1; 19–1; 20–1; 21–1; 22–1; 23–1; 26–1; 31–1.
$M = \frac{16 + 17 + 18 + 19 + 20 + 21 + 22 + 23 + 26 + 31}{10} = \frac{213}{10} = 21.3$ bed-days.
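The calculation above can be checked with a minimal Python sketch. The function name `simple_mean` is only an illustrative label. Python's standard `statistics.mean` would give the same result.

```python
def simple_mean(values):
    """Simple arithmetic mean: M = sum(V) / n."""
    return sum(values) / len(values)


# Lengths of stay (days) of the 10 pneumonia patients from the example.
stays = [16, 17, 18, 19, 20, 21, 22, 23, 26, 31]
print(sum(stays))          # 213
print(simple_mean(stays))  # 21.3
```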
The weighted arithmetic mean is calculated when individual values of the characteristic are repeated. It can be calculated in two ways:
1. Directly (the direct method), using the formula:
$M = \frac{\sum VP}{n}$,
where P is the frequency (number of cases) of each variant.
Thus, the weighted arithmetic mean is the ratio of the sum of the products of variant and frequency to the number of observations.
2. By calculating deviations from a conditional (assumed) average, using the method of moments.
The prerequisites for calculating the weighted arithmetic mean are:
- the material is grouped by variants of a quantitative characteristic;
- all variants are arranged in ascending or descending order of the attribute value (a ranked series).
The method of moments additionally requires all intervals to be of equal width.
Using the method of moments, the arithmetic mean is calculated using the formula:
$M = M_o + i \cdot \frac{\sum aP}{n}$,
where:
- Mo is the conditional average. It is usually taken as the value of the characteristic with the highest frequency, i.e. the one that occurs most often (the mode).
- i is the width of the interval.
- a is the conditional deviation of each variant from the conditional average, counted in intervals. It takes the values +1, +2, etc. for variants above the conditional average and -1, -2, etc. for variants below it. For the variant taken as the conditional average, a = 0.
- P is the frequency.
- $n = \sum P$ is the total number of observations.
Example: determine the average height of 8-year-old boys using the direct method (Table 1).
Table 1
Height in cm | Number of boys (P) | Central variant (V) | VP
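The central variant (the midpoint of the interval) is defined as half the sum of the initial values of two neighboring groups:
$V_k = \frac{x_k + x_{k+1}}{2}$,
where $x_k$ and $x_{k+1}$ are the initial values of the k-th group and of the next group.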
The product VP is obtained by multiplying each central variant by its frequency. The products are then added to obtain $\sum VP$. This sum is divided by the number of observations (100) to give the weighted arithmetic mean:
$M = \frac{\sum VP}{100}$ cm.
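The direct method can be sketched in Python as follows. The original Table 1 values are not available, so the heights and frequencies below are invented purely for illustration. The function names are also hypothetical. Central variants are computed as described above, as half the sum of the initial values of neighboring groups. For the last group, the sketch assumes the next initial value is the last initial value plus the interval width.

```python
def central_variants(lower_bounds, width):
    """Central variant of each interval: half the sum of its initial value and
    the initial value of the next group (for the last group, initial value + width)."""
    next_bounds = lower_bounds[1:] + [lower_bounds[-1] + width]
    return [(a + b) / 2 for a, b in zip(lower_bounds, next_bounds)]


def weighted_mean(variants, freqs):
    """Direct method: M = sum(V * P) / n, with n = sum(P)."""
    n = sum(freqs)
    return sum(v * p for v, p in zip(variants, freqs)) / n


# Hypothetical heights (cm) of 100 boys in 2-cm intervals: 116-117.9, 118-119.9, ...
lower = [116, 118, 120, 122, 124, 126]
boys = [5, 15, 25, 30, 15, 10]

v = central_variants(lower, width=2)
print(v)                       # [117.0, 119.0, 121.0, 123.0, 125.0, 127.0]
print(weighted_mean(v, boys))  # 122.3
```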
We will solve the same problem using the method of moments. For this purpose, Table 2 is compiled:
Table 2
Height in cm (V) | Number of boys (P) | a | aP
n = 100
We take 122 as Mo because, out of 100 observations, 33 boys had a height of 122 cm. We find the conditional deviations (a) from the conditional average as described above. We then multiply them by the frequencies (aP) and sum the products ($\sum aP$). The result is 17. Finally, we substitute the data into the formula:
$M = 122 + i \cdot \frac{17}{100}$,
where i is the interval width used in Table 2.
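The following minimal Python sketch illustrates the method of moments. It also checks that the method agrees with the direct calculation. The grouped heights are invented, because the Table 2 values are not recoverable, and the function name is hypothetical. The sketch takes the conditional average as the variant with the highest frequency and counts deviations in whole intervals. For these figures, both methods give 122.3 cm.

```python
def moments_mean(variants, freqs, width):
    """Method of moments: M = Mo + i * sum(a * P) / n.

    Mo (the conditional average) is taken as the variant with the highest
    frequency; a is each variant's deviation from Mo counted in whole
    intervals (..., -2, -1, 0, +1, +2, ...). Requires equal intervals."""
    n = sum(freqs)
    k = freqs.index(max(freqs))  # position of the conditional average
    m_o = variants[k]
    a = [j - k for j in range(len(variants))]
    sum_ap = sum(aj * p for aj, p in zip(a, freqs))
    return m_o + width * sum_ap / n


# Hypothetical central variants (cm) of 2-cm intervals and numbers of boys, n = 100.
heights = [117, 119, 121, 123, 125, 127]
boys = [5, 15, 25, 30, 15, 10]

# Direct method, for comparison: sum(V * P) / n.
direct = sum(v * p for v, p in zip(heights, boys)) / sum(boys)
print(moments_mean(heights, boys, width=2))
print(direct)
```

Here Mo = 123 and the sum of aP is -35, so M = 123 + 2 · (-35/100) = 122.3. This equals the direct result, as it should for equal intervals.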
When studying a varying characteristic, one cannot limit oneself to calculating average values. It is also necessary to calculate indicators of the degree of diversity of the characteristic, because its value is not the same for all units of the statistical population.
The main characteristic of the variability of a variation series is the standard deviation (σ). It shows the spread (dispersion) of the values of the studied characteristic around the arithmetic mean. It can be determined directly using the formula:
$\sigma = \sqrt{\frac{\sum (V - M)^2 P}{n}}$
In words, the standard deviation is found as follows. Each variant's squared deviation from the arithmetic mean, $(V - M)^2$, is multiplied by its frequency. These products are summed and divided by the sum of the frequencies (n), and the square root of the result is taken.
Calculation example: determine the average number of sick-leave certificates issued by a doctor per day at the clinic, and its standard deviation (Table 3).
Table 3
Number of sick-leave certificates issued by a doctor per day (V) | Number of doctors (P) | VP | V - M | (V - M)² | (V - M)²P
When the number of observations is less than 30, one is subtracted from n in the denominator:
$\sigma = \sqrt{\frac{\sum (V - M)^2 P}{n - 1}}$
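The following minimal Python sketch shows the direct calculation, including the n - 1 rule for fewer than 30 observations. The certificate counts are invented, because the Table 3 values are not available. The function name is hypothetical.

```python
import math


def weighted_sd(variants, freqs):
    """Standard deviation of a weighted series computed directly:
    sigma = sqrt(sum((V - M)**2 * P) / n), with n - 1 in the denominator
    when n < 30, as the text prescribes."""
    n = sum(freqs)
    m = sum(v * p for v, p in zip(variants, freqs)) / n
    squares = sum((v - m) ** 2 * p for v, p in zip(variants, freqs))
    denominator = n - 1 if n < 30 else n
    return math.sqrt(squares / denominator)


# Hypothetical data: certificates issued by a doctor per day (V), number of doctors (P).
certificates = [2, 3, 4, 5, 6]
doctors = [1, 3, 4, 3, 1]  # n = 12 < 30, so the denominator is n - 1 = 11
print(round(weighted_sd(certificates, doctors), 3))  # 1.128
```

Here M = 48/12 = 4 and the sum of (V - M)²P is 14, so σ = √(14/11) ≈ 1.128.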
If the series is grouped at equal intervals, the standard deviation can be determined using the method of moments:
$\sigma = i \sqrt{\frac{\sum a^2 P}{n} - \left(\frac{\sum aP}{n}\right)^2}$,
where:
- i is the width of the interval;
- a is the conditional deviation from the conditional average;
- P is the frequency of the variant in the corresponding interval;
- n is the total number of observations.
Calculation example: determine the average length of stay of patients in a therapeutic bed using the method of moments (Table 4):
Table 4
Length of stay in bed, days (V) | Number of patients (P) | a | aP | a²P
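The following minimal Python sketch applies the method of moments to both the mean and the standard deviation. The grouped length-of-stay data are invented, because the Table 4 values are not recoverable, and the function name is hypothetical. As above, the conditional average is assumed to be the variant with the highest frequency.

```python
import math


def moments_mean_sd(variants, freqs, width):
    """Mean and standard deviation by the method of moments (equal intervals):
    M = Mo + i * sum(aP)/n and sigma = i * sqrt(sum(a^2 P)/n - (sum(aP)/n)^2),
    with Mo taken as the variant of highest frequency."""
    n = sum(freqs)
    k = freqs.index(max(freqs))
    m_o = variants[k]
    a = [j - k for j in range(len(variants))]
    first = sum(aj * p for aj, p in zip(a, freqs)) / n        # sum(aP) / n
    second = sum(aj * aj * p for aj, p in zip(a, freqs)) / n  # sum(a^2 P) / n
    return m_o + width * first, width * math.sqrt(second - first ** 2)


# Hypothetical length-of-stay data: central variants (days) of 3-day intervals.
days = [7, 10, 13, 16, 19]
patients = [4, 10, 16, 12, 8]  # n = 50

mean, sd = moments_mean_sd(days, patients, width=3)
print(round(mean, 2), round(sd, 2))  # 13.6 3.5
```

With these figures, the sum of aP is 10 and the sum of a²P is 70. This gives M = 13 + 3 · 0.2 = 13.6 days and σ = 3 · √(1.4 - 0.04) ≈ 3.5 days. The direct formulas give the same values.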
The Belgian statistician A. Quetelet found that variations in mass phenomena obey the law of error distribution, which was discovered almost simultaneously by K. Gauss and P. Laplace. The curve representing this distribution is bell-shaped. According to the normal distribution law, the individual values of a characteristic vary within the limits $M \pm 3\sigma$, which cover 99.73% of all units in the population.
It has been calculated that if 2σ is added to and subtracted from the arithmetic mean, 95.45% of all members of the variation series lie within the resulting limits ($M \pm 2\sigma$). If 1σ is added and subtracted, 68.27% of all members lie within the limits $M \pm 1\sigma$. In medicine, the range $M \pm 1\sigma$ is associated with the concept of the norm. A deviation from the arithmetic mean of more than 1σ but less than 2σ is subnormal. A deviation of more than 2σ is abnormal (above or below the norm).
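The following minimal Python sketch checks the percentages above. For a normal distribution, the share lying within M ± kσ equals erf(k/√2), which Python's `math.erf` computes. The sketch also includes a hypothetical `assess` helper that applies the norm, subnormal, and abnormal rule. The text does not specify how to treat a deviation of exactly 2σ, so labeling it subnormal is an assumption. The reference values M and σ are invented.

```python
import math


def share_within(k):
    """Percentage of a normal distribution lying within M ± k*sigma: erf(k / sqrt(2))."""
    return 100 * math.erf(k / math.sqrt(2))


def assess(value, mean, sd):
    """Hypothetical helper: label a value by its deviation from the mean,
    following the text (up to 1 sigma: norm; 1 to 2 sigma: subnormal;
    more than 2 sigma: abnormal). Treating exactly 2 sigma as subnormal
    is an assumption."""
    deviation = abs(value - mean)
    if deviation <= sd:
        return "norm"
    if deviation <= 2 * sd:
        return "subnormal"
    return "abnormal"


for k in (1, 2, 3):
    print(k, round(share_within(k), 2))
# 1 68.27
# 2 95.45
# 3 99.73

# Hypothetical reference values: M = 122.3 cm, sigma = 2.6 cm.
print(assess(124, 122.3, 2.6))  # norm
print(assess(126, 122.3, 2.6))  # subnormal
print(assess(130, 122.3, 2.6))  # abnormal
```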
In health statistics, the three-sigma rule is used when studying physical development, assessing the performance of healthcare institutions, and assessing the health of the population. The same rule is widely used in the national economy when determining standards.
Thus, the standard deviation serves:
- to measure the dispersion of the variation series;
- to characterize the degree of diversity of traits, which is determined by the coefficient of variation:
$C_v = \frac{\sigma}{M} \times 100\%$
If the coefficient of variation is more than 20%, the diversity of the trait is strong. From 10 to 20%, it is average, and below 10%, it is weak. The coefficient of variation is, to a certain extent, a criterion of the reliability of the arithmetic mean.
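The following minimal Python sketch computes the coefficient of variation and applies the thresholds above. The function names and the two (M, σ) pairs are hypothetical. The text says "from 10 to 20%", so the sketch treats exactly 10% and exactly 20% as average diversity.

```python
def coefficient_of_variation(sd, mean):
    """Cv = sigma / M * 100, in percent."""
    return sd / mean * 100


def diversity(cv):
    """Degree of diversity by the thresholds in the text:
    more than 20% strong, 10-20% average, less than 10% weak."""
    if cv > 20:
        return "strong"
    if cv >= 10:
        return "average"
    return "weak"


# Hypothetical examples: a length of stay and a height.
cv_stay = coefficient_of_variation(3.5, 13.6)
cv_height = coefficient_of_variation(2.6, 122.3)
print(round(cv_stay, 1), diversity(cv_stay))      # 25.7 strong
print(round(cv_height, 1), diversity(cv_height))  # 2.1 weak
```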