Statistical significance: definition, concept, and hypothesis testing

22.09.2019

Statistical significance

The results obtained using a particular research procedure are called statistically significant if the probability of their occurring by chance is very small. This concept can be illustrated with the example of tossing a coin. Suppose a coin is tossed 30 times; heads come up 17 times and tails 13 times. Is this a significant deviation from the expected result (15 heads and 15 tails), or is the deviation random? To answer this question, you could, for example, toss the same coin many times, 30 tosses in a row each time, and note how often the 17:13 ratio of heads to tails is repeated. Statistical analysis saves us from this tedious process. With its help, after the first 30 tosses you can estimate the possible number of chance occurrences of 17 heads and 13 tails. Such an estimate is called a probabilistic statement.
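As an illustration, such a probabilistic estimate can be computed directly. Here is a minimal sketch in Python, assuming the scipy library is available; it computes the probability of a split at least as lopsided as 17:13 under pure chance.

```python
# A minimal sketch of the coin-toss example, assuming Python with scipy installed.
from scipy.stats import binom

n, p = 30, 0.5   # 30 tosses of a fair coin
heads = 17

# Two-sided probability of a split at least as extreme as 17:13
# (17 or more heads, or 13 or fewer), under pure chance.
p_value = binom.sf(heads - 1, n, p) + binom.cdf(n - heads, n, p)
print(f"P(result at least as extreme as 17 heads / 13 tails) = {p_value:.3f}")
# ~0.58: such a split is common by chance, so it is not statistically significant.
```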

In the scientific literature of industrial-organizational psychology, a probabilistic statement is written mathematically as p (probability) < 0.05 (5%), which should be read as "probability less than 5%". In the coin-tossing example, this statement means that if a researcher carried out 100 experiments, tossing the coin 30 times in each, he could expect the combination of 17 heads and 13 tails to occur by chance in fewer than 5 of those experiments. Such a result would be considered statistically significant, since the standards of statistical significance long accepted in industrial-organizational psychology are 0.05 and 0.01 (p < 0.01). This fact is important for understanding the literature, but it should not be taken to mean that observations failing to meet these standards are pointless. So-called non-significant results (observations that could be obtained by chance more than one to five times out of 100) can be very useful in identifying trends and as a guide to future research.

It should also be noted that not all psychologists agree with the traditional standards and procedures (e.g., Cohen, 1994; Sauley & Bedeian, 1989). Measurement issues are themselves the main theme of the work of many researchers, who study the accuracy of measurement methods and the assumptions underlying existing methods and standards, and who develop new methods and instruments. Perhaps research in this area will someday lead to changes in the traditional standards for assessing statistical significance, and these changes will gain widespread acceptance. (The Fifth Division of the American Psychological Association is a group of psychologists who specialize in the study of assessment, measurement, and statistics.)

In research reports, a probabilistic statement such as p < 0.05 is based on some statistic, that is, a number obtained through a specific set of mathematical computational procedures. The probabilistic statement is obtained by comparing this statistic with data from special tables published for this purpose. In industrial-organizational psychological research, statistics such as r, F, t, χ² (read "chi-square"), and R (read "multiple R") are common. In each case, the statistic (a single number) obtained from the analysis of a series of observations can be compared with the numbers in a published table. One can then formulate a probabilistic statement about the likelihood of obtaining this number by chance, that is, draw a conclusion about the significance of the observations.
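Today, software effectively replaces these published tables: the same lookup can be done with statistical distribution functions. A brief sketch in Python with scipy, using made-up statistic values purely for illustration:

```python
# A sketch of how software replaces the published tables mentioned above,
# assuming Python with scipy. The statistic values are hypothetical.
from scipy.stats import t, chi2

t_stat, df = 2.30, 28                    # a hypothetical t statistic, 28 degrees of freedom
p_two_sided = 2 * t.sf(abs(t_stat), df)
print(f"t = {t_stat}, p = {p_two_sided:.4f}")   # compare with the 0.05 standard

chi2_stat, df = 9.2, 2                   # a hypothetical chi-square statistic
print(f"chi2 = {chi2_stat}, p = {chi2.sf(chi2_stat, df):.4f}")

# Critical value lookup, the direct analogue of a printed table:
print(f"t critical (alpha = 0.05, two-sided, df = 28): {t.ppf(0.975, 28):.3f}")
```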

To understand the studies described in this book, it is enough to have a clear grasp of the concept of statistical significance; it is not necessary to know how the statistics mentioned above are calculated. However, it is worth discussing one assumption that underlies all of these procedures: the assumption that all observed variables are distributed approximately according to the normal law. In addition, when reading reports of industrial-organizational psychological research, three other important concepts come up frequently: first, correlation; second, determinant/predictor variables and ANOVA (analysis of variance); and third, a group of statistical methods under the general name of meta-analysis.

The main features of any relationship between variables

We can note the two simplest properties of the relationship between variables: (a) the magnitude of the relationship and (b) the reliability of the relationship.

- Magnitude. The magnitude of a relationship is easier to understand and measure than its reliability. For example, if every man in a sample had a white blood cell count (WCC) higher than every woman, then you could say that the relationship between the two variables (gender and WCC) is very strong. In other words, you could predict the values of one variable from the values of the other.

- Reliability ("truth"). The reliability of a relationship is a less intuitive concept than its magnitude, but it is extremely important. Reliability is directly related to the representativeness of the particular sample on which conclusions are based. In other words, reliability refers to how likely it is that the relationship would be rediscovered (in other words, confirmed) using data from another sample drawn from the same population.

It should be remembered that the ultimate goal is almost never the study of this particular sample of values; a sample is of interest only insofar as it provides information about the entire population. If the study satisfies certain specific criteria, then the reliability of the relationships found between sample variables can be quantified and presented using a standard statistical measure.

The magnitude of a relationship and its reliability represent two different characteristics of relationships between variables. However, it cannot be said that they are completely independent: the greater the magnitude of the relationship (connection) between variables in a sample of normal size, the more reliable it is (see the next section).

The statistical significance of a result (p-level) is an estimated measure of confidence in its "truth" (in the sense of "representativeness of the sample"). More technically, the p-level is a measure that decreases as the reliability of the result increases. A higher p-level corresponds to a lower level of confidence in the relationship between variables found in the sample. Specifically, the p-level represents the probability of error associated with generalizing the observed result to the entire population.

For example, a p-level of 0.05 (i.e., 1/20) indicates that there is a 5% chance that the relationship between variables found in the sample is just a random feature of that sample. In many studies, a p-level of 0.05 is considered an "acceptable margin" of error.

There is no way to avoid arbitrariness in deciding what level of significance should truly be considered "significant". The choice of a certain significance level above which results are rejected as false is quite arbitrary.



In practice, the final decision usually depends on whether the result was predicted a priori (i.e., before the experiment was conducted) or discovered a posteriori as the result of many analyses and comparisons performed on a variety of data, as well as on the tradition of the field of study.

Generally, in many fields a result of p ≤ .05 is an acceptable cutoff for statistical significance, but keep in mind that this level still includes a fairly large margin of error (5%).

Results significant at the p ≤ .01 level are generally considered statistically significant, while results at the p ≤ .005 or p ≤ .001 level are often considered highly significant. However, it should be understood that this classification of significance levels is quite arbitrary and is merely an informal convention adopted on the basis of practical experience in a particular field of study.

Clearly, the larger the number of analyses carried out on a collected data set, the larger the number of significant (at the chosen level) results that will be discovered purely by chance.

Some statistical methods that involve many comparisons, and thus have a significant chance of this kind of error, make a special adjustment or correction for the total number of comparisons. However, many statistical methods (especially simple exploratory data analysis methods) offer no way to solve this problem.
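The text above does not name a specific method; as a sketch, the simplest such correction is the Bonferroni correction, shown here with hypothetical p-values:

```python
# A minimal sketch of the Bonferroni correction: with m comparisons, each
# p-value is tested against alpha / m (equivalently, multiplied by m), so the
# chance of at least one false positive stays near alpha.
def bonferroni(p_values, alpha=0.05):
    m = len(p_values)
    return [(p, p * m <= alpha) for p in p_values]

# Hypothetical p-values from 5 separate comparisons on the same data set:
for p, significant in bonferroni([0.04, 0.01, 0.03, 0.20, 0.008]):
    print(f"p = {p:.3f} -> {'significant' if significant else 'not significant'} after correction")
```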

If the relationship between variables is "objectively" weak, then there is no way to test it other than to study a large sample. Even if the sample is perfectly representative, the effect will not be statistically significant if the sample is small. Likewise, if a relationship is "objectively" very strong, then it can be detected with a high degree of significance even in a very small sample.

The weaker the relationship between variables, the larger the sample size required to meaningfully detect it.

There are many different measures of the relationship between variables. The choice of a particular measure in a particular study depends on the number of variables, the measurement scales used, the nature of the relationships, and so on.

Most of these measures, however, follow a general principle: they attempt to estimate the observed dependence by comparing it with the "maximum conceivable dependence" between the variables under consideration. Technically, the usual way to make such estimates is to look at how the values of the variables vary and then calculate how much of the total variation present can be explained by the presence of "common" ("joint") variation in two (or more) variables.
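A small numeric illustration of "common variation", assuming Python with numpy: the squared correlation r² estimates the share of total variation that two variables share. The data below are synthetic:

```python
# The squared Pearson correlation r**2 is the share of total variation in y
# accounted for by its linear relationship with x. Data are made up.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 0.7 * x + rng.normal(scale=0.7, size=200)   # a moderately strong relationship

r = np.corrcoef(x, y)[0, 1]
print(f"r = {r:.2f}, shared variance r^2 = {r**2:.2f}")  # roughly half the variance is "joint"
```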

Significance depends mainly on the sample size. As already explained, in very large samples even very weak relationships between variables will be significant, while in small samples even very strong relationships are not reliable.

Thus, in order to determine the level of statistical significance, a function is needed that represents the relationship between the “magnitude” and “significance” of the relationship between variables for each sample size.

Such a function would indicate exactly "how likely it is to obtain a dependence of a given magnitude (or greater) in a sample of a given size, assuming that there is no such dependence in the population." In other words, this function would give the significance level (p-level) and, therefore, the probability of erroneously rejecting the assumption that this dependence is absent in the population.

This "alternative" hypothesis (that there is no relationship in the population) is usually called the null hypothesis.

It would be ideal if the function that calculates the probability of error were linear and merely had different slopes for different sample sizes. Unfortunately, this function is much more complex and is not always exactly the same. However, in most cases its form is known and can be used to determine significance levels in studies of samples of a given size. Most of these functions are associated with a class of distributions called normal.
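For the correlation coefficient, for example, this function is known exactly. A sketch in Python with scipy, showing how the same magnitude r = 0.30 maps to very different p-levels at different sample sizes:

```python
# The p-level for a correlation r in a sample of size n, via the exact
# t distribution with n - 2 degrees of freedom.
from scipy.stats import t

def p_level(r, n):
    """Two-sided p-level for observing |correlation| >= r in a sample of size n,
    under the null hypothesis of no relationship in the population."""
    t_stat = r * ((n - 2) / (1 - r**2)) ** 0.5
    return 2 * t.sf(abs(t_stat), n - 2)

for n in (10, 30, 100, 1000):
    print(f"r = 0.30, n = {n:4d} -> p = {p_level(0.30, n):.4f}")
# The same weak relationship is non-significant in small samples
# and highly significant in large ones.
```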

STATISTICAL RELIABILITY

- English: statistical credibility/validity; German: Validität, statistische. Consistency, objectivity, and lack of ambiguity in a statistical test or in some set of measurements. Statistical reliability can be tested by repeating the same test (or questionnaire) with the same subject to see whether the same results are obtained, or by comparing different parts of a test that are supposed to measure the same thing.

Antinazi. Encyclopedia of Sociology, 2009



Basic terms and concepts of medical statistics

In this article we present some key statistical concepts relevant to medical research. The terms are discussed in more detail in the corresponding articles.

Variation

Definition. The degree of dispersion of data (attribute values) over their range of values.

Probability

Definition. Probability is the degree of possibility of the occurrence of a certain event under certain conditions.

Example. Let us explain the definition using the sentence "The probability of recovery when using the medicinal product Arimidex is 70%." The event is "recovery of the patient", the condition is "the patient takes Arimidex", and the degree of possibility is 70% (roughly speaking, out of 100 people taking Arimidex, 70 recover).

Cumulative probability

Definition. The cumulative probability of surviving to time t is the proportion of patients remaining alive at that time.

Example. If it is said that the cumulative probability of survival after a five-year course of treatment is 0.7, then this means that of the group of patients under consideration, 70% of the initial number remained alive, and 30% died. In other words, out of every hundred people, 30 died within the first 5 years.

Time before event

Definition. Time before an event is the time, expressed in some units, that has passed from some initial point in time until the occurrence of some event.

Explanation. Days, months, and years are typically used as units of time in medical research.

Typical examples of initial times:

    start monitoring the patient

    surgical treatment

Typical examples of the events considered:

    disease progression

    occurrence of relapse

    patient death

Sample

Definition. The part of a population obtained by selection.

Conclusions about the entire population are drawn from the results of the sample analysis; they are valid only if the selection was random. Since truly random selection from a population is practically impossible, one should at least strive to ensure that the sample is representative of the population.

Dependent and independent samples

Definition. Independent samples are samples in which the study subjects were recruited independently of one another. The alternative to independent samples is dependent (connected, paired) samples.

Hypothesis

Two-sided and one-sided hypotheses

First, let us explain the use of the term hypothesis in statistics.

The purpose of most research is to test the truth of some statement. The purpose of drug testing is most often to test the hypothesis that one drug is more effective than another (for example, Arimidex is more effective than Tamoxifen).

To ensure the rigor of the study, the statement being verified is expressed mathematically. For example, if A is the number of years that a patient taking Arimidex will live, and T is the number of years that a patient taking Tamoxifen will live, then the hypothesis being tested can be written as A>T.

Definition. A hypothesis is called two-sided if it consists in the equality of two quantities.

An example of a two-sided hypothesis: A=T.

Definition. A hypothesis is called one-sided (1-sided) if it consists in the inequality of two quantities.

Examples of one-sided hypotheses: A > T, A < T.

Dichotomous (binary) data

Definition. Data expressed by only two valid alternative values

Example. The patient is "healthy" or "sick". Edema is "present" or "absent".

Confidence interval

Definition. The confidence interval for a quantity is the range around the value of the quantity in which the true value of that quantity lies (with a certain level of confidence).

Example. Suppose the quantity under study is the number of patients per year. On average it is 500, and the 95% confidence interval is (350, 900). This means that, most likely (with 95% probability), at least 350 and no more than 900 people will contact the clinic during the year.

Notation. A very common abbreviation: 95% CI is a confidence interval with a confidence level of 95%.
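A minimal sketch of how such an interval can be computed for a mean, assuming Python with scipy; the monthly patient counts below are invented:

```python
# Computing a 95% confidence interval for a mean from hypothetical data.
import numpy as np
from scipy import stats

counts = np.array([44, 39, 51, 47, 36, 42, 40, 45, 38, 49, 41, 43])  # made-up data
mean = counts.mean()
sem = stats.sem(counts)                    # standard error of the mean
low, high = stats.t.interval(0.95, df=len(counts) - 1, loc=mean, scale=sem)
print(f"mean = {mean:.1f}, CI 95% = ({low:.1f}, {high:.1f})")
```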

Reliability, statistical significance (p-level)

Definition. The statistical significance of a result is a measure of confidence in its “truth.”

Any research is carried out on the basis of only a part of the objects. A study of the effectiveness of a drug is carried out not on the basis of all patients on the planet, but only on a certain group of patients (it is simply impossible to conduct an analysis on the basis of all patients).

Let's assume that as a result of the analysis a certain conclusion was made (for example, the use of Arimidex as an adequate therapy is 2 times more effective than Tamoxifen).

The question that needs to be asked is: “How much can you trust this result?”

Imagine that we conducted the study on only two patients. Of course, in that case the results should be treated with caution. If a large number of patients were examined (the numerical meaning of "a large number" depends on the situation), then the conclusions drawn can already be trusted.

So, the degree of trust is determined by the p-level value (p-value).

A higher p-level corresponds to a lower level of confidence in the results obtained from the sample analysis. For example, a p-level equal to 0.05 (5%) indicates that the conclusion drawn from the analysis of a certain group is only a random feature of these objects with a probability of only 5%.

In other words, with a very high probability (95%) the conclusion can be extended to all objects.

Many studies consider 5% an acceptable p-level. This means that if, for example, p = 0.01, the results can be trusted, but if p = 0.06, they cannot.

Study

A prospective study is a study in which samples are selected on the basis of an initial factor, and some resulting factor is analyzed in the samples.

A retrospective study is a study in which samples are selected on the basis of a resulting factor, and some initial factor is analyzed in the samples.

Example. The initial factor is whether the pregnant woman is younger or older than 20; the resulting factor is whether the child is lighter or heavier than 2.5 kg. We analyze whether the child's weight depends on the mother's age.

If we recruit two samples, one with mothers under 20 years of age and the other with older mothers, and then analyze the weight of the children in each group, this is a prospective study.

If we recruit two samples, one with mothers who gave birth to children lighter than 2.5 kg and the other with mothers whose children were heavier, and then analyze the mothers' ages in each group, this is a retrospective study (naturally, such a study can be carried out only after the experiment is complete, i.e., all the children have been born).

Outcome

Definition. A clinically significant phenomenon, laboratory indicator or sign that serves as an object of interest to the researcher. When conducting clinical trials, outcomes serve as criteria for assessing the effectiveness of a therapeutic or preventive intervention.

Clinical epidemiology

Definition. A science that makes it possible to predict a particular outcome for a specific patient by studying the clinical course of the disease in similar cases, using strict scientific methods of studying patients to ensure the accuracy of the predictions.

Cohort

Definition. A group of study participants united by some common feature at the time of its formation and studied over a long period of time.

Control

Historical control

Definition. A control group formed and examined in the period preceding the study.

Parallel control

Definition. A control group formed simultaneously with the formation of the main group.

Correlation

Definition. A statistical relationship between two characteristics (quantitative or ordinal) showing that, in a certain proportion of cases, a higher value of one characteristic corresponds to a higher value of the other in the case of a positive (direct) correlation, or to a lower value of the other in the case of a negative (inverse) correlation.

Example. A significant correlation was found between the levels of platelets and leukocytes in the patient’s blood. The correlation coefficient is 0.76.
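A sketch of how such a coefficient is computed, assuming Python with scipy; the platelet and leukocyte values below are invented and do not reproduce the study in the example:

```python
# Computing a Pearson correlation coefficient from hypothetical paired data.
from scipy.stats import pearsonr

platelets  = [210, 250, 180, 300, 270, 220, 310, 190, 260, 240]
leukocytes = [5.1, 6.0, 4.2, 7.3, 6.8, 5.0, 7.9, 4.6, 6.1, 5.7]

r, p = pearsonr(platelets, leukocytes)
print(f"r = {r:.2f}, p = {p:.4f}")   # r close to +1 indicates a strong direct correlation
```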

Risk ratio (RR)

Definition. The risk ratio is the ratio of the probability of the occurrence of some (“bad”) event for the first group of objects to the probability of the occurrence of the same event for the second group of objects.

Example. If the probability of developing lung cancer is 20% for non-smokers and 100% for smokers, then the RR equals one fifth. In this example, the first group of objects is the non-smokers, the second group is the smokers, and the occurrence of lung cancer is the "bad" event.

It is obvious that:

1) if RR = 1, the probability of the event occurring is the same in both groups;

2) if RR > 1, the event occurs more often in objects from the first group than from the second;

3) if RR < 1, the event occurs more often in objects from the second group than from the first.
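As a tiny illustration of the definition, a sketch in Python; the counts are hypothetical, mirroring the smoking example with non-smokers as the first group:

```python
# RR is a ratio of two event probabilities, one per group.
def risk_ratio(events_1, total_1, events_2, total_2):
    return (events_1 / total_1) / (events_2 / total_2)

# Hypothetical counts: 20 of 100 non-smokers vs 100 of 100 smokers develop the disease.
print(risk_ratio(20, 100, 100, 100))   # 0.2 / 1.0 = 0.2, i.e. one fifth
```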

Meta-analysis

Definition. A statistical analysis that combines the results of several studies investigating the same problem (usually the effectiveness of treatment, prevention, or diagnostic methods). Pooling the studies provides a larger sample for analysis and greater statistical power for the combined studies. It is used to strengthen the evidence for, or confidence in, a conclusion about the effectiveness of the method under study.

Kaplan-Meier method (Kaplan-Meier multiplier estimators)

This method was invented by the statisticians E. L. Kaplan and Paul Meier.

The method is used to calculate various quantities associated with the observation time of a patient. Examples of such quantities:

    probability of recovery within one year when using the drug

    chance of relapse after surgery within three years after surgery

    cumulative probability of survival at five years among patients with prostate cancer following organ amputation

Let us explain the advantages of using the Kaplan-Meier method.

In "conventional" analysis (not using the Kaplan-Meier method), the values are calculated by dividing the time interval under consideration into subintervals.

For example, if we are studying the probability of a patient's death within 5 years, the time interval can be divided into 5 parts (less than 1 year, 1-2 years, 2-3 years, 3-4 years, 4-5 years), into 10 parts (six months each), or into some other number of intervals. The results for different partitions will be different.

Choosing the most appropriate partition is not an easy task.

Estimates obtained using the Kaplan-Meier method do not depend on how the observation time is divided into intervals; they depend only on the lifetime of each individual patient.

Therefore, it is easier for the researcher to carry out the analysis, and the results are often better than the results of “conventional” analysis.

The Kaplan-Meier curve is a graph of the survival curve obtained using the Kaplan-Meier method.
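To make the product-limit idea concrete, here is a compact hand-written sketch in Python (in practice a ready-made library would normally be used; the follow-up times and event flags below are invented):

```python
# A hand-written Kaplan-Meier product-limit estimate. Each patient has a
# follow-up time in months and an event flag (1 = death observed,
# 0 = censored, i.e. left the study alive).
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each observed event time."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival, curve = 1.0, []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = at_t = 0
        while i < len(data) and data[i][0] == t:   # group ties at the same time
            at_t += 1
            deaths += data[i][1]
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk     # product-limit step
            curve.append((t, survival))
        n_at_risk -= at_t                          # deaths and censored leave the risk set
    return curve

times  = [2, 3, 3, 5, 8, 8, 9, 12, 14, 15]
events = [1, 1, 0, 1, 0, 1, 1,  0,  1,  0]
for t, s in kaplan_meier(times, events):
    print(f"t = {t:2d} months: S(t) = {s:.3f}")
```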

Cox model

This model was invented by Sir David Roxby Cox (b. 1924), a famous English statistician, author of more than 300 articles and books.

The Cox model is used in situations where the quantities studied in the survival analysis depend on functions of time. For example, the probability of relapse after t years (t=1,2,...) may depend on the logarithm of time log(t).

An important advantage of the method proposed by Cox is the applicability of this method in a large number of situations (the model does not impose strict restrictions on the nature or shape of the probability distribution).

Based on the Cox model, an analysis can be performed (called Cox analysis) that yields the value of the risk coefficient and a confidence interval for it.
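As a sketch of how such an analysis might look in practice, here is a hypothetical example using the third-party Python library lifelines; the library choice and all of the data are assumptions for illustration, not part of the original text:

```python
# A Cox proportional-hazards sketch with the lifelines library (pip install lifelines).
import pandas as pd
from lifelines import CoxPHFitter

df = pd.DataFrame({
    "time":  [5, 8, 12, 3, 9, 14, 2, 11, 7, 10],   # months of follow-up (invented)
    "event": [1, 1, 0, 1, 0, 0, 1, 1, 1, 0],       # 1 = event observed, 0 = censored
    "age":   [64, 58, 49, 71, 66, 52, 75, 60, 68, 55],
    "drug":  [0, 1, 1, 0, 1, 1, 0, 0, 1, 0],       # 1 = new treatment, 0 = control
})

cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="event")
cph.print_summary()   # reports exp(coef), i.e. hazard ratios, with confidence intervals
```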

Nonparametric statistical methods

Definition. A class of statistical methods used primarily for the analysis of quantitative data that do not follow a normal distribution, as well as for the analysis of qualitative data.

Example. To identify the significance of differences in the systolic pressure of patients depending on the type of treatment, we will use the nonparametric Mann-Whitney test.
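A sketch of that example in Python, assuming scipy is available; the systolic pressure values are invented for illustration:

```python
# Comparing two independent groups with the nonparametric Mann-Whitney test.
from scipy.stats import mannwhitneyu

treatment_a = [128, 135, 121, 140, 132, 126, 138, 130]
treatment_b = [118, 125, 112, 130, 122, 115, 127, 120]

stat, p = mannwhitneyu(treatment_a, treatment_b, alternative="two-sided")
print(f"U = {stat}, p = {p:.4f}")   # p < 0.05 would indicate a significant difference
```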

Sign (variable)

Definition. A characteristic of the object of study (observation). Characteristics may be qualitative or quantitative.

Randomization

Definition. A method of randomly distributing research objects into the main and control groups using special means (tables of random numbers or a random number generator, coin tossing, and other methods of randomly assigning a group number to an included observation). Randomization minimizes differences between groups on known and unknown characteristics that could potentially influence the outcome being studied.
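A minimal sketch of simple randomization with Python's standard library (the patient labels and group sizes are, of course, made up):

```python
# Randomly assigning 20 hypothetical patients to two equal groups.
import random

patients = [f"patient_{i:02d}" for i in range(1, 21)]
random.shuffle(patients)   # random order, independent of any patient characteristic
half = len(patients) // 2
main_group, control_group = patients[:half], patients[half:]
print(main_group, control_group, sep="\n")
```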

Risk

Attributable risk is the additional risk of an unfavorable outcome (for example, disease) due to the presence of a certain characteristic (risk factor) in the subject of the study. It is the portion of the risk of the disease that is associated with and explained by the risk factor, and that can be eliminated if the risk factor is removed.

Relative risk is the ratio of the risk of an unfavorable condition in one group to the risk of that condition in another group. It is used in prospective and observational studies, where the groups are formed in advance and the condition under study has not yet occurred.

Rolling examination (leave-one-out validation)

Definition. A method for checking the stability, reliability, performance (validity) of a statistical model by sequentially removing observations and recalculating the model. The more similar the resulting models are, the more stable and reliable the model is.

Event

Definition. The clinical outcome observed in the study, such as the occurrence of a complication, relapse, recovery, or death.

Stratification

Definition. A sampling technique in which the population of all participants meeting the inclusion criteria is first divided into groups (strata) on the basis of one or more characteristics (usually sex and age) that potentially influence the outcome of interest, and participants are then recruited from each of these groups (strata) independently into the experimental and control groups. This allows the researcher to balance important characteristics between the experimental and control groups.

Contingency table

Definition. A table of absolute frequencies (counts) of observations whose columns correspond to the values of one characteristic and whose rows correspond to the values of another characteristic (in the case of a two-dimensional contingency table). The absolute frequencies are located in the cells at the intersections of rows and columns.

Let us give an example of a contingency table. Aneurysm surgery was performed on 97 patients; the severity of edema in each patient before surgery is known.

Edema \ Outcome       Survived   Died   Total
No edema                    20      6      26
Moderate edema              27     15      42
Pronounced edema             8     21      29
Total                       55     42      97

Thus, of the 26 patients without edema, 20 survived the operation and 6 died. Of the 42 patients with moderate edema, 27 survived and 15 died, and so on.

Chi-square test for contingency tables

To determine the significance (reliability) of differences in one characteristic depending on another (for example, the outcome of an operation depending on the severity of edema), the chi-square test for contingency tables is used:

    χ² = Σᵢⱼ (Oᵢⱼ − Eᵢⱼ)² / Eᵢⱼ,

where Oᵢⱼ is the observed count in cell (i, j) and Eᵢⱼ is the count expected under independence (the row total times the column total, divided by the overall total).


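A sketch applying the test to the table above, assuming Python with scipy:

```python
# Chi-square test of independence on the contingency table from the example.
from scipy.stats import chi2_contingency

table = [[20,  6],    # no edema:        survived, died
         [27, 15],    # moderate edema
         [ 8, 21]]    # pronounced edema

chi2, p, df, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, df = {df}, p = {p:.4f}")
# A small p indicates that the outcome of surgery depends on the severity of edema.
```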
Chance

Let the probability of some event be equal to p. Then the probability that the event will not occur is 1-p.

For example, if the probability that a patient will remain alive after five years is 0.8 (80%), then the probability that he will die during this time period is 0.2 (20%).

Definition. Chance is the ratio of the probability that an event will occur to the probability that the event will not occur.

Example. In our example (about a patient), the chance is 4, since 0.8/0.2=4

Thus, the probability of surviving is 4 times greater than the probability of dying.

Interpretation of the value of a quantity.

1) If Chance=1, then the probability of an event occurring is equal to the probability that the event will not occur;

2) if Chance >1, then the probability of the event occurring is greater than the probability that the event will not occur;

3) if Chance < 1, then the probability of the event occurring is less than the probability that the event will not occur.

Odds ratio

Definition. The odds ratio is the ratio of the odds for the first group of objects to the odds for the second group of objects.

Example. Let us assume that both men and women undergo some treatment.

The probability that a male patient will remain alive after five years is 0.6 (60%); the probability that he will die during this time period is 0.4 (40%).

Similar probabilities for women are 0.8 and 0.2.

The odds ratio in this example is (0.6/0.4) / (0.8/0.2) = 1.5 / 4 = 0.375.

Interpretation of the value of a quantity.

1) If the odds ratio = 1, then the chance for the first group is equal to the chance for the second group

2) If the odds ratio is >1, then the chance for the first group is greater than the chance for the second group

3) If the odds ratio < 1, then the chance for the first group is less than the chance for the second group.

Finally, recall the meaning of the p-level: if a given relationship does not exist in the population and similar experiments are conducted many times, then, when testing at the 0.05 level, the same or a stronger relationship between the variables would be expected to appear purely by chance in about one repetition in twenty.

If we assume that there is no relationship between the corresponding variables in the population, then it is most natural to expect that no relationship will be found between them in the sample under study either. Thus, the stronger the relationship found in the sample, the less likely it is that no such relationship exists in the population from which it is drawn.


The sample size affects the significance of a relationship. If there are few observations, then there are correspondingly few possible combinations of the values of these variables, and thus the probability of accidentally discovering a combination of values that shows a strong relationship is relatively high.

How the level of statistical significance is calculated. Suppose you have already computed a measure of the dependence between two variables (as explained above). The next question is: "how significant is this relationship?" For example, is 40% of explained variance between two variables enough to consider the relationship significant? The answer: "it depends." Namely, significance depends mainly on the sample size, as discussed above: in very large samples even very weak relationships are significant, while in small samples even very strong relationships cannot be established reliably. The function relating the magnitude of a dependence and the sample size to the p-level, and the null hypothesis it tests, were described in the section on statistical significance above; most such functions are associated with the class of normal distributions.