# Confidence interval. What is it and how can it be used?

The concept of a confidence interval comes from the field of statistics. It is a range used to estimate an unknown parameter with a high degree of reliability. The easiest way to explain it is with an example.

Suppose we need to study some random variable, for example the server's response time to a client's request. Every time a user enters the address of a particular site, the server responds at a different speed, so the response time under study is random. A confidence interval lets us determine bounds for a parameter of this variable, such as its mean, and then we can state that, with a probability of 95%, the average response time of the server lies in the range we calculated.

Or suppose you need to find out how many people know about a company's brand. Once the confidence interval is calculated, it becomes possible to say, for example, that with a probability of 95% the share of consumers who know the brand is between 27% and 34%.

This term is closely related to the confidence level (confidence probability). It is the probability that the confidence interval contains the desired parameter. The width of the range depends on this value: the higher the confidence level, the wider the confidence interval becomes, and vice versa. It is usually set at 90%, 95% or 99%, with 95% being the most popular.
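The link between the confidence level and the width of the interval can be illustrated with a minimal Python sketch. It assumes a normally distributed quantity and uses the standard library's `statistics.NormalDist`; the helper name `critical_value` is hypothetical and only serves this illustration. The half-width of the interval is proportional to the critical value printed for each level.

```python
from statistics import NormalDist


def critical_value(confidence: float) -> float:
    """Two-sided critical value of the standard normal distribution."""
    # A two-sided interval leaves (1 - confidence) / 2 in each tail,
    # so we look up the quantile at 0.5 + confidence / 2.
    return NormalDist().inv_cdf(0.5 + confidence / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: {critical_value(level):.3f}")
```

The critical value grows from roughly 1.645 at 90% to roughly 2.576 at 99%, so asking for more confidence stretches the interval.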

The interval is also influenced by the variance of the observations and the sample size. Its definition is based on the assumption that the characteristic under investigation follows the normal distribution law, also known as the Gaussian law. Under this law, a continuous random variable is called normally distributed if its distribution is described by the Gaussian probability density. If the assumption of a normal distribution turns out to be wrong, the estimate may be incorrect.

First, we'll figure out how to calculate the confidence interval for the mathematical expectation. There are two possible cases: the variance (the degree of spread of a random variable) may be known or unknown. If it is known, the confidence interval is calculated using the following formula:

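$$\bar{x} - t \cdot \frac{\sigma}{\sqrt{n}} \le \alpha \le \bar{x} + t \cdot \frac{\sigma}{\sqrt{n}},$$ where

$\bar{x}$ is the sample mean,

$\alpha$ is the characteristic being estimated (the mathematical expectation),

$t$ is the critical value found from the table of the Laplace function (standard normal distribution) for the chosen confidence level,

$\sqrt{n}$ is the square root of the total sample size,

$\sigma$ is the square root of the variance.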
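As a rough sketch of the known-variance case, the formula can be written in Python using only the standard library. The function name `mean_ci_known_sigma`, the sample of response times and the value `sigma=7.0` are all made up for illustration; the critical value is taken from `statistics.NormalDist` instead of a printed table.

```python
from math import sqrt
from statistics import NormalDist, fmean


def mean_ci_known_sigma(sample, sigma, confidence=0.95):
    """Confidence interval for the mean when the standard deviation is known."""
    n = len(sample)
    x_bar = fmean(sample)
    # Critical value t of the standard normal distribution (two-sided).
    t = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = t * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin


# Hypothetical response times in milliseconds; sigma is assumed to be known.
response_times_ms = [120, 135, 128, 142, 131, 125, 138, 129, 133]
low, high = mean_ci_known_sigma(response_times_ms, sigma=7.0)
print(f"{low:.1f} ms <= mean <= {high:.1f} ms")
```

Because the margin is divided by the square root of the sample size, quadrupling the number of observations halves the width of the interval.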

If the variance is unknown, it can be estimated from the sample, provided we know all the observed values of the characteristic. The following formula is used for this:

$$\sigma^2 = \overline{x^2} - (\bar{x})^2,$$ where

$\overline{x^2}$ is the mean of the squares of the characteristic under study,

$(\bar{x})^2$ is the square of the mean value of this characteristic.
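The variance formula can be checked with a short Python sketch. The function name `variance_from_moments` and the sample values are hypothetical. Note that this expression divides by n, so it matches the standard library's `statistics.pvariance`; the sample variance usually paired with Student's distribution divides by n - 1 (`statistics.variance`) and is slightly larger.

```python
from statistics import fmean, pvariance, variance


def variance_from_moments(values):
    """Variance as the mean of the squares minus the square of the mean."""
    mean_of_squares = fmean(v * v for v in values)
    square_of_mean = fmean(values) ** 2
    return mean_of_squares - square_of_mean


# Hypothetical measurements, used only to exercise the formula.
values = [28, 31, 35, 24, 30, 33, 29]
print(variance_from_moments(values))  # divides by n
print(pvariance(values))              # same quantity from the standard library
print(variance(values))               # divides by n - 1 instead
```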

The formula for calculating the confidence interval in this case varies slightly:

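$$\bar{x} - t \cdot \frac{s}{\sqrt{n}} \le \alpha \le \bar{x} + t \cdot \frac{s}{\sqrt{n}},$$ where

$\bar{x}$ is the sample mean,

$\alpha$ is the characteristic being estimated (the mathematical expectation),

$t$ is a parameter found from the Student's distribution table, $t = t(\gamma; n-1)$, for the confidence level $\gamma$ and $n-1$ degrees of freedom,

$\sqrt{n}$ is the square root of the total sample size,

$s$ is the square root of the sample variance.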

Consider an example. Suppose that, based on the results of 7 measurements, the mean of the characteristic under study was found to be 30 and the sample variance 36. We need to find, with a probability of 99%, a confidence interval that contains the true value of the measured parameter.

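First we determine t: $t = t(0.99; 7-1) = 3.71$. Since the sample variance is 36, $s = \sqrt{36} = 6$. Substituting into the formula above, we get:

$$\bar{x} - t \cdot \frac{s}{\sqrt{n}} \le \alpha \le \bar{x} + t \cdot \frac{s}{\sqrt{n}}$$

$$30 - 3.71 \cdot \frac{6}{\sqrt{7}} \le \alpha \le 30 + 3.71 \cdot \frac{6}{\sqrt{7}}$$

$$21.587 \le \alpha \le 38.413$$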
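The arithmetic of this example can be reproduced with a small Python sketch. The function name `mean_ci_student` is hypothetical, and the critical value 3.71 is simply taken from the Student's table as in the text, not computed. Note that s is the square root of the sample variance, so s = 6 here.

```python
from math import sqrt


def mean_ci_student(x_bar, sample_variance, n, t):
    """Confidence interval for the mean using a tabulated Student's t value."""
    s = sqrt(sample_variance)  # standard deviation, not the variance itself
    margin = t * s / sqrt(n)
    return x_bar - margin, x_bar + margin


# Values from the example: 7 measurements, mean 30, sample variance 36,
# and t(0.99; 6) = 3.71 read from the table.
low, high = mean_ci_student(x_bar=30, sample_variance=36, n=7, t=3.71)
print(f"{low:.3f} <= alpha <= {high:.3f}")  # 21.587 <= alpha <= 38.413
```

If SciPy is available, `scipy.stats.t.ppf(0.995, 6)` should return about 3.707, which can replace the table lookup.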

The confidence interval for the variance can be calculated both when the mean is known and when there is no data on the mathematical expectation and only the unbiased point estimate of the variance is known. We will not give the formulas here, since they are rather complex and can always be found online if needed.

We only note that it is convenient to compute a confidence interval using Excel or an online service dedicated to this purpose.