Learning Objectives
- A statistic of dispersion tells you how spread out a set of measurements is. Standard deviation is the most common, but there are others.
Summarizing data from a measurement variable requires a number that represents the "middle" of a set of numbers (known as a "statistic of central tendency" or "statistic of location"), along with a measure of the "spread" of the numbers (known as a "statistic of dispersion"). You use a statistic of dispersion to give a single number that describes how compact or spread out a set of observations is. Although statistics of dispersion are usually not very interesting by themselves, they form the basis of most statistical tests used on measurement variables.
Range
This is simply the difference between the largest and smallest observations. This is the statistic of dispersion that people use in everyday conversation; if you were telling your Uncle Cletus about your research on the giant deep-sea isopod Bathynomus giganteus, you wouldn't blather about means and standard deviations, you'd say they ranged from \(4.4cm\) to \(36.5cm\) long (Briones-Fourzán and Lozano-Alvarez 1991). Then you'd explain that isopods are roly-polies, and \(36.5cm\) is about \(14\) American inches, and Uncle Cletus would finally be impressed, because a roly-poly that's over a foot long is pretty impressive.
Range is not very informative for statistical purposes. The range depends only on the largest and smallest values, so that two sets of data with very different distributions could have the same range, or two samples from the same population could have very different ranges, purely by chance. In addition, the range increases as the sample size increases; the more observations you make, the greater the chance that you'll sample a very large or very small value.
There is no range function in spreadsheets; you can calculate the range by using: Range = MAX(Ys)−MIN(Ys), where \(Ys\) represents a set of cells.
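Outside a spreadsheet the same calculation is a one-liner. The following is a minimal Python sketch; the function name `data_range` is hypothetical (chosen so it does not shadow Python's built-in `range`), and the sample values are made up for illustration, with only their smallest and largest values matching the isopod lengths quoted above.

```python
def data_range(ys):
    """Largest observation minus smallest, like MAX(Ys)-MIN(Ys) in a spreadsheet."""
    if len(ys) == 0:
        raise ValueError("the range of an empty sample is undefined")
    return max(ys) - min(ys)


# Hypothetical lengths in cm; only the extremes (4.4 and 36.5) come from the text.
lengths_cm = [4.4, 11.2, 19.8, 27.5, 36.5]
print(round(data_range(lengths_cm), 1))  # 32.1
```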
Sum of squares
This is not really a statistic of dispersion by itself, but I mention it here because it forms the basis of the variance and standard deviation. Subtract the mean from an observation and square this "deviate". Squaring the deviates makes all of the squared deviates positive and has other statistical advantages. Do this for each observation, then sum these squared deviates. This sum of the squared deviates from the mean is known as the sum of squares. It is given by the spreadsheet function DEVSQ(Ys) (not by the function SUMSQ). You'll probably never have a reason to calculate the sum of squares, but it's an important concept.
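To make the DEVSQ versus SUMSQ distinction concrete, here is a small Python sketch. Both function names and the data are hypothetical; `sum_of_squares` mirrors what the text calls the sum of squares (squared deviates from the mean), while `sum_of_raw_squares` mirrors SUMSQ, which squares the raw values without subtracting the mean.

```python
def sum_of_squares(ys):
    """Sum of squared deviates from the mean (what DEVSQ returns)."""
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def sum_of_raw_squares(ys):
    """Sum of squared raw values (what SUMSQ returns); no mean is subtracted."""
    return sum(y ** 2 for y in ys)


ys = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical data with mean 5
print(sum_of_squares(ys))      # 32.0
print(sum_of_raw_squares(ys))  # 232.0
```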
Parametric variance
If you take the sum of squares and divide it by the number of observations (\(n\)), you are computing the average squared deviation from the mean. As observations get more and more spread out, they get farther from the mean, and the average squared deviate gets larger. This average squared deviate, or sum of squares divided by \(n\), is the parametric variance. You can only calculate the parametric variance of a population if you have observations for every member of a population, which is almost never the case. I can't think of a good biological example where using the parametric variance would be appropriate; I only mention it because there's a spreadsheet function for it that you should never use, VARP(Ys).
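As a sketch of the definition only (not a recommendation to use it), the parametric variance is the sum of squares divided by \(n\). Python's standard-library `statistics.pvariance` computes the same quantity as VARP; the helper `parametric_variance` and the data below are hypothetical.

```python
import statistics


def parametric_variance(ys):
    """Sum of squares divided by n, the quantity VARP returns."""
    n = len(ys)
    mean = sum(ys) / n
    return sum((y - mean) ** 2 for y in ys) / n


ys = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical data
print(parametric_variance(ys))   # 4.0
print(statistics.pvariance(ys))  # 4.0
```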
Sample variance
You almost always have a sample of observations that you are using to estimate a population parameter. To get an unbiased estimate of the population variance, divide the sum of squares by \(n-1\), not by \(n\). This sample variance, which is the one you will always use, is given by the spreadsheet function VAR(Ys). From here on, when you see "variance," it means the sample variance.
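Why divide by \(n-1\)? A quick simulation can illustrate the bias. This Python sketch assumes normally distributed data with a known variance of \(4\) (an assumption made only so the true value is known), draws many small samples, and averages the two estimates. Dividing by \(n\) (as VARP does, here via `statistics.pvariance`) tends to come out near \(4(n-1)/n\), while dividing by \(n-1\) (as VAR does, here via `statistics.variance`) averages out near the true value.

```python
import random
import statistics

random.seed(1)          # fixed seed so the run is repeatable
SIGMA = 2.0             # assumed true standard deviation, so the true variance is 4
N, TRIALS = 5, 10_000   # small samples make the bias easy to see

by_n, by_n_minus_1 = [], []
for _ in range(TRIALS):
    sample = [random.gauss(0.0, SIGMA) for _ in range(N)]
    by_n.append(statistics.pvariance(sample))         # sum of squares / n
    by_n_minus_1.append(statistics.variance(sample))  # sum of squares / (n - 1)

mean_by_n = statistics.mean(by_n)
mean_by_n_minus_1 = statistics.mean(by_n_minus_1)
print(round(mean_by_n, 2))          # should be near 4 * (N - 1) / N = 3.2
print(round(mean_by_n_minus_1, 2))  # should be near 4.0
```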
You might think that if you set up an experiment where you gave \(10\) guinea pigs little argyle sweaters, and you measured the body temperature of all \(10\) of them, that you should use the parametric variance and not the sample variance. You would, after all, have the body temperature of the entire population of guinea pigs wearing argyle sweaters in the world. However, for statistical purposes you should consider your sweater-wearing guinea pigs to be a sample of all the guinea pigs in the world who could have worn an argyle sweater, so it would be best to use the sample variance. Even if you go to Española Island and measure the length of every single tortoise (Geochelone nigra hoodensis) in the population of tortoises living there, for most purposes it would be best to consider them a sample of all the tortoises that could have been living there.
Standard Deviation
Variance, while it has useful statistical properties that make it the basis of many statistical tests, is in squared units. A set of lengths measured in centimeters would have a variance expressed in square centimeters, which is just weird; a set of volumes measured in \(cm^3\) would have a variance expressed in \(cm^6\), which is even weirder. Taking the square root of the variance gives a measure of dispersion that is in the original units. The square root of the parametric variance is the parametric standard deviation, which you will never use; it is given by the spreadsheet function STDEVP(Ys). The square root of the sample variance is the sample standard deviation, given by the spreadsheet function STDEV(Ys). You should always use the sample standard deviation; from here on, when you see "standard deviation," it means the sample standard deviation.
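A short Python sketch of the relationship between the two, using made-up lengths in centimeters: `statistics.variance` corresponds to VAR and is in square centimeters, `statistics.stdev` corresponds to STDEV and is back in centimeters, and `statistics.pstdev` corresponds to STDEVP, shown only for contrast.

```python
import math
import statistics

lengths_cm = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical lengths in cm

sample_var = statistics.variance(lengths_cm)  # like VAR, in square centimeters
sample_sd = statistics.stdev(lengths_cm)      # like STDEV, in centimeters
param_sd = statistics.pstdev(lengths_cm)      # like STDEVP, for contrast only

print(round(sample_var, 3))  # 4.571  (= 32 / 7)
print(round(sample_sd, 3))   # 2.138  (= square root of 32 / 7)
print(param_sd)              # 2.0
```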
The square root of the sample variance actually underestimates the parametric (population) standard deviation by a little bit. Gurland and Tripathi (1971) came up with a correction factor that gives a more accurate estimate of the standard deviation, but very few people use it. Their correction factor makes the standard deviation about \(3\%\) bigger with a sample size of \(9\), and about \(1\%\) bigger with a sample size of \(25\), for example, and most people just don't need to estimate standard deviation that accurately. Neither SAS nor Excel uses the Gurland and Tripathi correction; I've included it as an option in my descriptive statistics spreadsheet. If you use the standard deviation with the Gurland and Tripathi correction, be sure to say this when you write up your results.
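The size of this bias can be checked with a sketch. For normally distributed data, the expected value of the sample standard deviation is \(c_4(n)\,\sigma\), where \(c_4(n) = \sqrt{2/(n-1)}\,\Gamma(n/2)/\Gamma((n-1)/2)\), so dividing by \(c_4(n)\) removes the bias. Note that this Python sketch uses that exact gamma-function factor, which is the quantity Gurland and Tripathi's formula approximates; it does not reproduce their approximation itself, so its results may differ very slightly from theirs. It assumes normal data, and the function names are hypothetical.

```python
import math
import statistics


def c4(n):
    """Expected value of s / sigma for a normal sample of size n (n >= 2)."""
    if n < 2:
        raise ValueError("need at least two observations")
    # Computed on the log scale with lgamma so large n does not overflow.
    log_ratio = math.lgamma(n / 2) - math.lgamma((n - 1) / 2)
    return math.sqrt(2.0 / (n - 1)) * math.exp(log_ratio)


def corrected_sd(ys):
    """Sample standard deviation divided by c4(n); approximately unbiased for normal data."""
    return statistics.stdev(ys) / c4(len(ys))


def percent_increase(n):
    """How much larger (in percent) the corrected SD is than the uncorrected one."""
    return 100 * (1 / c4(n) - 1)


for n in (9, 25):
    print(n, round(percent_increase(n), 1))
```

The printed values land close to the "about 3%" and "about 1%" figures given above.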
In addition to being more understandable than the variance as a measure of the amount of variation in the data, the standard deviation summarizes how close observations are to the mean in an intuitive way. Many variables in biology fit the normal probability distribution fairly well. If a variable fits the normal distribution, \(68.3\%\) (or roughly two-thirds) of the values are within one standard deviation of the mean, \(95.4\%\) are within two standard deviations of the mean, and \(99.7\%\) (or almost all) are within \(3\) standard deviations of the mean. Thus if someone says the mean length of men's feet is \(270mm\) with a standard deviation of \(13mm\), you know that about two-thirds of men's feet are between \(257mm\) and \(283mm\) long, and about \(95\%\) of men's feet are between \(244mm\) and \(296mm\) long. Here's a histogram that illustrates this:
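The percentages quoted above follow from the normal distribution: the fraction of values within \(k\) standard deviations of the mean is \(\mathrm{erf}(k/\sqrt{2})\). This Python sketch computes those fractions with the standard library's `math.erf` and reproduces the foot-length intervals; the helper names are hypothetical, and the result holds only to the extent the data are normal.

```python
import math


def fraction_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


def interval(mean, sd, k):
    """Mean plus or minus k standard deviations."""
    return (mean - k * sd, mean + k * sd)


for k in (1, 2, 3):
    print(k, round(100 * fraction_within(k), 1))  # 68.3, 95.4, 99.7

print(interval(270, 13, 1))  # (257, 283)
print(interval(270, 13, 2))  # (244, 296)
```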

The proportions of the data that are within \(1\), \(2\), or \(3\) standard deviations of the mean are different if the data do not fit the normal distribution, as shown for these two very non-normal data sets:

Coefficient of Variation
Coefficient of variation is the standard deviation divided by the mean; it summarizes the amount of variation as a percentage or proportion of the mean. It is useful when comparing the amount of variation for one variable among groups with different means, or among different measurement variables. For example, the United States military measured foot length and foot width in a sample of \(1774\) American men. The standard deviation of foot length was \(13.1mm\) and the standard deviation of foot width was \(5.26mm\), which makes it seem as if foot length is more variable than foot width. However, feet are longer than they are wide. Dividing by the means (\(269.7mm\) for length, \(100.6mm\) for width), the coefficient of variation is actually slightly smaller for length (\(4.9\%\)) than for width (\(5.2\%\)), which for most purposes would be a more useful measure of variation.
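The foot-measurement arithmetic can be checked with a few lines of Python. The function names are hypothetical; `cv_from_data` shows the same calculation starting from raw observations, using the sample standard deviation as the text recommends.

```python
import statistics


def coefficient_of_variation(sd, mean):
    """Standard deviation as a proportion of the mean."""
    return sd / mean


def cv_from_data(ys):
    """CV from raw observations, using the sample standard deviation."""
    return statistics.stdev(ys) / statistics.mean(ys)


length_cv = coefficient_of_variation(13.1, 269.7)
width_cv = coefficient_of_variation(5.26, 100.6)
print(f"length {length_cv:.1%}, width {width_cv:.1%}")  # length 4.9%, width 5.2%
```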
Example
Here are the statistics of dispersion for the blacknose dace data from the central tendency web page. In reality, you would rarely have any reason to report all of these:
- Range 90
- Variance 1029.5
- Standard deviation 32.09
- Coefficient of variation 45.8%
How to calculate the statistics
Spreadsheet
I have made a spreadsheet descriptive.xls that calculates the range, sample variance, sample standard deviation (with or without the Gurland and Tripathi correction), and coefficient of variation, for up to \(1000\) observations.
Web pages
This web page calculates standard deviation and other descriptive statistics for up to \(10,000\) observations.
This web page calculates range, variance, and standard deviation, along with other descriptive statistics. I don't know the maximum number of observations it can handle.
R
Salvatore Mangiafico's R Companion has a sample R program for calculating range, sample variance, standard deviation, and coefficient of variation.
SAS
PROC UNIVARIATE will calculate the range, variance, standard deviation (without the Gurland and Tripathi correction), and coefficient of variation. It calculates the sample variance and sample standard deviation. For examples, see the central tendency web page.
Reference
- Briones-Fourzán, P., and E. Lozano-Alvarez. 1991. Aspects of the biology of the giant isopod Bathynomus giganteus A. Milne Edwards, 1879 (Flabellifera: Cirolanidae), off the Yucatan Peninsula. Journal of Crustacean Biology 11: 375-385.
- Gurland, J., and R.C. Tripathi. 1971. A simple approximation for unbiased estimation of the standard deviation. American Statistician 25: 30-32.
FAQs
What is a good measure of dispersion?
Standard deviation (SD) is the most commonly used measure of dispersion.
How do you interpret dispersion results?
Lower dispersion indicates higher precision in a manufacturing process or in data measurements, whereas higher dispersion indicates lower precision. Dispersion describes how much the values in a data set vary.
How close to 1 do dispersion statistics need to be?
The index of dispersion (the variance divided by the mean) is expected to be about 1 for counts that follow a Poisson distribution. If the counts follow a geometric or negative binomial distribution, the index of dispersion should be greater than 1; if they follow a binomial distribution, it should be less than 1.
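A minimal Python sketch of the index of dispersion, computed here as the sample variance divided by the mean (some sources use other variants, so treat this definition as an assumption). The function name and both sets of counts are hypothetical, chosen to fall clearly below and above 1.

```python
import statistics


def index_of_dispersion(counts):
    """Sample variance divided by the mean; about 1 for Poisson-like counts."""
    return statistics.variance(counts) / statistics.mean(counts)


evenly_spread = [4, 5, 5, 6, 5, 5]  # hypothetical counts, less variable than Poisson
clumped = [0, 0, 12, 1, 0, 17]      # hypothetical counts, more variable than Poisson

print(round(index_of_dispersion(evenly_spread), 2))  # 0.08
print(round(index_of_dispersion(clumped), 2))        # 11.36
```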
What does a low dispersion mean?
In a statistical sense, dispersion has two meanings: first, it measures the variation of the items among themselves, and second, it measures the variation around the average. If the values differ greatly from the average, dispersion is high; if they lie close to the average, dispersion is low.
How do you compare data dispersion?
Use a coefficient of dispersion (C.D.), which divides a measure of spread by a measure of size so that data sets of different magnitudes can be compared:
- Based on range: C.D. \(= (X_{max} - X_{min}) / (X_{max} + X_{min})\).
- Based on quartile deviation: C.D. \(= (Q_3 - Q_1) / (Q_3 + Q_1)\).
- Based on mean deviation: C.D. = mean deviation / the average from which it is calculated.
- Based on standard deviation: C.D. = S.D. / mean (the coefficient of variation).
Measures of dispersion describe the spread of the data. They include the range, interquartile range, standard deviation and variance. The range is the difference between the smallest and largest observations; it is the simplest measure of variability.
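The coefficient-of-dispersion formulas listed above can be sketched in Python as follows, along with the quartile deviation \((Q_3 - Q_1)/2\). All function names and the data are hypothetical. Quartiles are computed with `statistics.quantiles(..., method="inclusive")`, which is only one of several conventions, so other software may give slightly different quartile-based values. These ratios are meaningful only for positive data.

```python
import statistics


def quartiles(ys):
    # "inclusive" is one quartile convention among several (an assumption here).
    q1, _, q3 = statistics.quantiles(ys, n=4, method="inclusive")
    return q1, q3


def cd_range(ys):
    return (max(ys) - min(ys)) / (max(ys) + min(ys))


def quartile_deviation(ys):
    q1, q3 = quartiles(ys)
    return (q3 - q1) / 2


def cd_quartile(ys):
    q1, q3 = quartiles(ys)
    return (q3 - q1) / (q3 + q1)


def cd_mean_deviation(ys):
    m = statistics.mean(ys)
    return statistics.mean([abs(y - m) for y in ys]) / m


def cd_standard_deviation(ys):
    return statistics.stdev(ys) / statistics.mean(ys)


ys = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical positive data
for f in (cd_range, cd_quartile, cd_mean_deviation, cd_standard_deviation):
    print(f.__name__, round(f(ys), 3))
print("quartile deviation", quartile_deviation(ys))  # 0.75
```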
What is dispersion of scores in statistics?
In statistics, dispersion (also called variability, scatter, or spread) is the extent to which a distribution is stretched or squeezed. Common examples of measures of statistical dispersion are the variance, standard deviation, and interquartile range.
What is dispersion in statistics, and why is it important?
While measures of central tendency are used to estimate "normal" values of a dataset, measures of dispersion are important for describing the spread of the data, or its variation around a central value. Two distinct samples may have the same mean or median, but completely different levels of variability, or vice versa.
Why is dispersion important in statistics?
Dispersion qualifies the conclusions drawn from measures of central tendency: an average is more representative when the values cluster tightly around it. Dispersion is also used to measure inequality in the distribution of income and wealth, to evaluate variation around average profits in industry and trade, and to help control price and output.
What is an example of dispersion in statistics?
Suppose we have two data sets, A = {3, 1, 6, 2} and B = {1, 5, 9, 10}. The population variance of A is \(3.5\) and the population variance of B is \(12.69\). This implies that data set B is more variable than data set A.
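The two population variances can be verified with Python's `statistics.pvariance`; the sample variances (dividing by \(n-1\), as the main text recommends for samples) are shown alongside for contrast. This is just a check of the arithmetic in the example.

```python
import statistics

A = [3, 1, 6, 2]
B = [1, 5, 9, 10]

print(statistics.pvariance(A))  # 3.5
print(statistics.pvariance(B))  # 12.6875, which rounds to 12.69
print(round(statistics.variance(A), 2), round(statistics.variance(B), 2))  # 4.67 16.92
```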
Conclusion: Dispersion in statistics refers to the variability of data. Some of this variability may come from random measurement error, for example when instrument readings are imprecise. Measures of dispersion describe how spread out the values are in different data sets.
What is a poor measure of dispersion?
The range is a poor measure of dispersion when extremely large values are present. The quartile deviation, defined as half of the difference between the third and first quartiles, \((Q_3 - Q_1)/2\), is less affected by such values.
Dispersion is used to measure the variability in the data or to see how spread out the data is. It measures how much the scores in a distribution vary from the typical score.
What is the least reliable measure of dispersion?
Range is defined as the difference between the highest (or largest) and lowest (or smallest) observed values in a series. It is the measure of dispersion most affected by extreme values in the series, and therefore it has the lowest degree of reliability.
What does dispersion indicate?
A measure of dispersion indicates the degree of spread of the data. Measures of dispersion are not meaningful for nominal (categorical) data; they are used for ordinal, interval, and ratio scale data.
Which measure of dispersion is best and why?
Standard deviation is usually the most useful measure of dispersion. It helps in comparing the variability of two or more sets of data, in testing the significance of random samples, and in regression and correlation analysis.
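What is considered a high standard deviation?
Whether a standard deviation is high depends on the mean, so it is often judged with the coefficient of variation (CV): the higher the CV, the higher the standard deviation relative to the mean. A CV greater than 1 is often considered high.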
Which measure of dispersion is the most reliable, and why?
Standard deviation is the square root of the arithmetic mean of the squared deviations from the arithmetic mean of the data (for a sample, the sum of squared deviations is divided by \(n-1\) rather than \(n\)). It is considered the best and most commonly used measure of dispersion because it is based on the deviations of every observation from the mean.
What are the 4 types of measures of dispersion?
Measures of dispersion, or variability, include the range, interquartile range, standard deviation and variance.
Is the 50th percentile a measure of dispersion?
No. Percentiles and quartiles are not measures of dispersion but measures of position, enabling us to determine the position of a particular data point within a data set.
Is the range the most reliable measure of dispersion?
No. The range is a measure of dispersion calculated from the maximum and minimum values of a data set, but it is not the best measure of dispersion because it is not based on the entire data set.
Is a quartile a measure of dispersion?
No. A single quartile is a measure of position, not a measure of dispersion.
Which method of dispersion is best and why?
- It is based on all values and thus provides information about the complete series. ...
- It is independent of origin but not of scale.
The five most commonly used measures of dispersion are range, variance, standard deviation, mean deviation, and quartile deviation. The most important use of measures of dispersion is to help us understand the distribution of data.
What is the purpose of a measure of dispersion?
The purpose of measures of dispersion is to find out how spread out the data values are on the number line. Another term for these statistics is measures of spread.
What percentage of the data falls within one, two, or three standard deviations of the mean?
The Empirical Rule, or 68-95-99.7 rule, gives the approximate percentage of data in a normal distribution that fall within one standard deviation (68%), two standard deviations (95%), and three standard deviations (99.7%) of the mean.
What do the 50th and 75th percentiles mean?
50th percentile: also known as the median. The median cuts the data set in half; half of the values lie below the median and half lie above it. 75th percentile: also known as the third, or upper, quartile.
How do you know which data set has more dispersion?
Range is the simplest measure of dispersion. It is based on only two values in a data set (the highest and the lowest) and is easy to compute. A large range indicates a more spread-out data set, while a small range indicates that the data are more tightly clustered.
What is the most robust measure of dispersion?
The standard deviation is the classical measure of statistical dispersion, but it is not robust, since it can be made arbitrarily large by a single outlier. The most common robust estimators of dispersion are the median absolute deviation and the interquartile range.
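A small Python sketch of that contrast: replacing one value with a hypothetical outlier inflates the standard deviation enormously, while the median absolute deviation and the interquartile range do not change. The MAD here is unscaled (multiplying it by about 1.4826 makes it comparable to the standard deviation for normal data), and the quartiles use the `"inclusive"` convention of `statistics.quantiles`, which is one choice among several. Function names and data are hypothetical.

```python
import statistics


def median_absolute_deviation(ys):
    """Median of |y - median(ys)|, unscaled."""
    med = statistics.median(ys)
    return statistics.median([abs(y - med) for y in ys])


def interquartile_range(ys):
    q1, _, q3 = statistics.quantiles(ys, n=4, method="inclusive")
    return q3 - q1


clean = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
with_outlier = clean[:-1] + [900.0]  # hypothetical recording error replaces 9.0

for name, ys in (("clean", clean), ("with outlier", with_outlier)):
    print(name,
          round(statistics.stdev(ys), 2),
          median_absolute_deviation(ys),
          interquartile_range(ys))
```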