Comparisons

When is a difference meaningful?

Confidence intervals

Key messages

• Confidence intervals are a measure of the statistical precision of an estimate and show the range of uncertainty (caused by sample size and random variation) around the figure. They are important to consider when interpreting data and comparing areas or trends, to explore whether differences are statistically significant (i.e. very unlikely to be due to chance).

• It is also important to consider the practical significance of an effect.

Understanding whether the difference between two values is meaningful is vital to ensuring robust policies, but it can be challenging to define what ‘meaningful’ is. One definition is that the size of the difference has some practical effect. For instance, there is evidence that weight loss of ≥5% results in significant improvements in cardiovascular risk, so a weight management intervention may use this as a threshold for achieving a meaningful result. An alternative measure of meaningfulness is statistical significance. In this case meaningfulness is defined by how certain we are that there is actually a difference between the values, given the impact of things like chance and sample size. For instance, if the weight management intervention was undertaken with a very large number of individuals, we may be certain enough of the average weight loss it produces to say the difference is statistically significant.

Either form of significance is valid, but in both cases the level at which differences are deemed meaningful should be defined before differences are measured, to avoid bias. With increasing sample size a result is more likely to be statistically significant, though this does not affect its practical significance.

Because practical significance is domain specific, this page focuses on statistical significance. Statistical significance is typically measured by p-values or confidence intervals. P-values provide a single value indicating whether a difference is significant (typically if the p-value is <0.05). Confidence intervals are an increasingly common alternative that give a range within which the true value is likely to lie. Both conventions typically correspond to the same threshold: if there were genuinely no difference, a result like the one observed would occur by chance about 1 time in 20. Most analysis undertaken in Camden uses confidence intervals.
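To make the link concrete, here is a minimal sketch (in Python, with hypothetical figures) showing the same comparison of two survey proportions expressed both ways: as a p-value and as a 95% confidence interval for the difference. The normal approximation used here is an assumption for illustration, not the method behind any Camden analysis.

```python
from math import sqrt
from scipy.stats import norm

# Hypothetical figures: 220 of 1,000 smokers in area A, 180 of 1,000 in area B
n1, x1 = 1_000, 220
n2, x2 = 1_000, 180
p1, p2 = x1 / n1, x2 / n2
diff = p1 - p2

# Standard error of the difference between two proportions (normal approximation)
se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
z = diff / se
p_value = 2 * (1 - norm.cdf(abs(z)))       # two-sided p-value
ci = (diff - 1.96 * se, diff + 1.96 * se)  # 95% interval for the difference

print(f"difference = {diff:.3f}, p = {p_value:.3f}")
print(f"95% CI for the difference: {ci[0]:.3f} to {ci[1]:.3f}")
```

A p-value below 0.05 and a 95% interval that excludes zero tell the same story: the difference is statistically significant at the conventional threshold.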

The width of a confidence interval is affected by three things:

  1. Sample size: If we are only looking at a sample of a population (e.g. a survey which we then extrapolate to the whole population) there will be random differences between the sample and the population as a whole. If we survey 1,000 people about smoking to estimate how many people in Camden smoke, the group surveyed will differ from the population as a whole irrespective of how well we design the survey. We use confidence intervals to take account of these differences. The bigger the survey, generally the narrower the confidence intervals (illustrated in the sketch after this list). Confidence intervals do not account for systematic bias in a survey (for example where a survey has been poorly designed or implemented).
  2. Stochastic processes: Often with health data we are working with a full data set rather than a sample; for example, we know exactly how many people have been diagnosed with diabetes. However, there is still random, natural variation (or stochastic processes) within the data. For example, imagine rolling a six-sided die 60 times. We’d expect ‘4’ to come up 10 times in total, but each roll has a 1-in-6 chance of producing a ‘4’. If it comes up 12 times we can use confidence intervals to account for the natural variation in rolling the die and decide whether the result was just chance or whether the die is loaded. To give a health-related example, if we look at mortality we might find that in one year 1,000 people died and in the next 1,010 died. When we look at those figures we want to know whether that difference reflects a ‘real’ underlying pattern of increasing mortality or whether, just by natural chance, 10 more people died. We use confidence intervals to take account of this natural variation so that we can try to see real underlying patterns in the data.
  3. Level of confidence required: When we calculate a 95% confidence interval we can say that we are 95% confident that the true value lies within its range. We can change the level of confidence; typically 95%, 99.8% or 68% levels are used. Increasing the level of confidence widens the interval: because we need to be more certain that a 99.8% interval contains the true value, it will be wider than a 95% interval for the same data.
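The first and third of these effects can be seen directly in a short sketch. This is a minimal illustration using the normal-approximation (‘Wald’) interval for a proportion, with made-up smoking figures; it is not the method behind any published Camden estimate.

```python
from math import sqrt
from scipy.stats import norm

def proportion_ci(successes, n, confidence=0.95):
    """Wald confidence interval for a proportion (illustrative only;
    Wilson or exact intervals are preferred for small samples)."""
    p = successes / n
    z = norm.ppf(1 - (1 - confidence) / 2)  # e.g. 1.96 for 95% confidence
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

# Effect 1: larger samples give narrower intervals (20% prevalence assumed)
for n in (100, 1_000, 10_000):
    lo, hi = proportion_ci(int(0.2 * n), n)
    print(f"n={n:>6}: 95% CI = {lo:.3f} to {hi:.3f}")

# Effect 3: higher confidence levels give wider intervals (n = 1,000)
for conf in (0.68, 0.95, 0.998):
    lo, hi = proportion_ci(200, 1_000, conf)
    print(f"{conf:.1%} CI: {lo:.3f} to {hi:.3f}")
```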

The interpretation of confidence intervals depends on whether they are being compared to a single point value or to another set of confidence intervals:

Confidence intervals compared to a single value

Benchmarking involves comparing the confidence interval of one area against a point value (i.e. a value without confidence intervals) such as a target or benchmark. If the confidence interval contains the point value there is no statistically significant difference; if the point value lies outside the interval, there is a significant difference. The benefit of this approach is that it allows an exact statistical test to be made: we can be sure whether or not the difference is significant.

When we are benchmarking against a true point value (such as 50% or an odds ratio of 1), there is no problem with this approach. However, if we are benchmarking against a data point that itself has confidence intervals (such as another area or time period), we must treat that figure as if it were the true value, ignoring its uncertainty. This means we are likely benchmarking against an imprecise figure, which could erroneously show a significant difference where there is none.
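The test itself is simple enough to state in a few lines of code. This hypothetical sketch just checks whether the benchmark lies inside the interval; the interval limits and benchmark value are made up for illustration.

```python
def differs_from_benchmark(ci_low, ci_high, benchmark):
    """True if the point benchmark lies outside the confidence interval,
    i.e. the difference is statistically significant."""
    return not (ci_low <= benchmark <= ci_high)

# e.g. a local prevalence estimate of 18% (95% CI 15% to 21%)
# compared against a fixed national benchmark of 14%
print(differs_from_benchmark(0.15, 0.21, 0.14))  # True: significantly different
```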

Confidence intervals compared to confidence intervals

The alternative approach can be used when we have two sets of data, each with confidence intervals, and we want to take the imprecision of both into account. In this approach we compare the two intervals to see whether they overlap. If the intervals do not overlap there is a significant difference between the data. However, unlike with the benchmarking technique, we cannot be certain that there is no difference when the intervals do overlap (intervals can overlap slightly while a formal test of the difference would still find it significant; a fuller explanation is beyond the scope of this document).

In practice we check whether the upper limit of one interval is below the lower limit of the other, in which case the first value is statistically significantly lower, or whether its lower limit is above the other’s upper limit, in which case it is statistically significantly higher. Because any overlap is read as ‘no difference’, this approach could erroneously fail to show a significant difference where one exists.
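As a sketch of this overlap rule (again with made-up interval limits), the comparison can be written as:

```python
def compare_intervals(a_low, a_high, b_low, b_high):
    """Overlap rule for two confidence intervals. Overlap is read as
    'no demonstrable difference', which is conservative: a formal test
    of the difference might still find significance."""
    if a_high < b_low:
        return "A is statistically significantly lower than B"
    if a_low > b_high:
        return "A is statistically significantly higher than B"
    return "no demonstrable difference (intervals overlap)"

# e.g. area A: 12% (95% CI 10% to 14%); area B: 17% (95% CI 15% to 19%)
print(compare_intervals(0.10, 0.14, 0.15, 0.19))
```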
