Phil Hearn: Blogger, Writer & Founder of MRDC Software Ltd.
Is it OK to weight my survey data?
You have successfully collected your survey data but the sample you have ended up with is not representative. “Should I weight my survey data?”, you might ask yourself. You might have 60% males and 40% females whereas you know you want a 50-50 split. The question you have to ask yourself is “is it OK to weight my survey data?”
What is weighting?
Weighting is a technique used to adjust for imbalances in the sample of questionnaire or survey data (or any other respondent-based data). It means that even if you did not get a representative sample of a type of respondent (for example, young females), you can apply a weighting factor to scale each respondent up or down as appropriate. In other words, each respondent no longer counts as exactly one respondent; they might count as 0.8 respondents or 1.3 respondents, for example.
Whether it is OK to weight data depends
The answer is that it depends. I was drawn to a good, short blog post by Peanut Labs which claims that you should never weight survey data. To be fair, its summary comes closer to my own opinion: you should not weight to adjust or correct for missing data, and you should only weight to correct for oversampling, not undersampling. I would add that you need to be careful if the range of weights is too large. If some respondents are getting weights of, say, 5 and others are getting weights of 0.5, there is likely to be something seriously wrong with your data.
Effective sample size tells you a lot
Some software packages, such as MRDCL, QPSMR and SPSS, contain tools to show you the effective sample size. This is valuable information. If you have a sample size of 500 respondents and after weighting the effective sample size is 490, you can be confident that your results are more or less as valid as if you had a balanced sample of 500. If the effective sample size is 270, it means that your data would have been just as robust if you had sampled 270 people and not needed to weight the data.
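To make the idea concrete, here is a minimal sketch of one common way to estimate effective sample size, Kish's approximation: the square of the sum of the weights divided by the sum of the squared weights. This is an assumption for illustration; the packages named above may calculate their figure differently. The function name and the sample (500 respondents, 60% males weighted down to 50%) are hypothetical.

```python
def effective_sample_size(weights):
    """Kish's approximation: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    total_of_squares = sum(w * w for w in weights)
    if total_of_squares == 0:
        raise ValueError("weights must not all be zero")
    return total * total / total_of_squares


# Hypothetical sample of 500: 300 males weighted down to 50%,
# 200 females weighted up to 50%.
weights = [50 / 60] * 300 + [50 / 40] * 200
n_eff = effective_sample_size(weights)
print(round(n_eff))  # 480
```

With every weight equal to 1.0 the function returns the actual sample size, and the more the weights vary, the further the effective sample size falls below it.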
Dangers of weighting
One of the dangers of weighting data is that most software packages will calculate weights and apply the weights to results, tables, charts etc. without telling you about the effect on your data. A computer program may well calculate results correctly when applying extreme weights, but it is the responsibility of the data provider to dig deeper and be sure of the effects of the weighting.
The golden rule of weighting
If there is a golden rule, it is to always check the weights that are being applied and the effective sample size – well that’s two rules, of course!
What types of weighting are there?
There are two types of weighting: target weighting and rim weighting. There is a third type known as factor weighting, but this is usually the application of pre-calculated target or rim weighting.
Target weighting (weighting type 1)
Target weighting may use either one simple set of targets or a number of interlocking cells. In other words, you may weight to one variable such as gender, setting targets of, say, 50% males and 50% females. Or, you may wish to weight to targets of two or more variables. For example, you might weight to age within gender, such that you have targets of 20% for Males Under 25, 20% for Males 25-45, 10% for Males Over 45, 15% for Females Under 25, 20% for Females 25-45 and 15% for Females Over 45. The percentages should always add up to 100%. Targets can also be set as actual numbers to represent populations, e.g. 2500 Males Under 25. If there are three variables, you would need percentages or figures for age within gender within region.
Target weighting factors
Target weighting factors are calculated by dividing the target by the actual percentage or figure. So, if you had a target of 20% Males Under 25 but your sample contained 25% Males Under 25, you would apply a weight of 20/25 (0.8) to each Male Under 25.
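The calculation above can be sketched in a few lines. The targets are the age within gender percentages used earlier in this article; the actual sample percentages are hypothetical apart from the 25% for Males Under 25, and the function name is my own for illustration.

```python
def target_weights(targets, actuals):
    """Weight for each cell = target percentage / actual percentage."""
    if abs(sum(targets.values()) - 100) > 1e-9:
        raise ValueError("target percentages must add up to 100")
    weights = {}
    for cell, target in targets.items():
        actual = actuals.get(cell, 0)
        if actual == 0:
            # Nobody in this cell: no weight can reach the target.
            raise ValueError(f"no respondents in cell {cell!r}")
        weights[cell] = target / actual
    return weights


targets = {
    "Males Under 25": 20, "Males 25-45": 20, "Males Over 45": 10,
    "Females Under 25": 15, "Females 25-45": 20, "Females Over 45": 15,
}
# Hypothetical achieved sample, as percentages of all respondents.
actuals = {
    "Males Under 25": 25, "Males 25-45": 20, "Males Over 45": 10,
    "Females Under 25": 10, "Females 25-45": 20, "Females Over 45": 15,
}
weights = target_weights(targets, actuals)
print(weights["Males Under 25"])    # 0.8
print(weights["Females Under 25"])  # 1.5
```

Every respondent in a cell receives that cell's weight, so cells that already match their target get a weight of 1.0.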
Be careful not to have too many target weighting cells
The number of cells in a target weighting matrix will be the product of the number of categories in each variable. If you have two genders, three age groups and five regions, this will mean that you have 30 cells (2 * 3 * 5 cells) in your weighting matrix. This may work fine with a sample size of 2000 where the average number of respondents per cell will possibly be around 50-100, but this would not be appropriate if you had 100 respondents. When you have small samples, particular care is needed.
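As a quick check before setting up a weighting matrix, the arithmetic above can be sketched as follows. The function name is hypothetical, and the average assumes respondents are spread evenly across cells, which real samples rarely are, so the smallest cells will be thinner than the average suggests.

```python
from math import prod


def cell_summary(categories_per_variable, sample_size):
    """Return (number of cells, average respondents per cell)."""
    cells = prod(categories_per_variable)
    return cells, sample_size / cells


# Two genders, three age groups and five regions.
cells, average = cell_summary([2, 3, 5], 2000)
print(cells)           # 30
print(round(average))  # 67

# The same matrix with only 100 respondents is far too thin.
_, small_average = cell_summary([2, 3, 5], 100)
```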
Rim weighting (weighting type 2)
Rim weighting calculates a weighting factor for each respondent but does not use interlocking cells. You could weight to gender, age and region with independent targets for each of those three variables, for example 50% Males and 50% Females; 30% Under 25, 40% 25-45 and 30% Over 45; 40% North and 60% South. Here you have targets for each variable, but not for Males Under 25 in the North. Rim weighting works by an iterative process: it weights respondents to the first variable, then the second variable and finally the third variable (in this example), applying the product of the previously calculated weights as it goes. It then repeats this process until the adjustment needed at each step is 1.0 (or close enough to it), at which point the weights have settled. This is a more complex process that is explained more fully in this blog article.
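The iterative process can be sketched as below. This is a simplified illustration of the general technique (often called raking or iterative proportional fitting), not the routine used by any particular package. The function name, the stopping tolerance, the iteration cap and the sample counts are all assumptions; the targets are the gender and age targets from the example above, and respondents with missing values are not handled.

```python
def rim_weights(respondents, targets, max_iterations=100, tolerance=1e-6):
    """respondents: list of dicts; targets: {variable: {category: share}}."""
    n = len(respondents)
    weights = [1.0] * n
    for _ in range(max_iterations):
        largest_adjustment = 0.0
        for variable, shares in targets.items():
            # Current weighted total in each category of this variable.
            totals = {category: 0.0 for category in shares}
            for person, w in zip(respondents, weights):
                totals[person[variable]] += w
            factors = {}
            for category, share in shares.items():
                if totals[category] == 0:
                    raise ValueError(f"nobody in {variable}={category!r}")
                factors[category] = share * n / totals[category]
            # Multiply each existing weight by the factor for its category.
            weights = [w * factors[person[variable]]
                       for person, w in zip(respondents, weights)]
            step = max(abs(f - 1.0) for f in factors.values())
            largest_adjustment = max(largest_adjustment, step)
        if largest_adjustment < tolerance:
            break  # every factor in this pass was effectively 1.0
    return weights


# Hypothetical sample of 100 respondents, as counts by gender and age.
counts = {("Male", "Under 25"): 25, ("Male", "25-45"): 20,
          ("Male", "Over 45"): 15, ("Female", "Under 25"): 10,
          ("Female", "25-45"): 15, ("Female", "Over 45"): 15}
respondents = [{"gender": g, "age": a}
               for (g, a), count in counts.items() for _ in range(count)]
targets = {"gender": {"Male": 0.5, "Female": 0.5},
           "age": {"Under 25": 0.3, "25-45": 0.4, "Over 45": 0.3}}
weights = rim_weights(respondents, targets)
```

After the loop finishes, the weighted gender split and the weighted age split both match their targets to within the tolerance, even though no target was ever set for a combined cell such as Males Under 25.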
Two types of missing data
There are two things you cannot do, and this might sound obvious but it is easily forgotten. First, you cannot weight nobody to a target. If you are trying to weight Males Under 25 in the North to 5%, you cannot do anything if there are no Males Under 25 in the North in your data. You will need to merge cells or, better still, collect some more data. Secondly, you will need to consider what to do if any of your respondents have missing data for a variable that is used in the weighting calculation. For example, if a respondent has not given their age, you cannot weight them to a target that needs an age. You may have to exclude this record.
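Both problems are easy to check for before any weights are calculated. The sketch below is one possible approach under assumed data structures: respondents are dictionaries with None for an unanswered question, and the function name, field names and example records are hypothetical.

```python
def check_weighting_inputs(respondents, variables, target_cells):
    """Return (target cells with no respondents, indexes of incomplete records)."""
    # Records missing a value for any variable used in the weighting.
    incomplete = {
        index for index, person in enumerate(respondents)
        if any(person.get(variable) is None for variable in variables)
    }
    # Cells that contain at least one complete respondent.
    populated = {
        tuple(person[variable] for variable in variables)
        for index, person in enumerate(respondents)
        if index not in incomplete
    }
    empty_cells = [cell for cell in target_cells if cell not in populated]
    return empty_cells, sorted(incomplete)


respondents = [
    {"gender": "Male", "age": "Under 25", "region": "South"},
    {"gender": "Female", "age": None, "region": "North"},  # age not given
    {"gender": "Female", "age": "25-45", "region": "North"},
]
target_cells = [
    ("Male", "Under 25", "North"),
    ("Male", "Under 25", "South"),
    ("Female", "25-45", "North"),
]
empty_cells, incomplete = check_weighting_inputs(
    respondents, ["gender", "age", "region"], target_cells)
print(empty_cells)  # [('Male', 'Under 25', 'North')]
print(incomplete)   # [1]
```

An empty cell means merging cells or collecting more data; an incomplete record is a candidate for exclusion from the weighting.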
Using the right software
It is important to use software that both applies weighting correctly and has tools to warn you if the results you are producing are unreliable. All of our software products adhere to good standards so that you should not make this mistake. If you wish to understand more fully, we have a free rim weighting calculator which comes with full explanations of how rim weighting works. If you need help, advice or software to weight your data, please contact me and I will try to advise you.