How to analyse hierarchical data in market research

There are two types of hierarchical data encountered in market research: respondent-based hierarchies and data-based hierarchies. In practice, they are analysed in similar ways but, more importantly, both need software that is capable of analysing hierarchically structured data.


This blog article explains what hierarchical data is, what software you need to analyse such data and, finally, some solutions to the task.

Examples of respondent-based and data-based hierarchies

Respondent-based hierarchies

A good example of a respondent-based hierarchy is a doctor/patient survey in which you survey a doctor and some of that doctor’s patients. In such a case, there are two levels of data: one set of data relating to the doctor and a variable number of records, one for each patient. For example, the doctor data might include the type of practice, the region in which the doctor works and attitudes to new techniques. The patient data might include the patient’s age, gender, the length of time he/she has been visiting the practice and the frequency of visits.

Data-based hierarchies

A good example of a data-based hierarchy is a set of activities that someone does. If you are conducting a survey of someone’s eating out behaviour, you are likely to have respondent data and occasion-based data. For example, the respondent data would contain details of the respondent’s age, gender, income etc. There would then be a block of occasion-based data for each eating out occasion.

Mixed respondent and data-based hierarchies

There are occasions where both types are present, such as a three-level hierarchy of doctor, patients and drugs prescribed. The doctor/patient data would be a standard respondent-based hierarchy, but each patient might have any number of drugs prescribed, each with a different dosage, regimen and frequency, for example.

Understanding hierarchies

In practice, both types of hierarchy behave in the same way. Data from the higher level is applicable to the lower level in the hierarchy, yet the reverse is not true. Each patient of a specific doctor will gain the attributes of that doctor: the region in which the doctor works, the doctor’s specialty, attitudes to techniques etc. The same is true for data-based hierarchies. For each eating out occasion for a specific respondent, the data relating to that respondent will be applicable. On the other hand, each eating out occasion is independent of other eating out occasions. The difference may be that respondent-based hierarchical data may be stored as a series of records or in two or more data files, whereas data-based hierarchical data may be embedded in a single record, though this is not always true.
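To make the idea of attributes flowing down the hierarchy concrete, here is a minimal sketch in plain Python. It is not the syntax of any of the research packages discussed in this article, and the field names (region, specialty, doctor_id and so on) are invented purely for illustration.

```python
# Hypothetical doctor/patient data; all field names are illustrative only.
doctors = {
    "D1": {"region": "North", "specialty": "Cardiology"},
    "D2": {"region": "South", "specialty": "General practice"},
}

patients = [
    {"doctor_id": "D1", "patient_id": "P1", "age": 54},
    {"doctor_id": "D1", "patient_id": "P2", "age": 61},
    {"doctor_id": "D2", "patient_id": "P3", "age": 37},
]


def patient_level_records(doctors, patients):
    """Return one record per patient with the doctor's attributes copied down."""
    combined = []
    for patient in patients:
        doctor = doctors[patient["doctor_id"]]
        # Doctor fields flow down to the patient; patient fields never flow up.
        combined.append({**doctor, **patient})
    return combined


records = patient_level_records(doctors, patients)
for record in records:
    print(record["patient_id"], record["region"], record["specialty"], record["age"])
```

Each patient record now carries the region and specialty of the doctor, so a patient-based table can be filtered or broken down by doctor attributes. The doctor records themselves are unchanged, which reflects the point that data does not apply in the reverse direction.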

What tools are available to produce tables from hierarchical data?

There are three main options: using Microsoft Office products, using research software that mainly handles one-respondent-per-record data, and using software that has full functionality for processing and tabulating hierarchical data.

Producing tables from hierarchical data using Microsoft Office

Most survey analysis packages do not allow you to analyse hierarchical data. They work on the principle that there is one record for each respondent. Whilst Excel cannot help you to analyse hierarchical data unless you program it using VBA or recode data to multiple worksheets, Microsoft Access does understand data hierarchies. Access refers to hierarchical data as one-to-many relationships. Whilst it can manage the data, it has limited capabilities to perform tabular analysis, particularly if the analysis is complex. Again, VBA or recoding makes this possible, albeit in a cumbersome way. Hierarchies mainly exist in Access to manage reports rather than to produce tabulations.

Research software packages that have limited tools for handling hierarchical data

Some software packages have the capability to produce tables based on occasions, for example. However, it may be a laborious task. If there are, for example, up to 10 eating out occasions, you may need to add data from 10 variables together to produce the one table that you want based on all eating out occasions. If this principle needs to be applied to many tables, this can then become a lengthy process. Snap and QPSMR are similar in their capabilities in this area and have tools to manage smaller or simpler hierarchies.
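The "adding variables together" step can be sketched as follows. This is a plain Python illustration only, not how Snap, QPSMR or any other package specifies it, and the variable names venue_1 to venue_10 are assumptions made for the example.

```python
from collections import Counter

MAX_OCCASIONS = 10  # assumed maximum number of occasions stored per respondent

# Hypothetical one-record-per-respondent data with variables venue_1 ... venue_10.
# Unused occasion slots are either missing or set to None.
respondents = [
    {"id": 1, "venue_1": "fast food", "venue_2": "pub", "venue_3": None},
    {"id": 2, "venue_1": "cafe", "venue_2": None},
    {"id": 3, "venue_1": "fast food", "venue_2": "fast food", "venue_3": "cafe"},
]


def occasion_table(respondents, prefix="venue_", max_occasions=MAX_OCCASIONS):
    """Count venues across every occasion slot, giving an occasion-based table."""
    counts = Counter()
    for respondent in respondents:
        for n in range(1, max_occasions + 1):
            venue = respondent.get(f"{prefix}{n}")
            if venue is not None:
                counts[venue] += 1
    return counts


venue_counts = occasion_table(respondents)
for venue, count in venue_counts.most_common():
    print(venue, count)
```

The base of this table is occasions (six in this sample), not respondents (three). In a package without a loop of this kind, the same result means combining each of the ten variables by hand for every table you need.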

Problems with processing data in software with limited tools

There may still be a problem, though, if you wish to process data and apply calculations at a higher level in the data. What does that mean? For example, let’s say you want to find out what percentage of the eating occasions for each respondent were in a fast food restaurant. This would mean that you need to count the occasions in a fast food restaurant and divide that count by the total number of occasions. The number of occasions would vary from respondent to respondent, so a calculation would have to be performed for each respondent. At this point, many software tools struggle. It may be possible to output data to Excel, for example, make calculations and paste or import the data back to the main data file. However, this becomes time consuming, especially where there are a lot of variables, and it is prone to error and generally cumbersome.
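The calculation described above is simple once the occasions for each respondent can be read as a set. The following is a minimal Python sketch under assumed data; the respondent IDs and venue labels are hypothetical, and returning None for a respondent with no occasions is one possible choice rather than a rule.

```python
# Hypothetical occasion-level data: one list of venues per respondent.
occasions_by_respondent = {
    "R1": ["fast food", "pub", "fast food", "cafe"],
    "R2": ["cafe"],
    "R3": [],  # a respondent who reported no eating out occasions
}


def fast_food_share(venues, target="fast food"):
    """Percentage of a respondent's occasions at the target venue type.

    Returns None when there are no occasions, to avoid dividing by zero.
    """
    if not venues:
        return None
    matching = sum(1 for venue in venues if venue == target)
    return 100.0 * matching / len(venues)


shares = {
    respondent_id: fast_food_share(venues)
    for respondent_id, venues in occasions_by_respondent.items()
}
print(shares)
```

The result is a respondent-level variable (50.0 for R1 and 0.0 for R2 in this sample) that can be tabulated like any other respondent data, for example banded into ranges and crossed by age or gender.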

Software that processes hierarchical data efficiently

You are left with very few software products that can process hierarchical data efficiently. To be efficient, the software needs to be able to read repetitive records or blocks of data without you having to repeat specifications. Specifically, processing, say, up to 20 eating out occasions per respondent should not mean approximately 20 times as much work to produce an occasion-based table as a standard table. Similarly, if you want to calculate information by reading the hierarchical data as a set, this should be a simple process and should not require recoding, data exports and imports or other complexities. For example, getting the total cost of all eating out occasions, or calculating the percentage of eating out occasions that are at fast food restaurants, should be a simple task.
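As a rough illustration of "without having to repeat specifications", the sketch below writes each calculation once and applies it to however many occasions a respondent has. It is plain Python with invented field names (venue, cost) and a hypothetical helper called summarise. It does not represent the syntax of MRDCL, Quantum or Merlin.

```python
# Hypothetical hierarchical data: each respondent holds a variable-length
# block of occasions. All field names are illustrative only.
survey = [
    {"id": "R1", "occasions": [
        {"venue": "fast food", "cost": 8.5},
        {"venue": "pub", "cost": 22.0},
    ]},
    {"id": "R2", "occasions": [
        {"venue": "cafe", "cost": 6.0},
    ]},
]


def summarise(survey, spec):
    """Apply one specification (a function of the occasion list) to each respondent."""
    return {respondent["id"]: spec(respondent["occasions"]) for respondent in survey}


# Each specification is written once, regardless of the number of occasions.
total_cost = summarise(survey, lambda occasions: sum(o["cost"] for o in occasions))
occasion_count = summarise(survey, len)

print(total_cost)
print(occasion_count)
```

The amount of specification is the same whether a respondent has 2 occasions or 20, which is the property to look for when assessing a package.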

Is MRDCL the only solution for this type of analysis?

MRDCL is not the only solution, but it is one of very few packages that can handle this type of task well. It tends to be the more established products like MRDCL, Quantum and Merlin that are needed for such tasks or, at least, that are needed if the task is to be handled efficiently.

Is there a way to simplify tables from a hierarchical survey?

MRDCL offers a unique solution for allowing researchers and analysts to handle tabulations easily. The skilled part of the process is managing the data and it requires a more advanced product like MRDCL, Quantum or Merlin. However, MRDCL allows you to process data and then provide the data for analysis in Resolve, which is a free software product that understands hierarchies and allows you to produce tables.

This means that you can buy services from MRDC Software or any other user of MRDCL and then produce tabulations yourself using easy-to-use interactive tabulation software. You are splitting the skilled parts of the data processing from the less skilled parts, so you can easily produce as many tables as you wish.

If you want information or advice, please contact me. I will be pleased to advise and help.