Phil Hearn: Blogger, Writer & Founder of MRDC Software Ltd.
How to analyse hierarchical data in market research
There are two types of hierarchical data encountered in market research. These are respondent-based hierarchies and data-based hierarchies (sometimes called data loops). In practice, they are analysed similarly, but, more importantly, they need software capable of analysing hierarchically structured data.
Overview – hierarchical data
This blog article explains what hierarchical data is, how it is stored, what software you need to analyse it, and, finally, some solutions to the task.
Examples of respondent-based and data-based hierarchies
Respondent-based hierarchies
An excellent example of a respondent-based hierarchy would be a doctor/patient survey where you are surveying a doctor and some of her patients. In such a case, there would be two levels of data: data relating to the doctor and data relating to each patient. There may be a fixed number of patients (e.g., the last five cases for each doctor) or a variable number (e.g., patients seen in the previous week with a particular illness). The doctor data might include the type of practice, the region the doctor works in, year of qualification, attitudes to new techniques, etc. The patient data might consist of the person’s age, gender, the length of time he/she has been visiting the practice, frequency of visiting the practice, etc.
Data-based hierarchies
An excellent example of a data-based hierarchy might be activities that someone does. If you are conducting a survey about eating-out behaviour, you will likely have respondent data and, perhaps, occasion-based data. For example, the respondent data would contain details of the respondent’s age, gender, income, preferred food types, etc. There would then be occasion-based data for each eating-out occasion (kind of restaurant, the amount spent, etc.).
Data loops
In questionnaire design terms, data-based hierarchies are commonly referred to as data loops. The term ‘data loops’ mainly refers to online surveys where respondents will be asked to ‘loop’ through one or more questions. Data loops are encountered where questionnaires seek to collect data about trips, eating-out occasions, TV viewing timeslots, items purchased, etc., where more than one loop may be necessary. Using trips as an example, sometimes details of all trips are collected; sometimes, data from a fixed number of trips, e.g., the last five trips, are collected.
Mixed respondent and data-based hierarchies
There are occasions where both types are present, such as a three-level hierarchy of doctors, patients and drugs prescribed. The doctor/patient data would be a standard respondent-based hierarchy, but each patient might have any number of prescribed drugs, each with a different dosage, regimen and frequency, for example.
Understanding hierarchies
In practice, both types of hierarchy are the same. Data from the higher level applies to all data at the lower level(s) in the hierarchy. Each patient for a specific doctor will ‘inherit’ the attributes of that doctor: the region in which the doctor works, the doctor’s specialty, attitudes to techniques, etc. The same is true for data-based hierarchies. The data relating to a respondent applies to every eating-out occasion for that respondent. On the other hand, each eating-out event is independent of every other eating-out event. The difference may be that respondent-based hierarchical data may be stored as a series of records or in two or more data files, whereas data-based hierarchical data may be embedded in a single record, though this is not always true.
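The idea of lower-level records inheriting upper-level attributes can be sketched in a few lines of Python. This is only an illustration, not how any particular survey package works: the field names (`region`, `specialty`, `doctor_id`, `age`) and the `inherit` helper are hypothetical and invented for the example.

```python
# Hypothetical two-level data: doctors (upper level) and patients (lower level).
doctors = {
    "D1": {"region": "North", "specialty": "Cardiology"},
    "D2": {"region": "South", "specialty": "Oncology"},
}
patients = [
    {"doctor_id": "D1", "patient_id": "P1", "age": 54},
    {"doctor_id": "D1", "patient_id": "P2", "age": 61},
    {"doctor_id": "D2", "patient_id": "P3", "age": 47},
]


def inherit(upper, lower, key):
    """Return lower-level records with the matching upper-level attributes copied in."""
    return [{**upper[record[key]], **record} for record in lower]


# Each patient row now carries the region and specialty of its doctor,
# so a patient-based table can be filtered or broken down by doctor attributes.
patient_level = inherit(doctors, patients, "doctor_id")
for row in patient_level:
    print(row)
```

The same helper would apply to a data-based hierarchy: replace doctors with respondents and patients with eating-out occasions.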
How survey tools store hierarchical data
How hierarchical data of any type is stored may differ between survey tools. There are four main methods, although some software products may have proprietary methods. These are:
- Flat files – Many survey analysis tools store all data as flat files. This means repeated sections of the questionnaire are stored in blocks of fields. For example, if up to five eating-out occasions are recorded, fields 10-19 may store data about the first eating-out occasion, 20-29 may be used for the second occasion, 30-39 for the third and so on. This generally works satisfactorily, except where there is a variable (and potentially large) number of ‘occasions’ or repeated data blocks. For example, where you want to record data for all business trips in a year, potentially, space will need to be allocated for many trips.
- Record-type structured files – Some data analysis systems store data similarly in flat files but have multiple records for each respondent, usually interleaved. For example, a file with two doctors, one with three patients and the other with two patients, may appear as Doctor 1/Type 1, Doctor 1/Type 2, Doctor 1/Type 2, Doctor 1/Type 2, Doctor 2/Type 1, Doctor 2/Type 2, Doctor 2/Type 2, where Type 1 is the doctor record and Type 2 is a patient record.
- Multiple file types – Some software packages will store data types in different files with a serial number identifier linking the data from the files. File 1 may contain doctor records, File 2 may have patient records, and File 3 may have drug-prescribed records. Each file is linked by an identifying serial number or ID.
- SQL databases or other databases – These are similar in concept to ‘Multiple file types’ except that all the data is in one database file with a table for each type. How the tables connect is encoded in the database structure. Programs using this technique usually have an export to one of the previous methods, as the structure may be unclear or unintuitive. Alternatively, APIs like TSAPI can make a connection possible. Without an export or an API, this type of database may limit the ability to use other software.
- Proprietary methods – An export or the use of an API will be necessary; otherwise, the proprietary format will limit the ability to use other software.
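To make the record-type layout more concrete, here is a minimal Python sketch that reads interleaved records into a nested structure. The comma-separated layout (doctor ID, record type, one value) is an assumption made up for this example, as are the `read_interleaved` function and the `region` and `age` fields; real record-type files vary by product. It also assumes each doctor record (Type 1) appears before that doctor's patient records (Type 2).

```python
# Hypothetical interleaved file: doctor_id, record_type, value.
# Type 1 = doctor record (value is the region); Type 2 = patient record (value is the age).
lines = [
    "1,1,North",
    "1,2,54",
    "1,2,61",
    "1,2,38",
    "2,1,South",
    "2,2,47",
    "2,2,70",
]


def read_interleaved(lines):
    """Group Type 2 (patient) records under their Type 1 (doctor) record."""
    doctors = {}
    for line in lines:
        doctor_id, record_type, value = line.split(",")
        if record_type == "1":
            doctors[doctor_id] = {"region": value, "patients": []}
        elif record_type == "2":
            doctors[doctor_id]["patients"].append({"age": int(value)})
        else:
            raise ValueError(f"Unknown record type: {record_type}")
    return doctors


doctors = read_interleaved(lines)
for doctor_id, doctor in doctors.items():
    print(doctor_id, doctor["region"], len(doctor["patients"]), "patients")
```

The multiple-file and database methods end up with the same nested structure; the only difference is that the link is made through an ID shared between files or tables rather than through record order.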
Research software packages that have limited tools for handling hierarchical data
Most survey analysis packages do not allow you to analyse hierarchical data easily. They work on the principle that there is one record for each respondent. Some software packages can produce tables based on occasions, for example. However, it may be a laborious task. If there are up to 10 eating-out occasions, you may need to add data from 10 variables to produce the one table you want based on total eating-out occasions. If this principle needs to be applied to many questions, this can then become a lengthy process. Snap and QPSMR have similar capabilities in this area and tools to manage smaller or simpler hierarchies.
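The underlying task can be illustrated with a short Python sketch: turn one flat record per respondent into one record per occasion, after which an occasion-based table is a single count. The field names (`rest_1`, `spend_1`, and so on), the limit of three occasions and the `to_occasions` helper are all hypothetical and chosen only to keep the example small.

```python
from collections import Counter

MAX_OCCASIONS = 3  # assumed fixed number of blocks in the flat file

# Hypothetical flat records: one block of fields per eating-out occasion.
respondents = [
    {"id": 1, "age_group": "18-34",
     "rest_1": "Fast food", "spend_1": 8.0,
     "rest_2": "Italian", "spend_2": 25.0,
     "rest_3": None, "spend_3": None},
    {"id": 2, "age_group": "35-54",
     "rest_1": "Fast food", "spend_1": 6.5,
     "rest_2": "Fast food", "spend_2": 7.5,
     "rest_3": "Indian", "spend_3": 30.0},
    {"id": 3, "age_group": "18-34",
     "rest_1": "Italian", "spend_1": 18.0,
     "rest_2": None, "spend_2": None,
     "rest_3": None, "spend_3": None},
]


def to_occasions(respondent):
    """Expand one flat respondent record into one record per recorded occasion."""
    rows = []
    for n in range(1, MAX_OCCASIONS + 1):
        restaurant = respondent[f"rest_{n}"]
        if restaurant is None:  # unused block: this occasion was not recorded
            continue
        rows.append({
            "id": respondent["id"],
            "age_group": respondent["age_group"],  # inherited from the respondent
            "occasion": n,
            "restaurant": restaurant,
            "spend": respondent[f"spend_{n}"],
        })
    return rows


occasions = [row for r in respondents for row in to_occasions(r)]

# An occasion-based table: counts are based on occasions, not respondents.
table = Counter(row["restaurant"] for row in occasions)
print(len(occasions), dict(table))
```

Software without hierarchical tools effectively requires you to do the equivalent of this loop by hand, variable by variable, for every question in the repeated block.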
Problems with processing data in software with limited tools
Although producing tables from occasion-based data may be possible, there may still be a problem. Let’s say you want to find out what percentage of the eating occasions for each respondent was in a fast food restaurant. This would mean you need to sum up the number of occasions in a fast-food restaurant and divide it by the number of occasions in total to calculate the percentage. The number of occasions would vary from respondent to respondent, so a calculation would need to be performed. At this point, many software tools struggle. Similarly, if you want to sum the amount spent across all eating-out occasions, this may be difficult or impossible. It may be possible to output data to Excel, for example, make calculations and paste or import it back to the primary data file. However, this starts to become time-consuming, especially where there are a lot of variables. Also, it means introducing a step that is prone to error and generally cumbersome.
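The calculation itself is simple once the data is held at occasion level, as the following Python sketch suggests. The data, the field names and the `summarise` function are hypothetical; the point is only that the denominator (the number of occasions) varies from respondent to respondent.

```python
from collections import defaultdict

# Hypothetical occasion-level records: one row per eating-out occasion.
occasions = [
    {"id": 1, "restaurant": "Fast food", "spend": 8.0},
    {"id": 1, "restaurant": "Italian", "spend": 25.0},
    {"id": 2, "restaurant": "Fast food", "spend": 6.5},
    {"id": 2, "restaurant": "Fast food", "spend": 7.5},
    {"id": 2, "restaurant": "Indian", "spend": 30.0},
    {"id": 3, "restaurant": "Italian", "spend": 18.0},
]


def summarise(occasions):
    """Compute per-respondent totals by reading each respondent's occasions as a set."""
    grouped = defaultdict(list)
    for row in occasions:
        grouped[row["id"]].append(row)

    summary = {}
    for respondent_id, rows in grouped.items():
        fast_food = sum(1 for r in rows if r["restaurant"] == "Fast food")
        summary[respondent_id] = {
            "occasions": len(rows),
            "pct_fast_food": 100.0 * fast_food / len(rows),
            "total_spend": sum(r["spend"] for r in rows),
        }
    return summary


summary = summarise(occasions)
for respondent_id, values in summary.items():
    print(respondent_id, values)
```

The resulting values (`pct_fast_food`, `total_spend`) become ordinary respondent-level variables that can be tabulated like any other question, without a round trip through Excel.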
The video below shows you how MRDCL handles hierarchical data.
If this video helped you, here is another one that covers handling repetitive blocks of data.
Software that processes hierarchical data efficiently
Very few software products have the power to process hierarchical data flexibly and efficiently. MRDCL handles the three main methods data collection platforms use to store hierarchical data. In practice, this means that to process 20 eating-out occasions per respondent, it is not approximately 20 times as much work to produce an occasion-based table as a standard table. Similarly, if you want to calculate information by reading the hierarchical data as a set, this should be a simple process and should not require recoding, data exports and imports, or other complexities.
Is MRDCL the only solution for this type of analysis?
MRDCL is not the only solution, but it is one of the very few packages that can handle this type of task well. It tends to be the more established products, such as MRDCL and Quantum, that are needed for such tasks or, at least, needed to handle them efficiently.
Is there a way to simplify tables from a hierarchical survey?
MRDCL offers another unique solution, allowing researchers and analysts to handle tabulations easily. The skilled part of the process is reading and processing the data. However, MRDCL enables you to process data and then provide the data for analysis in Resolve, a free software product that understands hierarchies and data loops, allowing you to produce tables easily.
Talk to us if you have surveys with hierarchical data or data loops. We can simplify your data processing and help you to make analysis simple. Contact nikki.sunga@mrdcsoftware for more information.