So, you’ve just finished collecting survey responses. Maybe you were measuring customer feedback on a new product. Or maybe you were surveying students on their course experience. Either way, you now have a ton of text data to sift through in search of takeaways.
Now, it is tempting to just skim through your data and try to catch a general vibe, but that wouldn’t be very scientific. As data analysts, we need a sorting method that is rigorous, with a structure that can still be adjusted to reflect new information. The most common method for sorting qualitative data is called coding. Coding means moving through your data and assigning labels to words or phrases that reflect certain themes. These themes can be general, like positive or negative feedback, or more specific: maybe this comment refers to the quality of customer service, or the demeanour of a university professor. Once you’ve labelled all your comments, you can analyze your dataset by counting how many times each theme appeared, comparing themes against each other, or segmenting by demographics.
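To make that last step concrete, here is a minimal sketch in Python using pandas. The column names (`positive`, `customer_service`, `group`) and the tiny dataset are hypothetical, just to show what counting themes and segmenting by a demographic looks like once responses are coded into 0/1 columns:

```python
import pandas as pd

# Hypothetical coded dataset: one row per response, one 0/1 column per theme.
df = pd.DataFrame({
    "response_id": [1, 2, 3, 4, 5],
    "group": ["undergrad", "grad", "undergrad", "undergrad", "grad"],
    "positive": [1, 0, 1, 1, 0],
    "customer_service": [0, 1, 0, 1, 1],
})

# How often did each theme appear overall?
theme_counts = df[["positive", "customer_service"]].sum()
print(theme_counts)

# Segment the same counts by a demographic column.
by_group = df.groupby("group")[["positive", "customer_service"]].sum()
print(by_group)
```

Because each theme is a 0/1 column, a simple sum gives you frequencies, and `groupby` gives you the demographic breakdown for free.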
There are two approaches to coding data: manual and automatic. Manual coding is essentially what is described above: you read through each comment and assign labels by hand. Automatic coding is the same, but uses a computer to read and label the data. Automatic coding is much faster, but manual coding gives you finer control and, in that sense, greater accuracy. Since both have their merits, we’ll take the time to explain both.
Manual Qualitative Data Analysis
Manual qualitative data analysis is a bit more involved, as it requires the analyst to read through each comment and decide what it’s about.
For this example, I’m going to use a sample dataset of answers to the question “What does college success mean to you?” This survey had 5 questions and nearly 2,500 responses. First, I’m going to convert this to an Excel document so I can add columns and manipulate cells. Since I’m going to want to move this data around, I’ll add a sentence ID to act as a unique identifier. This will be especially important if you want to run code on your finished dataset.
Next, I’m going to duplicate this sheet. I can only label one question at a time, so I’m going to remove the other 4 questions. I’m also going to insert a couple of columns between the answers and the demographic data to make room for my topics.
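If you prefer to script this setup instead of doing it by hand in Excel, the same steps look roughly like this in pandas. The column names here (`q1_success`, `q2_other`, `age_group`) are made up for illustration; the point is keeping one question plus the demographics, and adding a unique sentence ID:

```python
import pandas as pd

# Hypothetical raw export: several questions plus demographics per respondent.
raw = pd.DataFrame({
    "q1_success": ["Getting good grades", "Making friends", "Graduating on time"],
    "q2_other": ["n/a", "n/a", "n/a"],
    "age_group": ["18-24", "18-24", "25-34"],
})

# Keep one question and the demographics; drop the other questions.
one_question = raw[["q1_success", "age_group"]].copy()

# Add a unique sentence ID so rows can be re-joined after sorting or filtering.
one_question.insert(0, "sentence_id", range(1, len(one_question) + 1))
print(one_question)
```

The sentence ID is what lets you sort, filter, or split the sheet later and still match every coded row back to its original response.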
Now that I’m all set up, it’s time to start coding responses. To do this, I’m going to read through each comment and decide what I think it’s about. This will require me to have some understanding of what the survey respondents are talking about.
Choosing topics is more of an art than a science, and they will evolve as you move through the dataset. In the first couple of comments, you’ll probably be adding lots of new topics, and at a certain point, you’ll mostly be matching comments to existing topics.
It’s important to remember that there is no limit to the number of topics you can have! Depending on the survey question, you might have respondents who are talking about lots of different things, although ideally, you want to design questions that don’t have this effect. You may find that some comments are all over the place, and it’s tough to pin down exactly what they’re trying to say. A handy tip from our analysis department: if a theme makes up less than 5% of the comment, it’s probably not a topic.
As I’m reading through my comments, I’ll enter a 1 if the comment talks about a topic, and a 0 if it doesn’t. You could also use TRUE and FALSE values. Either will work, but make sure to put something in every cell, or you’ll create null values when you export.
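If you do end up with blank cells, they are easy to clean up in code before analysis. Here is a small sketch, assuming hypothetical topic columns `grades` and `friendship`, where blank cells have come in as nulls and get filled with 0:

```python
import pandas as pd

# Hypothetical partially coded sheet: blank cells become NaN on import.
coded = pd.DataFrame({
    "sentence_id": [1, 2, 3],
    "grades": [1, None, 1],
    "friendship": [None, 1, None],
})

# Fill every empty topic cell with 0 so the export has no null values.
topic_cols = ["grades", "friendship"]
coded[topic_cols] = coded[topic_cols].fillna(0).astype(int)
print(coded)
```

Filling nulls with 0 preserves the meaning you intended while typing: an empty cell was always “this comment does not mention that topic.”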
Automatic Qualitative Data Analysis
Automatic qualitative data analysis leverages an area of computer science called Natural Language Processing to break down text-based data into a format that computers can understand and analyze. It’s a very interesting topic at the forefront of AI and machine learning and has the potential to change the way we understand our world.
All of which is pretty incredible, but it’s not what we’re going to do here. We’re going to use NLP to quickly parse, process, and code our qualitative dataset so that it’s ready for some basic analysis.
I’m going to be using the same dataset as I did for manual analysis. For this example, I’m choosing to use Unigrams because it’s simple to use and understand without sacrificing core functionality.
The first step is to make sure your dataset is in .csv format. This should be an option when you export your dataset, but if not, you can save an Excel file as .csv.
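You can also do the conversion in a couple of lines of Python. The filenames below are placeholders; in practice you would start from your own exported workbook (reading .xlsx with pandas also requires the openpyxl package):

```python
import pandas as pd

# In practice you would load your exported workbook, e.g.:
# df = pd.read_excel("survey_responses.xlsx")  # needs openpyxl installed
# Here we build a tiny stand-in dataset so the example runs on its own.
df = pd.DataFrame({"sentence_id": [1, 2], "answer": ["Good grades", "Friends"]})

# Write it out as .csv (index=False keeps pandas' row index out of the file).
df.to_csv("survey_responses.csv", index=False)

# Quick round-trip check: the CSV reads back with the same shape.
check = pd.read_csv("survey_responses.csv")
print(check.shape)
```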
You’ll find that in an automatic approach you won’t actually spend very much of your time coding responses. This is because Unigrams does all of this in what’s called a “pre-processing” phase, where the computer breaks down each response and stores it so that you can get to the analysis.
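To give a feel for what “pre-processing” involves, here is a toy sketch in plain Python. This is not how Unigrams works internally; it just illustrates the basic idea behind unigram analysis: lowercase each response, split it into single words, and count them:

```python
import re
from collections import Counter

# Two made-up responses standing in for a real survey export.
responses = [
    "Success means good grades.",
    "Good grades and good friends!",
]

counts = Counter()
for response in responses:
    # Crude tokenizer: lowercase, then pull out runs of letters/apostrophes.
    tokens = re.findall(r"[a-z']+", response.lower())
    counts.update(tokens)

print(counts.most_common(3))
```

Real tools layer a lot more on top of this (removing stop words like “and”, grouping word variants, detecting related phrases), but counting unigrams is the starting point.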
In this first step, I’m selecting the country I want my data processed and stored in. This is important for protecting the privacy of survey respondents, so check your department's data compliance policies. Then I’m choosing the sample dataset from my downloads. Now click “Process”, and we’re done.
Automatic coding is very fast, but it does require that the analyst look at each theme and use some know-how to decide what it’s about. Computers can tell when words and phrases are related to each other, but they aren’t very good at knowing why.
Coding qualitative data is a generally accepted way of sorting qualitative responses into different themes, which can be measured in analysis. Coding can be done automatically with a program like Unigrams or manually in Excel. It is a critical step in qualitative data analysis and paves the way for deeper analysis later in the data pipeline. In our next blog we will show you how to turn your tagged data into visually engaging graphs using Excel -- stay tuned!
If qualitative data is something that interests you, consider signing up for our newsletter at the bottom of this page! It’s a monthly bulletin where we discuss important topics in the industry and keep you up to date with what’s happening at Kai Analytics.
If you want to learn more about what Kai Analytics can do for you, schedule a free consultation. In 30 minutes we can discuss your needs and see what we can do to help.