Sharing Insights Through Our NLP Workshop
Throughout our work at Kai Analytics, we have frequently shared our knowledge through briefings, presentations, and workshops. During these events, we put our full effort into helping participants understand data and the people it came from. Data has the power to help groups of people understand each other, which is often the goal of our analysis. Workshops can be a more practical way of achieving this goal than reports alone, which may feel less immediate and clear to the layperson. The natural language processing (NLP) workshop we developed was more rewarding (even for me) than I originally expected. There’s truth in the idea that teaching something is a great way to make sure that you truly understand it.
While we’re comfortable and experienced with many forms of data, it is how we handle and present clients' qualitative survey data that gets and keeps their attention. Open-ended responses can provide a lot of insight if you know how to read them. That's part of why we use NLP and recommend it to clients who want to learn more about their own options when analyzing data. Companies that reach out to us may not even be making the most of the data they already have. Seeing an opportunity to teach others how to glean more information from their open-ended responses is what led us to start offering the Intro to NLP workshop.
From the start, we made decisions on the content of the workshop with our attendees in mind, right down to the format we present it in. We made our programming language choices in part based on which libraries had good documentation and were intuitive to follow. The workshops use the Python notebook format, which offers a good balance: it is easy to use, yet the details remain accessible whenever a user is interested. Because the notebooks run in a shared, cloud-based Python environment, the format also helps avoid headaches, as attendees don’t need to set up their device beforehand.
Introducing Natural Language Processing
The workshop starts with an introduction to natural language processing (NLP). Even for those unfamiliar with programming, concepts like parts of speech, prefixes and suffixes, and sentence structure are based on principles that you may be familiar with from English or another language class. Part of the advantage of using software for these aspects of NLP is the speed with which large sets of text can be processed, and the results tend to be pretty close to what would result from a more time-consuming manual analysis. This ‘preprocessing’ is informative in itself, but it also improves the more computationally intensive results that follow.
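To make the idea of ‘preprocessing’ a little more concrete, here is a minimal sketch in plain Python. It is not the workshop's actual notebook code: the stopword list, the suffix rules, and the two sample responses are all made up for illustration, and a real project would more likely lean on an established NLP library than on a hand-rolled stemmer like this one.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption); real lists are much longer.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "was", "were",
             "is", "are", "i", "it", "but", "very"}

# Suffixes removed by this deliberately crude stemmer, checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word):
    """Strip one common suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, split into words, drop stopwords, and stem what is left."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOPWORDS]


# Made-up open-ended survey responses.
responses = [
    "The instructors were helpful and the labs were engaging.",
    "Parking was limited, but the instructors helped a lot.",
]

processed = [preprocess(r) for r in responses]
word_counts = Counter(word for doc in processed for word in doc)

print(processed[0])               # ['instructor', 'helpful', 'lab', 'engag']
print(word_counts["instructor"])  # 2
```

Notice that ‘helpful’ and ‘helped’ end up as different tokens (‘helpful’ and ‘help’). Rough edges like this are why purpose-built stemmers and lemmatizers exist, but even this crude version shows how both responses come to share the token ‘instructor’.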
The topic estimation portion of NLP is more complex than the ‘preprocessing’ of comments. The basic idea, however, is that each slice of text (commonly referred to in NLP as a ‘document’) will have a corresponding topic. The topic estimator tries to group documents based on the semantic similarity of the words within them. The estimator itself doesn’t have any ‘knowledge’ about how to interpret a topic, so there is still value in having a subject matter expert as part of the process to verify the results. By looking at the list of most common words within a topic, you can start to get a sense of how to describe that topic in plain English.
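For readers curious about what a topic estimator does under the hood, below is a toy sketch of one common approach, latent Dirichlet allocation (LDA), fitted with a simple collapsed Gibbs sampler in plain Python. This is an assumption chosen for illustration, not necessarily the estimator used in the workshop. The function name `fit_lda`, its default parameters, and the four tiny ‘documents’ are all hypothetical, and in practice you would use a well-tested library implementation.

```python
import random
from collections import Counter


def fit_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA over tokenized documents."""
    rng = random.Random(seed)
    vocab_size = len({word for doc in docs for word in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # topic counts per document
    topic_word = [Counter() for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics                       # total words per topic

    # Start from a random topic assignment for every word occurrence.
    assignments = []
    for d, doc in enumerate(docs):
        doc_assignments = []
        for word in doc:
            k = rng.randrange(num_topics)
            doc_assignments.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word] += 1
            topic_total[k] += 1
        assignments.append(doc_assignments)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                # Take this word's current assignment out of the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][word] -= 1
                topic_total[k] -= 1
                # Weight each topic by how well it fits the document and the word.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][word] + beta)
                    / (topic_total[t] + beta * vocab_size)
                    for t in range(num_topics)
                ]
                # Draw a new topic in proportion to those weights.
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][word] += 1
                topic_total[k] += 1
    return doc_topic, topic_word


# Made-up, already preprocessed 'documents'.
docs = [
    ["instructor", "helpful", "lecture", "clear"],
    ["instructor", "lecture", "engaging", "clear"],
    ["parking", "campus", "expensive", "parking"],
    ["campus", "parking", "limited", "expensive"],
]

doc_topic, topic_word = fit_lda(docs, num_topics=2)

# The most common words in each topic are what a human would read and label.
for t, counts in enumerate(topic_word):
    top_words = [word for word, count in counts.most_common(3) if count > 0]
    print(f"Topic {t}: {top_words}")

# Each document's dominant topic is the one most of its words were assigned to.
dominant = [max(range(2), key=lambda t: row[t]) for row in doc_topic]
print(dominant)
```

The sampler only ever sees word counts. Deciding that one topic is about ‘teaching’ and another about ‘parking’ is still a job for a person reading the top words.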
Results: Interpretation and Iteration
Interpreting the topics of open-ended response documents isn’t always easy, but I think that’s part of what makes this combination of NLP and human interpretation interesting. The machine learning models may not make the same connections as you would, or they may group comments into themes that don’t make sense at first. But any opportunity to look at your data from a fresh perspective is really valuable, which is why NLP is such a helpful tool for understanding qualitative data. We can easily get exhausted reading over long texts, while machine learning can section off the main ideas for us in a clear way, even though it will do so imperfectly.
It’s also worth emphasizing that even with the same set of data, iterating over the process has benefits. For instance, most topic estimators have the user choose how many topics the analysis will identify as a parameter. With an unfamiliar dataset, this can feel like a guess. Don’t worry: there’s rarely a ‘right’ answer. If your initial results have an impractical number of topics, or some aren’t distinct enough for you to interpret, you can lower the number of topics and re-run the estimation. Alternatively, you can make a note in your analysis that topics A and B are similar enough to be treated as the same.
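One rough way to check whether two topics are distinct enough is to measure how much their top words overlap. The sketch below uses Jaccard similarity for this. The topic names, the word lists, and the 0.5 threshold are all hypothetical values chosen for illustration, not results from a real analysis, and overlap in top words is only a heuristic that still needs a human's judgment.

```python
from itertools import combinations

# Hypothetical top words per topic, as a topic estimator might report them.
topic_top_words = {
    "topic_a": ["instructor", "lecture", "clear", "helpful"],
    "topic_b": ["instructor", "lecture", "engaging", "clear"],
    "topic_c": ["parking", "campus", "expensive", "limited"],
}


def jaccard(words_a, words_b):
    """Share of distinct words that two word lists have in common (0 to 1)."""
    a, b = set(words_a), set(words_b)
    return len(a & b) / len(a | b)


def similar_topic_pairs(top_words, threshold=0.5):
    """Return pairs of topics whose top words overlap at or above the threshold."""
    return [
        (name_a, name_b)
        for name_a, name_b in combinations(top_words, 2)
        if jaccard(top_words[name_a], top_words[name_b]) >= threshold
    ]


pairs = similar_topic_pairs(topic_top_words)
print(pairs)  # [('topic_a', 'topic_b')]
```

A flagged pair is a prompt to look closer: you might re-run the estimation with fewer topics, or simply report the two topics together.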
While it may not be possible to convey every single aspect of NLP in our workshops, we do our best to explain all concepts as fully as possible and answer any questions. Ultimately, we see this workshop as an introduction to NLP as a tool, and a strong start to learning how to make the most of it in your own process.
Interested in participating in a workshop to learn more about natural language processing (NLP)? Find out more here.
If you found this article useful, you might enjoy our newsletter. It’s a monthly email that keeps you up to date on what we’re working on and shares articles on topics we find interesting.