May 17, 2023

Tackling the challenge of non-English data

A big challenge international brands face in social listening is how to accurately analyse non-English social data, particularly if they depend on social listening tools.

During the 2021 Tech Demo Day, we delved into this topic in more detail with Jackie Cuyvers, CEO of Convosphere; Michele Aggiato, Head of Community at Vorwerk Group; and Sathyaraj Aasaithambi, Senior Manager at Novartis. We looked specifically into the challenges of working with non-English data and how to tackle them with the help of local analysts.

Demo Day 2021 Expert panel discussing non-English social data

How are international brands and agencies using non-English social data?

Problems with non-English data: the what and the why

Problems with non-English data usually start with the stakeholders commissioning the work or projects. According to Jackie, problems occur when “their understanding of the social channels or where the data’s coming from is limited to specific channels…” They may also assume that ‘global English’ is good enough for a project across Europe. So the expectations of stakeholders themselves play a crucial role.

Sathya added an industry perspective, raising the challenges of the signal-to-noise ratio and the snowball effect: a few odd topics trend for a while and get extremely high visibility during that time. According to Sathya, “It doesn’t mean those are the topics that need to be heard. There could be a lot of unmet needs and the voices that we really need to watch out for…”

When there’s too much to listen to and too many trending topics adding to the noise, it’s difficult to figure out exactly what data is important to us.

Problems also arise when each market is working independently. This creates data silos. As someone working in a highly decentralised organisation, Michele explained how this was the exact issue they experienced. “All these bits were never coming together to create a wider picture and (give a better understanding) across multiple languages,” he elaborated.

The problem with relying on tools for translation

A major challenge with non-English data is the translation itself, especially considering the limitations of tools in this area. According to Michele, relying on a tool for social listening often limits you to the specific market in which the tool is particularly strong.

For example, U.S.-based tools are great at reading English conversations. But that doesn’t necessarily mean they can also deliver useful data for the French or Italian markets.

Jackie added to Michele’s point, saying, “(These tools) don’t all handle language equally well and they don’t all have the same depth of data in those different markets.” In many cases, some of the information gets lost in translation because non-English content has to go through an English “middle layer” first.

The need for local analysts to bring cultural context

According to Sathya, even though there are multiple platforms that can offer translations in multiple languages, that’s not where the process ends. “The reality is it’s not just about translation, it’s also about how you process the data and ensure that it becomes more meaningful,” he explained.

Since you can’t rely on tools alone, there’s a need for local analysts to accurately process non-English data and turn it into something useful. “Google Translate doesn’t account for the way that we speak – where we use idioms and euphemisms or local slang,” Jackie explained.

In the U.S., for example, a euphemism for “dying” is “to kick the bucket.” But you won’t get accurate results if you localise that query using Google Translate. In Slovenia, the equivalent is “he went to whisper to crabs.” In Germany, it’s “to give your spoon away.” Only a local analyst who understands these local and cultural nuances can add that context to the data.

Besides this, communities and channels can have their own languages and terms that Google Translate or other tools can’t understand. In the health community, for instance, people with chronic pain call themselves “spoonies.” And Reddit forums have their own abbreviations that tools can’t account for.

Infographic showing how to work with non-English social data

What to do with non-English social data  

Testing the language capabilities of different tools

While you can’t rely on tools alone, you still need them to support your analysts. When it comes to testing the language capabilities of different tools, Jackie recommends understanding your stakeholders and your business objectives. You need to know what you want to get out of the research and what markets are important to you.

For example, if your focus is on the U.S. and European markets, you don’t have to worry about how a tool handles Asian languages. In this case, “you should start by testing out the tools and the data and the language handling first.”

And if you’re more interested in the most common languages to get a global perspective, any of the global enterprise tools can work for you. So it all depends on what your business needs.

Technology and tools are still at a nascent stage when it comes to converting data into meaningful insights. That’s why there’s a need for a hybrid approach, one in which tools and human analysts work together to collect, analyse, and translate the data. In short, social listening tools should be used to facilitate the analyst’s understanding of the data.

Or view the interview on LinkedIn

This interview was recorded via LinkedIn Live and can also be viewed on LinkedIn.
