The truth about social data access
As the social listening landscape continues to evolve, one of the biggest questions that comes up is around social data access. Social data falls into two categories: “licensed data” and “crawled data.” The differences between the two can cause a lot of confusion about the kind of social data that’s available via different tools, what you should and shouldn’t access, and more.
During the SI Tech Demo Day 2021, we explored this topic further with Noam Cadouri, GBM of Reddit's Data Partner Program, and Kaylin Linke, Head of Solutions Architecture at Socialgist. We looked specifically at Reddit as a data source for this session.
Licensed vs. crawled data: Risks and advantages
To put the conversation into context, we need to explore the differences between licensed and crawled data.
Crawled data, as Kaylin explains, involves “accessing what’s out there on the public web.” Think of it as the way Google surfaces the websites it indexes through search engine results.
Licensed data, on the other hand, is data that you get through an agreement with the data owner (in this case, Reddit). The terms of the license dictate the details, such as how the data is delivered, controlled, and maintained. “We’re making sure that the folks on the other end of the data are displaying it and using it in a way that is respectful of the Reddit community and in line with the guidelines that Reddit has in place,” Kaylin says.
There are plenty of advantages to using licensed data. One is that it’s more actionable. The sort of information you have access to makes it easier to get insights that will inform strategic business decisions.
Licensed data also has advantages from a compliance perspective. Since there’s an explicit and formal agreement with the data owner, licensed data is safer and less risky to handle. This also gives you more control over how you access the data.
When it comes to crawled data, however, there are situations where we could lose access to a specific site. The site could change, it could stop working, or it could stop existing altogether. So there’s no control over the data sources.
Licensed data also beats crawled data in terms of speed. You get real-time access to the data, so “there isn’t the latency that is sometimes associated with more of the crawled sources.”
Figure: Understanding the differences between licensed data and crawled data
Reddit as a data source: Why it makes so much sense
Since the focus of the discussion was on Reddit as a data source, Noam also shared some insights into what makes the platform so special. In terms of numbers, Reddit currently has more than 52 million daily active users, 100,000+ communities, and 50 billion views.
But that’s not the only reason why Reddit’s special. Rather, it’s the unique offering of online discussion communities. These communities are where users find topics that they’re interested in, valuable information that they need, and positive emotional support. It offers a sense of belonging that users often seek.
Anonymity is another unique offering that major social networks don’t provide. Since Reddit doesn’t ask users for their real names or emails, it allows users to be their authentic selves and share their unfiltered thoughts on the platform.
On top of that, people use Reddit not just as a place to share their fleeting thoughts but to engage in more in-depth conversations. As a result, the data that you can get is richer and more valuable. All of this sets Reddit apart from other social media platforms in the eyes of both users and analysts.
Data access on Reddit: How it works
The good news for users is that Reddit doesn’t provide user information such as demographic data to social listening tools. According to Noam, “When they come to us and ask for anonymity, we want to do our best to honour that.” Reddit only syndicates data that’s publicly available.
While social listening tools can access Reddit data, they can only do so through Socialgist, the official data partner. This gives more structure around how the data is accessed and used. Socialgist helps to ensure that the end user of the data complies with the Reddit guidelines.
Socialgist also offers different delivery options depending on what you need. It can provide a full stream of the data in real time to social listening tools. For more niche players, it can also set up rules to deliver very specific data.
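To make the difference between a full stream and rule-based delivery more concrete, here is a minimal Python sketch. It is purely illustrative and does not reflect Socialgist's or Reddit's actual API: the `Post` and `DeliveryRule` classes, the `deliver` function, and the sample posts are all hypothetical names invented for this example. The assumption is simply that a "rule" narrows the stream by community and keyword.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Iterator, Sequence


@dataclass(frozen=True)
class Post:
    """Hypothetical stand-in for one public post in a data stream."""
    community: str
    text: str


@dataclass(frozen=True)
class DeliveryRule:
    """Hypothetical rule: an empty set means 'no restriction' on that field."""
    communities: FrozenSet[str] = frozenset()
    keywords: FrozenSet[str] = frozenset()

    def matches(self, post: Post) -> bool:
        community_ok = (
            not self.communities or post.community.lower() in self.communities
        )
        text = post.text.lower()
        keyword_ok = not self.keywords or any(k in text for k in self.keywords)
        return community_ok and keyword_ok


def deliver(stream: Iterable[Post], rules: Sequence[DeliveryRule]) -> Iterator[Post]:
    """Yield every post when no rules are given, otherwise only matching posts."""
    for post in stream:
        if not rules or any(rule.matches(post) for rule in rules):
            yield post


# Made-up sample data, for illustration only.
posts = [
    Post("running", "Best trail shoes for winter?"),
    Post("cooking", "Cast iron care tips"),
    Post("running", "Marathon training plan"),
]

# Full stream: no rules, so everything is delivered.
full_stream = list(deliver(posts, []))

# Niche delivery: only posts in one community that mention a keyword.
shoe_rule = DeliveryRule(
    communities=frozenset({"running"}),
    keywords=frozenset({"shoes"}),
)
niche_stream = list(deliver(posts, [shoe_rule]))
```

In this sketch, a social listening tool that wants everything passes no rules, while a niche player passes one or more rules and receives only the posts that match at least one of them.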
There’s no doubt that Reddit is a rich source of data, where users can be their uninhibited selves and engage with hundreds of thousands of communities. And with the Socialgist partnership, social listening tools and marketers alike can gain access to this data in a way that makes the most sense for their specific needs.
Find out more about SI Tech Demo Day and access all the presentations, including this discussion, on demand.