Not too long ago, the march of Big Data seemed totally unstoppable. According to industry experts and futurist visionaries, the stage was set for life to become dominated by mass data collection, hyper-powered algorithmic analysis, and tools to convert data into ever more efficient systems.
With the 2008 Financial Crisis fresh in memories around the world, Big Data excited the imagination of corporations, marketers, academics – even artists. And for some people, it had revolutionary potential. As Peter Sondergaard of Gartner has said, “Information is the oil of the 21st century, and analytics is the combustion engine.”
But what if the engine has stalled? And what if VPNs (Virtual Private Networks) are to blame?
As this article will argue, there is plenty of evidence to suggest that the spread of VPNs is posing serious problems for data collection and analysis. As individuals seek to safeguard their privacy, collection techniques used by data gatherers aren’t as easy to deploy, leading to some serious technical, legal, and political problems. And the way these problems are resolved will have major implications for the future of digital society.
A quick history of how Big Data and VPNs came into conflict
Both Big Data and VPNs are relatively new phenomena – at least as far as everyday computer and smartphone users are concerned. And the way they interact is changing the landscape of the data collection industry.
The term “Big Data” refers to the collection and analysis of data sets so massive that they go well beyond what standard software-based methods can handle. With the rise of internet use around the world and the development of tracking tools, alongside the spread of the Internet of Things (IoT), companies have found a way to create streams of data that dwarf their predecessors.
These streams have opened up new possibilities, from tracking the health of individuals, to preventing crime, fighting malaria, ensuring that famine victims are adequately provided for, and analyzing climate change. So it’s easy to see why governments and companies have been enthusiastic about Big Data. Information is power, after all, and with these analytical tools, that power is growing all the time.
However, bulk data collection has coincided with growing fears about invasions of privacy and corporate or government surveillance. The rise of Big Data came alongside revelations about NSA spying within the USA, attempts by copyright holders to aggressively enforce their digital rights, and a host of allegations of abuses by tech giants like Google and Facebook.
These concerns have fuelled the rapid growth of VPN usage, with the market for VPNs projected to more than double between 2016 and 2022. Stimulated by the dangers of surveillance, cyber-crime, and geo-blocking, millions of people discover the advantages of VPNs every month. As people have begun to encrypt their data and hide behind anonymized IP addresses, the consequences for Big Data have started to become clear, and there could be a serious slow-down on the horizon.
And that’s a very worrying trend. While NSA surveillance and Facebook’s tricks have rightly alarmed web users, the benefits of Big Data collection are clear to see. So let’s look in more detail at how data collection is being compromised, and what could be the way forward.
How VPNs are slowing down the growth of Big Data
There are various ways that VPNs act against the interests of data collection, and when put together, they pose a formidable barrier to Big Data’s future.
Perhaps most importantly, VPNs cut ISPs off from much of what their customers do online. Previously, Internet Service Providers (in most countries) have been able to track user behavior via DNS requests, anonymize this data, and then monetize it by selling it to third-party clients. VPNs make this far harder by routing the user’s DNS requests through the encrypted tunnel, out of the ISP’s sight. If the VPN doesn’t leak those requests, then ISPs can’t tell what sites people are viewing, denting the value of the data they collect.
At the same time, good VPNs provide a wall of encryption around a user’s internet connection, creating a further barrier between external actors and individuals. This has stymied many invasive tracking tools that have been used in the past, effectively shutting out a chunk of the web population from Big Data collection.
That might not seem like a big deal on the face of things. VPN usage amounts to at most 25% of the web-using population right now (although that number is growing fast). But this is a significant section of those who use the web. It tends to include a high proportion of technically literate users, whose data is extremely important for building marketing profiles. And because data sets thrive on volume and diversity, losing a significant chunk of the web-using public weakens their quality considerably.
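To make the point about skewed data sets concrete, here is a rough back-of-the-envelope sketch in Python. Only the 25% overall VPN figure comes from the discussion above; the share of technically literate users and the adoption rate within each group are purely hypothetical numbers, chosen so that overall usage works out to 25%.

```python
def overall_vpn_rate(group_share, vpn_rate_group, vpn_rate_rest):
    """Fraction of the whole population that uses a VPN."""
    return group_share * vpn_rate_group + (1 - group_share) * vpn_rate_rest


def observed_share(group_share, vpn_rate_group, vpn_rate_rest):
    """Share of the group among users still visible to data collectors,
    assuming VPN users drop out of the collected data entirely."""
    visible_group = group_share * (1 - vpn_rate_group)
    visible_rest = (1 - group_share) * (1 - vpn_rate_rest)
    return visible_group / (visible_group + visible_rest)


# Hypothetical inputs (assumptions, not measured figures):
TECH_SHARE = 0.20      # technically literate users as a share of all users
VPN_RATE_TECH = 0.65   # VPN adoption among technically literate users
VPN_RATE_REST = 0.15   # VPN adoption among everyone else

overall = overall_vpn_rate(TECH_SHARE, VPN_RATE_TECH, VPN_RATE_REST)
observed = observed_share(TECH_SHARE, VPN_RATE_TECH, VPN_RATE_REST)

print(f"Overall VPN usage: {overall:.0%}")
print(f"Technically literate share of population: {TECH_SHARE:.0%}")
print(f"Technically literate share of visible data: {observed:.1%}")
```

Under these assumed numbers, a group that makes up 20% of web users shrinks to roughly 9% of the data that collectors can still see. The exact figures are invented, but the pattern holds whenever VPN adoption is concentrated in one group: the data set loses more than raw volume.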
This doesn’t mean that VPNs have decimated the Big Data industry, or that VPNs act alone. For instance, the European Union’s GDPR has been a setback for data collection, while increasing knowledge among smartphone users about granting app permissions is also having an effect. So what is the solution?
The future of VPNs and Big Data: Can they be reconciled?
Ideally, the future would be bright for both digital privacy tools and Big Data analysis. In a complex world, we need ways to protect individual rights from overweening state and corporate power. But we also need ways to understand the world we live in, and to formulate policies that allow us to solve social or environmental problems.
From the Big Data side, coders, analysts, and organizations that gather data should be aware that VPNs aren’t going away. Smart organizations will plan for 50% VPN usage by 2022, and assume that encryption and DNS protection are standard by then.
This doesn’t mean that data collection will be off-limits. But what it does mean is that Big Data companies need to be more open with customers, and more proactive in meeting high ethical standards. Any more Cambridge Analytica-style scandals are likely to push lawmakers and consumers towards restrictions or privacy tools, making data collection even harder.
In some cases, VPNs can work with data collection systems to provide anonymized bulk data, and that’s one way around the conundrum. But these VPNs will alienate more privacy-aware users, which will weaken the quality of the data sets provided. As an alternative, some services will provide in-depth customization options that let users toggle what data is provided and to whom.
However, the clash between privacy and data collection isn’t an easy one to resolve. Data analysts will need to be more creative about how they source data (from web searches, mobile devices, physical sensors, and tracking software). But the truth is that, with VPN technology becoming mainstream, large amounts of web activity will be off-limits.
This might be a good thing for the data processing sector. Big Data could be on the cusp of becoming more subtle, flexible, and responsive, and VPNs could be one stimulus towards making the shift. If bulk web data collection isn’t possible – at least not in an unrestricted way – companies will need to focus their efforts on prescriptive analytics, and data sets that suit particular tasks.
Or we could just see a continuing tangle between privacy and bulk data collection. One thing’s for certain: the hunger for data and the desire for privacy aren’t going anywhere, and the way they relate is going to be a key dynamic in the years ahead.