Big data may have a problem on its hands. That may be distressing to hear, since big data is often presented as the great problem solver. Many businesses and organizations have turned to big data analytics to tackle the problems they encounter, helping them improve over time. While big data can certainly be helpful, it is far from perfect, and as people become more familiar with it, they are also discovering real limitations in what it can do.
Even more worrisome are warnings about big data's discrimination problem: it can perpetuate biases already present in modern society. Big data's imperfections may be unpleasant to confront, but only by finding its shortcomings can we correct them, turning big data into the problem solver we all want it to be.
Of course, the very idea that big data could be discriminatory may strike some as absurd. Big data is, after all, just data, right? Facts, figures, and statistics simply reflect truth and reality, so how could big data be biased?
That thinking may hold true for some fields, but big data isn't just a collection of indisputable facts. Big data and the algorithms used to analyze it are only as good as the information fed into them, and at its core, that information comes from humans. In fact, much of the work surrounding big data is built by human minds, and as we all know, humans don't always act without bias, whether intentional or not.
Now some may claim that more advanced analytics techniques like machine learning can bypass any possible big data discrimination, but even machine learning algorithms can learn in all the wrong ways.
One example many experts like to cite is the use of big data algorithms in the hiring process. If employers have tended to favor younger applicants, an algorithm trained on their past decisions will pick up on that trend, eventually weeding out older applicants on its own. The result is fewer older candidates advancing no matter how qualified they are, which doesn't accurately reflect the applicant pool. This one example shows where big data can go wrong and how biases can be introduced.
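The mechanism behind this example can be sketched in a few lines. The toy model below (entirely hypothetical data, a hand-rolled logistic regression rather than any real hiring system) trains on past decisions in which skilled but older candidates were rejected; the model learns a negative weight on age simply because the historical labels encoded that preference.

```python
import math
import random

random.seed(0)

# Synthetic historical hiring data: each candidate has a skill
# score and an age, both scaled to 0-1. Past decisions hired
# skilled candidates, but only if they were also young.
data = []
for _ in range(2000):
    skill = random.random()
    age = random.random()
    hired = 1 if (skill > 0.5 and age < 0.6) else 0
    data.append((skill, age, hired))

# Tiny logistic regression fit by batch gradient descent.
w_skill, w_age, b = 0.0, 0.0, 0.0
lr, n = 0.5, len(data)
for _ in range(300):
    gs = ga = gb = 0.0
    for skill, age, hired in data:
        p = 1 / (1 + math.exp(-(w_skill * skill + w_age * age + b)))
        err = p - hired
        gs += err * skill
        ga += err * age
        gb += err
    w_skill -= lr * gs / n
    w_age -= lr * ga / n
    b -= lr * gb / n

# The model penalizes older applicants purely because the
# historical decisions it learned from did.
print(f"weight on skill: {w_skill:.2f}, weight on age: {w_age:.2f}")
```

Nothing in the training code mentions age bias; the discrimination arrives entirely through the labels, which is exactly why it is so easy to introduce unintentionally.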
Research from data experts and scientists has shown big data discrimination to be an unfortunate reality.
One particularly notable study showed the effects of these biases in Google's ad algorithm. The research found that searching for names most associated with Black people increased the chances of being shown an ad for arrest records by nearly 20%. Google went on to correct the problem, but details of how the company did so are scarce.
It's important to note that much, if not all, of the bias and discrimination found in big data is unintentional. Data analysts can only work with the data they have, and as we've seen, that data doesn't always reflect the world accurately.
For example, many companies analyze data collected from smartphones, but certain segments of the population may not have smartphones or may rarely use mobile devices. A company's analysis of mobile data would therefore leave out poorer parts of the population. As another example, many questions have been raised about using big data in crime-prediction technology. Such data may overrepresent certain parts of society, unfairly targeting them for potential crimes.
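The smartphone example is a classic sampling-bias problem, and a short simulation makes the distortion visible. The sketch below uses made-up numbers (incomes drawn uniformly, phone ownership more likely at higher incomes) purely to illustrate the effect, not to model any real market.

```python
import random

random.seed(1)

# Simulated population: an income (arbitrary units) and whether
# the person owns a smartphone. Ownership probability rises
# with income, so mobile data skews toward wealthier people.
population = []
for _ in range(10000):
    income = random.uniform(10, 100)
    owns_phone = random.random() < (income / 100)
    population.append((income, owns_phone))

true_avg = sum(i for i, _ in population) / len(population)

# A "mobile analytics" dataset only ever sees phone owners.
sampled = [i for i, owns in population if owns]
sampled_avg = sum(sampled) / len(sampled)

# The mobile-only estimate overstates average income because
# poorer people are underrepresented in the data.
print(f"true average income:    {true_avg:.1f}")
print(f"smartphone-only sample: {sampled_avg:.1f}")
```

The same shape of error appears whenever a data source systematically misses part of the population, whether the source is smartphones, social media, or policing records.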
If the data used in such algorithms isn’t corrected for these biases, big data could be accused of being discriminatory.
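When the source of the sampling bias is known, a standard correction is inverse-probability weighting: records from underrepresented groups are counted proportionally more. The sketch below is a minimal, hypothetical illustration of that technique, not a description of any specific company's pipeline.

```python
import random

random.seed(2)

# Biased sample: people enter the dataset with probability
# proportional to income, so the raw average is inflated.
records = []  # (income, probability of being included)
for _ in range(10000):
    income = random.uniform(10, 100)
    p_included = income / 100
    if random.random() < p_included:
        records.append((income, p_included))

raw_avg = sum(i for i, _ in records) / len(records)

# Inverse-probability weighting: weight each record by 1/p so
# underrepresented (low-income) people count for more.
total_w = sum(1 / p for _, p in records)
weighted_avg = sum(i / p for i, p in records) / total_w

print(f"raw biased average: {raw_avg:.1f}")
print(f"reweighted average: {weighted_avg:.1f}")  # near the true mean of ~55
```

The correction only works because the inclusion probabilities are known here; in practice, estimating who is missing from a dataset is often the hard part.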
Bias and similar challenges with big data have even led the FTC to issue a warning about unintentional uses that could lead to discriminatory practices. In just the past few years, this has become a serious problem that many analysts, engineers, and policy experts have tried to address. The same FTC warning discusses possible solutions, including greater algorithmic transparency and privacy regulations better suited to the present day.
With better use of other advancing technologies like the cloud and flash storage arrays, businesses and other organizations will be better prepared to account for possible biases. Anyone who works with big data should understand that the data they collect may not always show the whole picture or be completely accurate. Knowing big data's limitations may actually end up empowering it even further.
Feature image: Nicolas Raymond via Flickr