“The past doesn’t repeat itself, but it rhymes”, reads the famous Mark Twain quote in the introduction of a research paper suitably titled Mining the Web to Predict Future Events.
Mining massive amounts of data from social media, media archives and seemingly countless other web resources allows a party to adjust (and ultimately manipulate) its relationships online. So much data is now being gathered that it is, in many respects, theoretically possible to predict future events.
In a previous article, I discussed examples of how big data is being used in medicine, such as the initiatives pursued by FutureMed, and in marketing, where examples include Wal-Mart or, more fittingly, Amazon. Amazon, the world’s largest online retailer, is said to make more than a third of its sales through its customer recommendation system.
Natural disasters and epidemiology
As Eric Horvitz of Microsoft Research and Kira Radinsky of the Technion-Israel Institute of Technology show in Mining the Web to Predict Future Events, it is possible to make predictions about everything from flu outbreaks to finances, and even about matters critical to national security. One example from their work shows how a cholera outbreak in Angola was predicted after droughts, followed by storms, were recorded and analysed. These events are among the conditions necessary for a cholera outbreak. When they occur in that order, the probability of a cholera epidemic rises, so an outbreak can be predicted and, hopefully, prevented.
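To make the reasoning concrete, here is a minimal Python sketch of the underlying idea: compare how often cholera follows a drought-then-storm sequence with how often it occurs overall. This is not the method used by Horvitz and Radinsky. The event records are invented for illustration, and the names HISTORY, contains_in_order, outbreak_rate and base_rate are my own.

```python
# Hypothetical toy data, invented for illustration only: each record lists
# the events observed in one region during one year, in chronological order.
HISTORY = [
    ["drought", "storm", "cholera"],
    ["drought", "storm", "cholera"],
    ["drought", "storm"],
    ["storm"],
    ["drought"],
    ["storm", "drought"],
    ["storm", "cholera"],
    [],
]


def contains_in_order(events, pattern):
    """Return True if `pattern` occurs in `events` as an ordered subsequence."""
    remaining = iter(events)
    # `step in remaining` consumes the iterator up to the first match,
    # so each later step has to appear after the earlier ones.
    return all(step in remaining for step in pattern)


def outbreak_rate(history, precursor, outcome):
    """Estimate how often `outcome` follows the `precursor` sequence.

    Returns None when the precursor sequence never occurs in `history`.
    """
    matched = [ev for ev in history if contains_in_order(ev, precursor)]
    if not matched:
        return None
    hits = sum(contains_in_order(ev, precursor + [outcome]) for ev in matched)
    return hits / len(matched)


def base_rate(history, outcome):
    """Fraction of all records in which `outcome` occurs at all."""
    return sum(outcome in ev for ev in history) / len(history)


if __name__ == "__main__":
    print("after drought then storm:",
          outbreak_rate(HISTORY, ["drought", "storm"], "cholera"))
    print("overall:", base_rate(HISTORY, "cholera"))
```

In this made-up data the rate after a drought followed by a storm is higher than the overall rate, which is the kind of signal a warning could be built on. A real system would need far more data and a much more careful statistical treatment than this simple frequency count.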
“Similar tests involving forecasts of disease, violence, and a significant numbers of deaths saw the system’s warnings correct between 70 to 90 percent of the time.” These examples show how readily big data analysis lends itself to prediction. Big Data analysis can be applied to almost anything. It is defined by its means, the mining and processing of data, and not by its ends, which is where the predictive sciences come in.
Big Data analysis differs from other predictive sciences mainly in the tools it uses and the breadth of areas in which it can be applied. For example, Microsoft and the Technion-Israel Institute of Technology are using software to mine “22 years of archives of The New York Times and about 90 other web resources” in order to make predictions ranging from protests across the globe to various other natural, social and financial events and patterns.
Epidemiologists examine similar relationships, but such studies “are typically few in number, employ heuristic assessments, and are frequently retrospective analyses, rather than aimed at generating predictions for guiding near-term action.”
Social and political
Recorded Future, a company funded by both Google and the CIA, runs on Amazon.com’s massive servers and draws on “over 150 000” resources from news archives, blogs and social media. In a recent example, the company introduced a live interactive map depicting outbreaks of protest and the likelihood of further outbreaks in the near future.
By constantly monitoring the web, searching for keywords with “ground breaking algorithms”, and even ‘stealing’ private data, the company has found ways to predict certain crucial events. Ethics is one of the many obstacles associated with ‘unlocking’ Big Data on the web. The company holds dear the idea that “all of the information on the web holds predictive power, and it is just waiting to be unlocked.”