Future Trends • 27 October 2011

Big Data: It’s both about size and technique


We’re in the age of the petabyte, exabyte and zettabyte. We’re being overwhelmed with information, and we don’t know what to do about it.

The internet age has brought a data explosion. The digital era has lowered entry barriers, allowing the masses to publish, to tweet and to build open source applications. Automation has added to the pile, with computers now generating information on their own.


Former Google CEO Eric Schmidt says that every two days we now create as much information as we did from the dawn of civilization up until 2003. That’s a crapload of data, or about five exabytes to be precise.

It would be a shame to chuck it all in the bin, so how do we make use of this endless list of ones and zeros? Enter the field of Big Data — a relatively new field dedicated to mining this data with increasingly complex algorithms to make it useful.

Steve Watt is the innovation guy at HP in Texas and he says we have more-or-less learnt to filter masses of data by using certain “filter patterns”. We can filter with search, and we can filter socially too. Infographics also help us decode information, presenting endless, boring text data visually in a way that makes it easily graspable.

But what if we need to grasp all this data at a large scale? We need third-party services to help us gather, sort, process and deliver that data in an intelligible and relevant fashion. There are dedicated “data marketplaces” out there that can help with this, such as Infochimps and Factual, as well as tools like the open source web crawler Nutch.

When the data gets really big, it needs to be structured and kept in a low-latency environment. This is where it becomes an exercise in deep geek, and you need serious services to achieve it.

Watt sees fantastic applications for Big Data: it lets us capture historical information and draw lessons from it, which in turn lets us build models that help us make decisions. In short, it creates business intelligence.

To prove his point, he talks of a little pet project that he has been doing. Watt has used an analysis of CrunchBase, a directory of US tech companies, startups and VCs, to work out if the country is in a technology bubble or not.

He wrote his own code to analyse which companies got funded, their addresses, how much they got, when they were funded and who funded them. He grabbed all this public data, popped off to Infochimps to get the zip codes of these companies and cross-referenced the two.
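The cross-referencing step can be pictured with a small Python sketch. This is not Watt's code, which the article does not show (his presentation is about Hadoop and Vertica). Every company name, amount, zip code and field name below is invented for illustration, and a plain dictionary stands in for the zip code data he got from Infochimps.

```python
from collections import defaultdict

# Hypothetical funding records; all names and figures are made up.
funding_rounds = [
    {"company": "Acme Bio", "sector": "biotech", "zipcode": "94107", "amount_usd": 12_000_000},
    {"company": "Beta Soft", "sector": "software", "zipcode": "94107", "amount_usd": 5_000_000},
    {"company": "Gamma Clean", "sector": "cleantech", "zipcode": "78701", "amount_usd": 8_000_000},
    {"company": "Delta Labs", "sector": "software", "zipcode": "99999", "amount_usd": 2_000_000},
]

# Hypothetical zip code lookup, standing in for a third-party dataset.
zip_to_city = {"94107": "San Francisco", "78701": "Austin"}


def city_of(record):
    """Cross-reference a funding record against the zip code lookup."""
    return zip_to_city.get(record["zipcode"], "unknown")


def total_by(records, key_fn):
    """Sum the funding amounts for each group produced by key_fn."""
    totals = defaultdict(int)
    for record in records:
        totals[key_fn(record)] += record["amount_usd"]
    return dict(totals)


by_city = total_by(funding_rounds, city_of)
by_sector = total_by(funding_rounds, lambda record: record["sector"])

# List cities from most to least funded.
for city, total in sorted(by_city.items(), key=lambda item: item[1], reverse=True):
    print(f"{city}: ${total:,}")
```

Records whose zip code is missing from the lookup are grouped under "unknown" rather than dropped, so gaps in the reference data stay visible in the totals. At real scale the same join and group-by would run on a cluster rather than in a single script.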

The result? A visualisation that answers the question “Are we in a tech bubble?”, built entirely from public data that is out there.

Bridging Structured and Unstructured Data with Apache Hadoop and Vertica


Watt reckons the US is not in a tech bubble. Among other things, he also found that biotech is the biggest investment sector, followed by software and cleantech, and that more money was invested in San Francisco than in the other major US cities. No surprise there, I guess.

That’s an example of intelligence gathered via algorithms, presented visually and in an intelligible fashion. This is an example using external websites, but the exercise could happen internally too if you are a large corporate with masses of data to mine.

Matthew Buckland: Publisher

Matthew Buckland is an internet entrepreneur and investor with more than 20 years’ experience working in management and strategic roles for internet and technology businesses. His specialisation is internet and digital media, marketing and content. He founded the digital marketing and strategy agency Creative Spark which he sold in 2015, five years after he founded it, to UK-listed firm M&C Saatchi PLC. After an earn-out of approximately 3 years, Matthew eventually exited his company in 2018, taking a division of the company, Burn Media, with him. He currently runs the Burn Media Group -- a grouping of technology publishing brands which report on emerging markets: Memeburn.com, Ventureburn.com, Gearburn.com and others. Matthew is EIR to the Media Development Investment Fund (MDIF), a New York and Prague-based global media fund, helping its investees with digital strategy and business models in a range of countries. The MDIF has invested more than $166-million in over 114 independent businesses in 39 countries.

