It has been said that “the only way to eat an elephant is one bite at a time”, and this rings true for Big Data. Just as Google has become synonymous with searching for information on the internet, application frameworks such as Hadoop are becoming synonymous with the concept of Big Data (it may be no coincidence that the logo for Hadoop is a yellow elephant). Breaking down Big Data into thinly sliced concepts should help us understand its nature.
There is also a unique mental process that cannot be ignored with regard to how humans consume data. In his 2005 book Blink: The Power of Thinking Without Thinking, Malcolm Gladwell defines the “theory of thin slices” as “How a little bit of knowledge goes a long way” in allowing the individual to decide what is truly important. Gladwell poses the question “How is it possible to gather the necessary information for a sophisticated judgement in a short space of time?” The principle of Business Intelligence is to consolidate disparate data, big or small, and distil it to a simple truth so that a business user can consume it and make a decision, whether operational or strategic in nature.
There are three main concepts you need to understand about Big Data:
1. Big Data is a new concept
Back in 2008, articles were being published about the challenge of growing data volumes and our ability to manage, consume and visualise them. According to the Mike 2.0 definition, “Big data can be very small and not all large datasets are big[…] Big then refers to big complexity rather than big volume”, and this complexity has been challenging humanity from the day it started consuming and interpreting data.
2. Big Data decommissions the concept of a data warehouse
Just as vendors claimed that in-memory analytics applications would negate the need for data warehouses, so too are the same sales executives now making that claim for Big Data. Dr Ralph Kimball, an author on the subjects of data warehousing and Business Intelligence, put Big Data in context in his 2011 white paper The Evolving Role of the EDW in the Era of Big Data Analytics, saying that “big data is a paradigm shift in how we think about data assets” and that now “Data is an asset on the balance sheet”.
He points out that “With the benefit of hindsight gained from the traditional data warehouse experience, the big data analytics version of data warehousing is likely to consolidate quite quickly. Only the bravest organisations with very strong software development skills should consider rolling their own big data analytics applications directly on raw MapReduce/Hadoop.”
3. Enter the Data Scientist
Interestingly, the terms “Data Science” and “Data Scientist” have been emerging with the hype around Big Data. These terms imply that there are now new ways to understand and gain insight into data, and that this is best left to the experts.
It’s worth noting that the underlying principles overlap with the collection and visualisation of data, both of which sit within the well-established Business Intelligence competency. Nor are the terms new: they have been around since at least 2001, as in Dr William S. Cleveland’s article “Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics”.
In his article, “The future belongs to the companies and people who turn data into products”, Mike Loukides explains that “merely using data isn’t really what we mean by data science. Data science enables the creation of data products. Data scientists combine entrepreneurship with patience, the willingness to build data products incrementally, the ability to explore, and the ability to iterate over a solution”.
In the past, companies sought out the help of Business Intelligence consultants, who would endeavour to gather from the business its requirements and the mappings of metrics to source data elements. To a large extent this will continue to be the case. The area where a data scientist can now operate is where a company introduces a Big Data competency and starts collecting data, but does not know the value that data contains, or at least has no requirement that defines it, and so entrusts the extraction of value to the Data Scientist.
By delving into Big Data and its related concepts, we can extract value and insight for business and industry alike, and start asking the questions we never knew needed answering.