What is all the fuss about Big Data and Data Analytics? It seems that anywhere we turn, we find an article, a presentation, a conference or a demonstration related to one of these two topics.
Is it because we can no longer deal with the incoming flux of data from all of its sources? Think about your own e-life; you probably interact daily with most, if not all, of the following media: e-mail, web pages/sites, social media and other web/smartphone applications. It is becoming extremely difficult to splinter our minds into so many slots and perform so many context switches per day. It has also become so tough to keep up with the incoming data streams that we need applications to help us deal with all of this data (e.g. aggregation sites, smartphone applications).
What’s the big deal with Big Data?
Today there are numerous and diverse data sources, some of them illicit, producing lots of data that is being sent over high-speed interconnected networks. This data is typically classified along two dimensions: structured vs. unstructured (sometimes referred to as hard vs. soft) and sensed vs. unsensed. Structured (or “hard”) data is calibrated and precise, such as data from imagery and radar sensors, while unstructured (or “soft”) data is uncalibrated and imprecise, such as operator reports and open source intelligence available from internet web pages.
Let us explore the changes data has witnessed in recent years. First, it is coming at us at breakneck speed (i.e. velocity), typically expressed today in billions of bits per second. Second, it is ever-expanding (i.e. volume), with terabyte hard drives the contemporary norm. Third, it is being packaged and sent in numerous ways (i.e. variety), with hundreds, if not thousands, of data types representing its diversity. Finally, it is coming from any and all sources, even hostile or purposefully misleading ones, which calls its trustworthiness (i.e. veracity) into question.
Furthermore, today’s youth (i.e. the current generation of electronic media consumers and producers) show few inhibitions about disseminating personal information about their daily habits, preferences, joys and opinions. This data is being posted, sometimes without a second thought, for all to see, allowing each data producer to stand on their own pedestal and address the e-crowds as they see fit. This freedom comes at a price, as we do not always know who is listening in that crowd.
Meet Big Brother
What’s next? Well, if we continue to produce more and more data that we cannot handle, we will increasingly require applications and services to detect the underlying patterns, to filter out the noise and to provide us with the information and knowledge that will help us make our important decisions in a timely and effective manner.
Moreover, one can foresee a five-stage process developing over time. Initially, we will depend upon those very applications for our routine tasks (Stage 1 – e.g. navigating a new city, finding the best price online, aggregating news stories). Then, we will depend on them for our daily tasks (Stage 2 – e.g. planning our day, pointing out interesting news stories, paying our bills). Subsequently, we will depend on them for our essential tasks (Stage 3 – e.g. handling our finances, planning our trips, expressing our views). Eventually, we will depend on them for our critical tasks (Stage 4 – e.g. selecting our careers, controlling our personalities, defining our lifestyles). Finally, we will depend on them for our very survival (Stage 5 – e.g. foraging for food, alerting us about imminent dangers, making the fight or flight decision for us).
As these applications become ever more embedded in our daily routine, they will seem to disappear from it: they will be so ubiquitous that we can no longer remember life before them. Can you think back to the days before social networking and online mapping? Whatever did we do before blogs, tweets and even emails? Looking at the above staging process, applications at Stage 2 and beyond have the necessary features to become just as ubiquitous and ultimately disappear into our very lives. However, do these applications truly disappear, or do we simply trust them, without hesitation, to provide us with accurate (and benevolent) results? If the latter is true, then there is the possibility, whether intentional or accidental, that their results form the basis for controlling our finances, schedules and options as well as shaping our opinions, viewpoints and personalities. These applications could end up training us to believe everything they tell us on the strength of their sheer computational power, refined optimization algorithms, powerful inference capabilities and previously clean track records; they could end up fulfilling the Orwellian prophecy of a technocratic totalitarian society.
In that scenario, what started out as a simple aid to help us sift through the Big Data ecosystem would end up becoming a suite of applications that help Big Brother increase surveillance and exert authoritarian control over society. Will this scenario ever come to fruition? Probably not. However, things could get very interesting when Big Data does finally meet Big Brother.