A Rookie Perspective: First Impressions & Top 3 Takeaways from Big Data TO

Big Data

July 6, 2017

Two weeks ago I had the pleasure of attending the Big Data TO conference with my colleagues David Vuong and Edmond Chan.

As someone new to EHSQ software (it's still my first month!), and especially new to BI/Analytics and Big Data, I used this conference to take a rapid plunge into unfamiliar waters.

As a team, we decided to divide and conquer. Our lunches and breaks were spent sharing our findings, and we thought we’d share our learning with you in a series of blog posts.

The Opportunity for Big Data is REALLY Big and it’s Happening NOW

According to SAS, the three key ingredients essential for tipping Big Data into the mainstream have come together. These include:

  1. An abundance of data that organizations can store, including text, voice, and image data
  2. Enough computing power to analyze that data cost-effectively
  3. Algorithms mature enough that machine learning is well understood and people have become better at automating with it

When it comes to the abundance of data alone, the numbers are staggering. I did a little searching to see how much data is being created per day: DN Capital suggested that we create 10 million Blu-ray discs' worth of data every day, enough to stack as high as four Eiffel Towers! I then drilled down to see how much of it is being analyzed, and according to Harvard in 2013, only 0.5% of all data was being analyzed! With more and more data being added to the internet via the Internet of Things and new applications, I can't imagine this ratio has changed substantially.

According to App Developer Magazine, “More data was created in the last two years than the previous 5,000 years of humanity. In 2017, we will create even more data in one year alone.”

Top Three Take-Aways

After absorbing just how much data is being produced, here are my top three take-aways from the conference.

Open Source is Integral to an Organization's Data Strategy: When it comes to Big Data, companies are looking at an array of open source platforms. Gone are the days of picking one expensive proprietary system and sticking with it. This shift is driven by multiple factors, including the expense of tying oneself to a single platform and the ability to attract current and future talent; today's students are entering the workforce with an education built on open source technologies. Almost every speaker on the technical track made this point, but JL Verboomen phrased it especially well: "Open Source platform languages such as Python, R, and Scala dominate the market and much of the work that data scientists do will revolve around open platforms. They create agility for data scientists and reduce friction of adoption." So, if you're new to the industry like me, expect to hear a lot about the following in conversation: Apache Hadoop, Scala, Spark, Watson, Elasticsearch, Kafka, and more.

Data is Dirty: When Facebook's Nav Kesher presented, he stated that 90% of his time is spent cleaning up data. I was quite surprised by this at the time, but in retrospect, maybe I shouldn't have been. As a marketer, I often spend hours upon hours in Excel or other programs cleaning up my campaign contact lists to make sure things like postal codes and phone numbers are displayed consistently. Nav went on to state that only 20% of Facebook data is processed, a number that was confirmed in an IBM presentation, where the speaker also stated that unprocessed data is growing at a 60% compound annual growth rate (CAGR). So how is technology keeping up? Data analysts are creating data lakes in which they can store and protect unstructured data before it is structured and moved into a data warehouse.
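For readers curious what that kind of contact-list cleanup looks like outside of Excel, here is a minimal Python sketch that normalizes postal codes and phone numbers. It assumes Canadian postal codes and 10-digit North American phone numbers; the function names and sample records are hypothetical, and the postal-code pattern is simplified (it does not reject letters that real postal codes never use).

```python
import re
from typing import Optional

# Simplified Canadian postal code: letter-digit-letter, then digit-letter-digit.
# (Real postal codes exclude some letters; this sketch doesn't check for that.)
POSTAL_RE = re.compile(r"^([A-Z]\d[A-Z])(\d[A-Z]\d)$")


def normalize_postal_code(raw: str) -> Optional[str]:
    """Return the code as 'A1A 1A1', or None if it doesn't fit the pattern."""
    compact = re.sub(r"[\s-]", "", raw).upper()  # drop spaces/hyphens, uppercase
    match = POSTAL_RE.match(compact)
    if match is None:
        return None
    return f"{match.group(1)} {match.group(2)}"


def normalize_phone(raw: str) -> Optional[str]:
    """Return a 10-digit North American number as '(XXX) XXX-XXXX', or None."""
    digits = re.sub(r"\D", "", raw)  # keep digits only
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # strip the leading country code
    if len(digits) != 10:
        return None
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"


# Hypothetical campaign contacts entered in inconsistent formats.
contacts = [
    {"name": "A. Smith", "postal": "m5v3l9", "phone": "416.555.0123"},
    {"name": "B. Jones", "postal": "M5V-3L9", "phone": "+1 (416) 555 0199"},
    {"name": "C. Lee", "postal": "12345", "phone": "555-0100"},
]

for contact in contacts:
    contact["postal"] = normalize_postal_code(contact["postal"])
    contact["phone"] = normalize_phone(contact["phone"])
    print(contact)
```

Records whose fields come back as None (like the third contact) are the ones that still need a human to look at them, which hints at why cleanup can eat up so much of an analyst's time.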

The Opportunity for Applications is Endless: An entire track at the conference was dedicated to applications built on Big Data. Speakers discussed applications ranging from Figure 1, which connects physicians around the world so they can collaborate on finding cures for rare conditions, to Hopper, which helps customers make smart air travel purchases.

It's the companies becoming insight-driven, like these, that are reaping the most reward. According to Deloitte, companies that can ask the right question, do the right analysis, and take the right actions are seeing huge returns. In fact, 85% of data-mature organizations exceeded their corporate goals.

That leads us to a critical question posed by Paul Zikopoulos from IBM: "What is the Price of Not Knowing?" For us in EHSQ, this is a potent question. At Cority, we believe there are opportunities to use this data to remove risk from operations. Our experts are examining different types of data and how to turn them into results that are meaningful for EHSQ performance.