Thoughts on the past and the future of Data Analytics — Part 1

Le Nguyen The Dat
Jun 19, 2018

Having been in this field for a while now, I think this is an interesting time to summarize, from personal experience, the changes of the last few years, the new directions, and what we should do to keep up with the trend in the future.

This blog comes in two parts: the first (this one) covers Big Data and Business Intelligence, while the second will cover Machine Learning and Data Science in general.

1. Big Data, frameworks and tools

https://blogs.oracle.com/oracle-data-warehouse-solutions-business-value-from-data-center-to-cloud

A couple of years back, most companies would stick with Enterprise Data Warehouse solutions from Oracle, IBM, and the like (some still do), spending millions of dollars up-front and millions more on a recurring basis. Big data simply wasn’t suitable for most businesses at the time.

https://aws.amazon.com/redshift/

When Amazon Web Services (AWS) released Amazon Redshift and Google released BigQuery, with no up-front cost and very low usage-based pricing, Big Data suddenly became affordable for most companies.

At the same time, now that people could play with big data a lot more, many open-source tools and frameworks were released; some notable ones are Apache Spark, Presto, and Airbnb’s Airflow.

https://medium.com/the-astronomer-journey/airflow-and-the-future-of-data-engineering-a-q-a-266f68d956a9

What has not changed much is SQL, a language that appeared over 40 years ago and has always been by far the most commonly used when it comes to data, thanks to its gentle learning curve.

SQL will still be widely used in the future; the main difference, and the focus, will be the infrastructure it runs on, whether that is Hive, Presto, a columnar database, or some other new big data framework.

Data Engineers are highly sought after today and will continue to be in the future; in a lot of companies, they are in even higher demand than Data Scientists.

Unlike the traditional Data Warehouse / Database Administrator job that people usually mistake it for, today’s Data Engineer has to possess a very wide range of skills and knowledge: Data Warehousing, Database Architecture, Database Optimization, Data Modeling, Data Integration, DevOps, and sometimes Data Analysis and Machine Learning too. These skills cannot be learned from school or online courses alone; they come from real-world experience. As a result, the demand for the new Full-stack Data Engineers will soar in the next couple of years.

2. Business Intelligence

Business Intelligence has long been at the forefront of Data Analytics, and in the past it required a very specific and hard-to-acquire skillset. It is, however, a lot more mainstream nowadays, thanks to the rise of Business Intelligence tools and infrastructure over the years.

In the near future, traditional reporting-focused Business Intelligence will become more and more obsolete, as its repetitive tasks will soon be fully automated. To stay relevant, it is important to focus on one of these paths:

  • Data Analyst path: acquiring the core skillset of extracting insights, performing deep analysis, and turning data into recommendations and business decisions, or
  • Data Engineer path: the engineering side, developing, setting up, and maintaining the automated analytics systems, or
  • Data Science path: somewhat of a combination of the above (plus a couple more things)

As for businesses, investing in proper Business Intelligence infrastructure and personnel is always a must, and it is especially important to establish both early on, before moving towards Data Science and Machine Learning products. My personal recommendation is to focus on hiring and growing people who can transition toward the three career paths above, instead of the traditional “reporting” Business Intelligence role.
