
Image of various media
credit: Image by Juliane Thomaz from Pixabay

The Disappearance of Data Science and Data Scientists

by Philippe Barbe
14 Jul 2021

The term Data Science is often attributed to computer scientist Peter Naur, who used it in 1974, and to statistician Jeff Wu, who popularized it around 1997.

The story goes that Naur did not like the term computer science and that Wu wanted to place statistics on an equal footing with other sciences. While Data Science became a mix of computer science and statistics, the name seems to have arisen mostly from a desire for re-branding and recognition.

Helped by considerable progress in computing power, Data Science has achieved recognition, if not as an academically recognized “science”, then at least as a particularly useful body of knowledge and practices.

The phenomenal success of Google, whose core product, its search engine, is entirely based on Data Science, made it clear that businesses could monetize data. Further success by large corporations in leveraging their diverse data proved that Data Science activities could be turned into a profit center. Hence, we have come to see Data Science as a combination of computing, statistics, and business.

Companies are hiring a lot of Data Scientists and plan to hire many more over the next 10 years. Salaries are trending up, and headlines proclaim “Data is the New Oil” and “Data Scientist: Sexiest Job of the 21st Century”.

Surely neither Data Scientists nor Data Science is disappearing! Really?

What basis could I possibly have for the title of this article?

Let us go back to the motivations of Naur and Wu.

Historically, both computer science and statistics lived in mathematics departments. Now, they don’t.

What happened? In short, scientific attitudes and money got in the way!

Mathematicians distinguish between pure and applied mathematics.

Pure mathematics is mathematics seemingly for its own sake. In reality, most mathematics has considerable applications. Pure mathematics is more like a very long-term investment: it more than pays for itself over several decades. It might be one of the best investments a country can make, but in any given year, it is a cost center.

In contrast, applied mathematics is a body of mathematics motivated by applications. Being closer to them, it is perceived as more useful and has more direct uses in engineering.

Beyond pure and applied mathematics, one also distinguishes the applications of mathematics, that is, the use of mathematics to solve practical problems.

An applied mathematician still proves theorems, whereas most people use some mathematics in their lives without doing so: they apply mathematics.

At some point, many Computer Scientists and Statisticians felt at odds in mathematics departments, because the more applied among them stopped being mathematicians and became users of mathematics.

At the same time, they managed to get more funding because they were closer to applications. Hence, they put themselves in a position of power through money, while abandoning the intellectual tenets of the discipline that their mathematics departments embodied. This was not sustainable, and they split apart.

Does using a computer make you a Computer Scientist? Clearly not! Almost everyone uses a computer, even if only the one in their cell phone.

Does using statistics make you a Statistician? Clearly not! Figuring out the “average” of a few numbers is something that most people have done at least once in their life.

So we need to distinguish between applied statistics, applied computer science, and the applications of statistics and computer science.

This is where it starts to be interesting: a Data Scientist is… a scientist. And Data Science is a science!

What is a scientist? Opinions diverge!

The Oxford dictionary gives a clear definition: a person who studies one or more of the natural sciences. Merriam-Webster is more ambiguous, proposing “a person learned in science and especially natural science: a scientific investigator”.

When you need a Data Scientist as a business, do you need a person who studies the science of data? Probably not. Do you need a person who is learned in the science of data? More likely. Do you need a scientific investigator? Who knows?

What you often need is a person who is knowledgeable about the techniques of Data Science. So you do not need a Data Scientist so much as a practitioner of data and models.

Am I just arguing semantics?

If Data Science is becoming the application of existing models, or of existing algorithms to new data, it is no longer a science, and Data Scientists are not really scientists.

This already happened in Computer Science. Computer Programmers and Software Engineers are not Computer Scientists, although they are trained in Computer Science. Data Engineer and Machine Learning Engineer are new job categories, and just as Computer Programmers emerged from Computer Science, these new professions are emerging to distinguish those who apply Data Science from those who make it.

While the distinction may seem mundane, it helps make sense of some of the debates in the current Data Science community. Should you know any math? With the progress of drag-and-drop platforms, should you know any coding? With ever more computing power making it possible to try algorithms systematically, should you know anything at all?

That these debates exist is the manifestation of an evolution: most applications require less and less specialized knowledge.

Just as many assembly lines do not require qualified workers, many data modeling activities do not require Data Scientists. The progress of platforms shows a movement towards data activities that will not even need people versed in data. But sometimes assembly lines require special machines run by highly qualified people (mirror polishing is still a profession, and a very high-tech one). Data are no different. Some data are so special, and some corporations have such specific needs, that they will still need Data Scientists.

The vast majority of those currently bearing the title “Data Scientist” will still have enviable careers, mostly because quantitative skills will remain in demand while the overall supply is dwindling.

But either they will lose their “scientist” cachet, or the discipline will split into two branches: one more closely tied to business and focused on using the science, hence not itself a science; and one focused on its scientific aspects, more conceptual and more detached from immediate applications.