Philippe Barbe

Image of various media
credit: Tumisu from Pixabay

What happened to the Data Scientists?


by Philippe Barbe
20 Oct 2021

Most Data Scientists I interview do not want to talk about data or science, only about business.

How did we get here? And why is this a problem?

A Scientist is a practitioner of science, and in particular someone knowledgeable about the techniques used in his or her field. As the name indicates, a Data Scientist is a Scientist whose main focus is data. This is not a novel profession. "Big data" is not new either. Numerical data have been around as long as humans have been counting. Tax collectors, accountants, indeed all professions dealing with numerical data are essentially specialized variations of Data Scientists.

It can be argued that modern Data Scientists are different because they deal with data of unprecedented scale, and are therefore practicing a new type of science. But what is a massive amount of data? It is all relative to the tools available to process that data. In the Middle Ages, data spanning an entire book was a massive amount. In the 1980s, census data was considered massive; today it sits at the small end of big data. Big data is, in essence, whatever amount of data is hard to process and manage with the tools you have available.

When did the era of Big Data that we associate with the modern Data Scientist begin?

WWII saw major advances in Operations Research and in the use of computers. Manufacturing in the 1950s capitalized on the scientific progress made during the war. In the US, Robert McNamara embodied the data-driven executive, helping transform Ford from a money-losing company into a profitable one.

Quantitative trading rose to prominence in the early 1990s. Jim Simons, a professional mathematician whose quantitative trading generated returns of around 40% a year over 30 years, made clear that mathematical models, algorithms, and their computer implementations applied to stock market data were a new way of making money. Financial institutions quickly embraced the idea.

This era may have been the golden age of Data Scientists: while the market for academic tenure-track positions was weak, Wall Street welcomed math and physics PhDs.

As companies accumulated more data, and data-centric companies like Google built fortunes on algorithms, many businesses realized that their data had value. At a minimum, customer data offered the potential to understand customers better, and therefore to make better business decisions.

Unfortunately, PhDs come in all sorts of flavors.

Because of the weak academic market, Wall Street attracted very good talent to an industry where quantitative business decisions are quite simple: buy or sell.

In legacy industries, decisions are harder because they involve intricate business rules designed for humans. Automation means changing business processes and workflows, reorganizing production lines, and so on.

Business leaders, wanting to capitalize on data, hired Scientists who did not necessarily understand the human side of the business: why seemingly irrational decisions were made, how human factors such as fear and ego come into play, and how all of this translates into what is called business complexity.

Most people find science, particularly the theoretical part of data science, difficult. Business leaders had little appetite for understanding the mathematics underpinning data science, and the packaging and success of a few algorithms shifted the emphasis toward applying algorithms rather than designing them.

Because most businesses had not yet exploited their data, the bar for success was quite low. Even simple methods guaranteed some degree of success, as long as the IT organization could manage the data asset.

A few things happened all at once.

A small set of algorithms (tree-based methods and neural networks, for instance) became quite successful at addressing a wide range of problems, which reflected how low the bar was, and these algorithms were packaged in a way that required little theoretical knowledge. This shrank the toolbox used by Data Scientists.

At the same time, businesses realized that scientists' interests were not necessarily aligned with theirs: scientists tend to focus on the "Why?", searching for explanations of why things are the way they are, while businesses tend to focus on the "What?", searching for what can fulfill customers' needs.

That divergence soon led businesses to ask for business acumen when hiring Data Scientists. Data Scientists now needed to be not just Scientists, but business-centered too. The trend accelerated as demand for scientists grew and skills ran short in the market. That managers were not particularly well trained at evaluating the science did not help. Likewise, the natural tendency to focus on operations, because they are the short term, amplified the need for business knowledge.

This is how we got where we are: Scientists unable to talk about science! Given how I interview, this shift is particularly striking.

I always ask one question: "Pick a technical topic you know really well, and tell me about it." To my surprise, I seldom get answers that are technical enough. At best, candidates talk at a high level, tossing out some jargon to show me that they know the terms. But they get uncomfortable when I ask them to go deeper: write formulas, spell out algorithms, draw architecture diagrams. How can we have a meaningful scientific discussion without anything written down?

Yes, at a high level, science is about concepts, but as a Data Scientist I know the concepts, and I expect a technical interview to be, well, technical… even highly technical.

The online interview doesn't help either. In a face-to-face office setting, I would have a board and would invite candidates to use it. In remote interviews, I am surprised that the vast majority of these so-called Scientists have not mastered the presentation tools needed to give a technical explanation. The exceptions should be the rule: one candidate pointed the camera at his whiteboard while he drew out his explanation; another connected a tablet and used a pen to deliver a gorgeous lecture that showed complete mastery of the subject.

That these Data Scientists lack the tools to communicate with me in scientific notation suggests that this is not something they need in their daily interactions. Not a good sign.

And what does it say about their current managers? Are they just accepting standard out-of-the-box solutions? Don't they dig into what their subordinates are doing?

And what does it say about the team they are on? Do they all have the same knowledge, so that no explanation is needed? Do they ever talk about and debate truly technical matters?

What does it say about the way they think of models? Essentially, they seem to confuse algorithms with models, and therefore feel no need to spell out a model.

Mathematical modeling requires one to wonder about the "Why?". It is about understanding what drives business phenomena, about describing them and voicing an opinion. It means accepting accountability for the choices made, and taking a stand on the structure of the business reality. Because mathematical modeling does not come out of the box, it requires patient analysis. Because a model represents beliefs about how business quantities are related, finding a good model requires the ability to question one's own beliefs. This is antithetical to the naive belief that a few algorithms can solve any business problem.

This inability to get technical seems to reflect a certain naivety, if not laziness, in thinking: choosing an algorithm from a short list becomes a succession of trial-and-error experiments, where one applies an algorithm, looks at the performance, tries another one, and compares. A machine can do that just as well.
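To make "a machine can do that just as well" concrete, here is a minimal sketch of that trial-and-error loop, written in plain Python with no libraries. The three "algorithms" (a mean predictor, a median predictor, and a least-squares line), the names `auto_select` and `fit_*`, and the toy data are all hypothetical stand-ins, not anything from a real toolkit or from the cases described above. The loop fits every candidate, scores each on held-out data, and keeps the lowest error.

```python
from statistics import mean, median

# Hypothetical "off-the-shelf algorithms": each fit_* takes training data and
# returns a prediction function. None encodes a view of WHY y depends on x.
def fit_mean(xs, ys):
    m = mean(ys)
    return lambda x: m

def fit_median(xs, ys):
    m = median(ys)
    return lambda x: m

def fit_line(xs, ys):
    # Ordinary least squares for y = a + b * x
    xbar, ybar = mean(xs), mean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = ybar - b * xbar
    return lambda x: a + b * x

def mse(predict, xs, ys):
    # Mean squared error of a prediction function on (xs, ys)
    return mean((predict(x) - y) ** 2 for x, y in zip(xs, ys))

def auto_select(candidates, train, valid):
    """Fit every candidate on train, score it on valid, keep the lowest error."""
    scores = {name: mse(fit(*train), *valid) for name, fit in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores

candidates = {"mean": fit_mean, "median": fit_median, "line": fit_line}
train = ([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 8.1, 9.8])  # made-up toy data
valid = ([6, 7], [12.1, 13.9])
best, scores = auto_select(candidates, train, valid)
print(best, scores)
```

Nothing in `auto_select` states, questions, or defends a belief about how the quantities are related; it only compares scores. That is the sense in which this kind of selection is mechanical, and why it is not the same thing as modeling.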

Data science stops being a science and becomes a mindless activity when it is reduced to applying algorithms, and Data Scientists stop being Scientists when they limit themselves to a small set of well-known techniques that they may not even understand particularly well. If you are a business person, the question then is: how much should you pay for that?