Great article. I would say the key problem with "data science" as it is currently popularized is that it is not scientific enough. Judging whether one model beats another just by comparing their raw MSE or RMSE values is not science. It fails to take the inherent randomness of the data (of any data sample) into account.
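
To make the point about randomness concrete, here is a minimal sketch of one way to put an uncertainty band around an MSE gap instead of just eyeballing two numbers. It assumes both models were scored on the same held-out rows and uses a paired percentile bootstrap, which is only one of several reasonable choices. The function name and its parameters are my own, purely for illustration.

```python
import random

def mse_diff_bootstrap_ci(y_true, pred_a, pred_b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for MSE(A) - MSE(B) on a shared test set.

    Hypothetical helper for illustration; not taken from any library.
    Returns (point_estimate, lower, upper).
    """
    rng = random.Random(seed)
    # Per-row difference in squared error. Pairing matters: both models
    # are evaluated on the same rows, so we resample rows, not models.
    diffs = [(a - y) ** 2 - (b - y) ** 2
             for y, a, b in zip(y_true, pred_a, pred_b)]
    n = len(diffs)

    boot_means = []
    for _ in range(n_boot):
        sample = rng.choices(diffs, k=n)  # resample rows with replacement
        boot_means.append(sum(sample) / n)
    boot_means.sort()

    lower = boot_means[int((alpha / 2) * n_boot)]
    upper = boot_means[int((1 - alpha / 2) * n_boot) - 1]
    return sum(diffs) / n, lower, upper
```

If the resulting interval comfortably straddles zero, the observed gap in MSE is within what resampling noise alone could produce, and declaring a "winner" is not justified. Note that this says nothing about whether the test set is representative in the first place.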

Moreover, torturing the data to yield the best MSE or RMSE without ever considering whether the data is representative of the population or phenomenon of interest is also not science.

In my opinion, true "science" in data science must start with statistical science, which, as you rightly pointed out, is hundreds of years old. Data scientists need to learn and understand basic statistics well, and be able to think probabilistically.

