Zacharias 🐝 Voulgaris

4 months ago · 2 min. reading time

Statistics’ Shortcomings



First of all, let me make it clear that I have no issue with Statistics as a sub-field of Mathematics, or with its use in a data analytics context. The issue is that it has been used for far more than it should be, and this overuse (or rather, abuse) has given a lot of people the wrong idea about real-world matters. In other words, it has compromised people's sense of perspective, while also deluding many researchers about what they can do with this tool-set. In this article, we'll look at some of the most relevant shortcomings of Statistics: the excessive use and misuse of randomness to model variables, the excessive assumptions about the data, the subpar sampling methods Stats provides, and the lack of coherency with more data-driven approaches such as Machine Learning. Let's look at each one of these in more detail, shall we?

Randomness as the de facto Modeling approach for the variables involved

Randomness is great. Being the cornerstone of any cryptographic system out there, it's a precious resource. It can also help us understand inherently unpredictable processes. No wonder it's used so much in Statistics, which seems to have woven a whole framework around it. However, the fact that randomness is useful doesn't make it ubiquitous. Treating every variable out there as a random variable may make sense in Math, but not in the real world. Of course, that's a viable way to model uncertainty, but it's certainly not the only way and probably not the best way either.

Plethora of Assumptions

Since randomness doesn't have any inherent structure, to make it work in an analytics scenario, lots of restrictions need to be put in place. These take the form of assumptions, something that Statistics has in abundance. It's rare to find anything Stats-related that doesn't come with its list of assumptions about the data involved. It's fine if these assumptions hold, but what happens when they don't?
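To make the question concrete, here is a minimal sketch of what can happen when a familiar assumption fails. It takes the rule of thumb that roughly 68% of values fall within one standard deviation of the mean, which holds for normally distributed data, and applies it to a heavily skewed dataset. The function name, the seed, and the two synthetic datasets are illustrative assumptions, not anything from a particular Stats package.

```python
import random
import statistics


def share_within_one_sigma(values):
    """Fraction of values lying within one standard deviation of the mean."""
    mean = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    inside = sum(1 for v in values if abs(v - mean) <= sigma)
    return inside / len(values)


rng = random.Random(42)  # fixed seed so the run is repeatable

# Synthetic data (an assumption for illustration): one bell-shaped set
# and one strongly right-skewed set of the same size.
bell_shaped = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
skewed = [rng.lognormvariate(0.0, 1.5) for _ in range(10_000)]

print(f"bell-shaped data: {share_within_one_sigma(bell_shaped):.3f}")
print(f"skewed data:      {share_within_one_sigma(skewed):.3f}")
```

On the bell-shaped data the share should land close to the textbook figure, while on the skewed data it comes out far higher, because a few extreme values inflate the standard deviation. Any conclusion that quietly relied on the 68% rule would be off for the second dataset.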

Superficial approach to Sampling

Randomness is useful when it comes to sampling though, right? That's true, but it doesn't always yield the best sample. After all, randomness is unpredictable, and unless you manage it properly, you may end up with a biased sample. Of course, you can obtain a representative sample that's unbiased by design, but Statistics doesn't offer any out-of-the-box method for doing that. It makes you wonder whether the people behind this part of mathematics truly understand randomness at all. After all, they never provided a way to obtain a truly random set of numbers in an efficient and scalable way. Pseudo-random numbers don't count.
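As a rough illustration of what "managing" randomness can look like, the sketch below compares a plain random draw with a draw that is forced to respect the proportions of a known attribute (often called proportional stratification). The population, the group labels, and both function names are hypothetical, and the numbers come from Python's pseudo-random generator, so this only shows the mechanics, not a claim about any particular method's quality.

```python
import random
from collections import Counter


def simple_random_sample(population, n, rng):
    """Draw n items uniformly at random, without replacement."""
    return rng.sample(population, n)


def stratified_sample(population, n, key, rng):
    """Draw about n items so each group keeps its population share.

    Group sizes are rounded, so the total can differ slightly from n
    when the shares do not divide evenly.
    """
    strata = {}
    for item in population:
        strata.setdefault(key(item), []).append(item)
    sample = []
    for members in strata.values():
        k = round(n * len(members) / len(population))
        sample.extend(rng.sample(members, k))
    return sample


rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical population: 90% of items labeled "A", 10% labeled "B".
population = ["A"] * 900 + ["B"] * 100

plain = simple_random_sample(population, 20, rng)
stratified = stratified_sample(population, 20, key=lambda item: item, rng=rng)

print("plain draw:     ", Counter(plain))
print("stratified draw:", Counter(stratified))
```

The plain draw can over- or under-represent the small group from one run to the next, whereas the stratified draw always contains 18 items from "A" and 2 from "B" for this population. The catch is that you need to know the relevant attribute in advance, which is exactly the kind of extra management the paragraph above refers to.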

Relationship with Data-driven approaches

As for the coherency (or lack thereof) with data-driven approaches, most Stats methods fail at this miserably. The only known exception is Bayesian Stats, but that's more like the black sheep of the Stats family, since most people tend to go for Frequentist Stats. That's probably because Bayes wasn't as effective in branding his kind of Stats, even though he was probably more perceptive and more creative than all of his contemporaries in this field. Many people these days have tried bridging the gap between Machine Learning methods and Stats, but with little success due to the fundamental differences between these two paradigms.

Final words

Of course, there are several advantages to Statistics, as it can be a useful tool, especially when it comes to understanding data (descriptive statistics). I'd rather not dwell on this too much, however, since every Statistics book out there makes this argument better than I ever could. Instead, I'll try to do something more original and perhaps more useful. Namely, I'll try to offer a different approach to Statistics, one that can help it integrate with data-driven frameworks effectively and efficiently. This topic, however, is one for another article. Cheers!

If you enjoy Stats, Machine Learning, and other goodies found under the Data Science and Artificial Intelligence umbrella, feel free to check out and bookmark my blog. There I talk about topics like this one, as well as whatever else I encounter in the relentless journey through the dataverse (data universe).
