Zacharias 🐝 Voulgaris

1 year ago · 2 min. reading time

Statistics’ Shortcomings




First of all, let me make it clear that I have no issue with Statistics as a sub-field of Mathematics, or with its use in a data analytics context. The issue is that it has been used for far more than it should be, and this overuse (or rather, abuse) has given a lot of people the wrong idea about real-world matters. In other words, it has compromised people's sense of perspective, while also deluding many researchers about what they can do with this tool-set. In this article, we'll look at some of the most relevant shortcomings of Statistics: the excessive use and misuse of randomness to model variables, the excessive assumptions about the data, the subpar sampling methods Stats provides, and the lack of coherence with more data-driven approaches such as Machine Learning. Let’s look at each one of these in more detail, shall we?

Randomness as the de facto Modeling approach for the variables involved

Randomness is great. Being the cornerstone of any cryptographic system out there, it's a precious resource. It can also help us understand inherently unpredictable processes. No wonder it's used so much in Statistics, which seems to have woven a whole framework around it. However, just because randomness is useful doesn't mean it applies everywhere. Treating every variable out there as a random variable may make sense in Math, but not in the real world. Of course, that's a viable way to model uncertainty, but it's certainly not the only way and probably not the best way either.

Plethora of Assumptions

Since randomness doesn't have any inherent structure, to make it work in an analytics scenario, lots of restrictions need to be put in place. These take the form of assumptions, something that Statistics has in abundance. It's rare to find anything Stats-related that doesn't come with its list of assumptions about the data involved. It's fine if these assumptions hold, but what happens when they don't?
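To make that last question a bit more concrete, here is a minimal sketch in Python of what a broken assumption can look like. It's only an illustration under assumptions of my own choosing: the data are hypothetical waiting times drawn from an exponential process (so they can never be negative), and the "method" is simply fitting a normal distribution to them, as many Stats procedures implicitly do.

```python
import random
from statistics import NormalDist

rng = random.Random(0)  # fixed seed so the run is reproducible

# Hypothetical data: 10,000 waiting times from an exponential process.
# By construction, none of them can be negative.
waiting_times = [rng.expovariate(1.0) for _ in range(10_000)]

# Assume normality anyway and fit a normal distribution to the sample.
fitted = NormalDist.from_samples(waiting_times)

# Share of values the fitted normal expects below zero vs. what the data show.
predicted_below_zero = fitted.cdf(0.0)
observed_below_zero = sum(t < 0 for t in waiting_times) / len(waiting_times)

print(f"normal model predicts {predicted_below_zero:.1%} of values below zero")
print(f"the data contain {observed_below_zero:.1%} of values below zero")
```

For an exponential distribution with rate 1, the mean and standard deviation are both 1, so the fitted normal puts roughly 16% of its mass on negative waiting times that cannot exist. Nothing in the code complains about it; the assumption fails silently.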

Superficial approach to Sampling

Randomness is useful when it comes to sampling though, right? That’s true, but it doesn’t always yield the best sample. After all, randomness is unpredictable, and unless you manage it properly, you may end up with a biased sample. Of course, you can obtain a representative sample that’s unbiased by design, but Statistics doesn’t offer any out-of-the-box method for doing that. It makes you wonder whether the people behind this part of mathematics truly understand randomness at all. After all, they never provided a way to obtain a truly random set of numbers in an efficient and scalable way. Pseudo-random numbers don’t count.
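The point about random samples drifting away from the population can be sketched in a few lines of Python. This is just an illustration, not a recommended method: the population, the group labels, and the function names are all made up, and the "by design" sampler only balances a single known attribute by giving each group a quota proportional to its size.

```python
import random
from collections import Counter


def simple_random_sample(population, k, seed):
    """Plain random sample of k items, seeded for reproducibility."""
    return random.Random(seed).sample(population, k)


def proportional_sample(population, group_of, k, seed):
    """Sample so that each group keeps (roughly) its population share."""
    rng = random.Random(seed)
    groups = {}
    for item in population:
        groups.setdefault(group_of(item), []).append(item)
    sample = []
    for members in groups.values():
        # Quota proportional to group size; rounding may shift the total slightly.
        quota = round(k * len(members) / len(population))
        sample.extend(rng.sample(members, quota))
    return sample


def label(record):
    return record[0]


# Hypothetical population: 90 records of group "A", 10 of group "B".
population = [("A", i) for i in range(90)] + [("B", i) for i in range(10)]

# Number of "B" records in a sample of 10, over 200 different seeds.
random_b_counts = [
    Counter(map(label, simple_random_sample(population, 10, seed)))["B"]
    for seed in range(200)
]
designed_b_counts = [
    Counter(map(label, proportional_sample(population, label, 10, seed)))["B"]
    for seed in range(200)
]

print("random sampling, distinct B counts:", sorted(set(random_b_counts)))
print("proportional sampling, distinct B counts:", sorted(set(designed_b_counts)))
```

With proportional quotas, every sample of 10 contains exactly one "B" record, matching the 10% share in the population. The plain random samples vary from seed to seed, and some of them miss group "B" entirely or over-represent it.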

Relationship with Data-driven approaches

As for the coherence (or lack thereof) with data-driven approaches, most Stats methods fail at this miserably. The only known exception is Bayesian Stats, but that's more like the black sheep of the Stats family, since most people tend to go for Frequentist Stats. That's probably because Bayes wasn't as effective in branding his kind of Stats, even though he was probably more perceptive and more creative than all of his contemporaries in this field. Many people these days have tried bridging the gap between Machine Learning methods and Stats, but with little success, due to the fundamental differences between these two paradigms.

Final words

Of course, there are several advantages to Statistics, as it can be a useful tool, especially when it comes to understanding data (descriptive statistics). I'd rather not dwell on this too much, however, since every Statistics book out there makes this argument better than I ever could. Instead, I'll try to do something more original and perhaps more useful. Namely, I'll try to offer a different approach to Statistics, one that can help it integrate with data-driven frameworks effectively and efficiently. This topic, however, is one for another article. Cheers!

If you enjoy Stats, Machine Learning, and other goodies found under the Data Science and Artificial Intelligence umbrella, feel free to check out and bookmark my blog. There I talk about topics like this one, as well as whatever else I encounter in the relentless journey through the dataverse (data universe).

