Zacharias 🐝 Voulgaris

2 years ago · 2 min. reading time

Statistics’ Shortcomings

Introduction

First of all, let me make it clear that I have no issue with Statistics as a sub-field of Mathematics, or with its use in a data analytics context. The issue is that it has been used for far more than it should be, and this overuse (or rather, abuse) has given a lot of people the wrong idea about real-world matters. In other words, it has compromised people's sense of perspective, while also deluding many researchers about what they can do with this tool-set. In this article, we'll look at some of the most relevant shortcomings of Statistics: the excessive use and misuse of randomness to model variables, the excessive assumptions about the data, the subpar sampling methods Stats provides, and the lack of coherence with more data-driven approaches such as Machine Learning. Let's look at each one of these in more detail, shall we?

Randomness as the de facto Modeling approach for the variables involved

Randomness is great. Being the cornerstone of any cryptographic system out there, it's a precious resource. It can also help us understand inherently unpredictable processes. No wonder it's used so much in Statistics, which seems to have woven a whole framework around it. However, just because randomness is useful, that doesn't make it universally applicable. Treating every variable out there as a random variable may make sense in Math, but not in the real world. Of course, that's a viable way to model uncertainty, but it's certainly not the only way and probably not the best way either.

Plethora of Assumptions

Since randomness doesn't have any inherent structure, lots of restrictions need to be put in place to make it work in an analytics scenario. These take the form of assumptions (normally distributed values, independent observations, equal variances across groups, and so on), something that Statistics has in abundance. It's rare to find anything Stats-related that doesn't come with its own list of assumptions about the data involved. It's fine if these assumptions hold, but what happens when they don't?
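To make the point about assumptions a bit more tangible, here is a minimal sketch of how one might check a normality assumption before leaning on a method that requires it. The helper names (`skewness`, `excess_kurtosis`, `looks_roughly_normal`) and the tolerance of 0.5 are my own illustrative choices, not a standard test; a proper analysis would use an established normality test instead.

```python
import random
import statistics


def skewness(xs):
    """Third standardized moment: close to 0 for symmetric data."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)


def excess_kurtosis(xs):
    """Fourth standardized moment minus 3: close to 0 for normal data."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 4 for x in xs) / len(xs) - 3.0


def looks_roughly_normal(xs, tol=0.5):
    """Crude rule of thumb (tol is an arbitrary, hypothetical threshold)."""
    return abs(skewness(xs)) < tol and abs(excess_kurtosis(xs)) < tol


rng = random.Random(42)  # fixed seed so the run is repeatable
normal_like = [rng.gauss(0, 1) for _ in range(5000)]
skewed = [rng.expovariate(1.0) for _ in range(5000)]

print(looks_roughly_normal(normal_like))  # assumption plausibly holds
print(looks_roughly_normal(skewed))       # assumption clearly violated
```

If the check fails, anything that depends on normality should be treated with suspicion for that variable.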

Superficial approach to Sampling

But randomness is useful when it comes to sampling, right? That's true, but it doesn't always yield the best sample. After all, randomness is unpredictable, and unless you manage it properly, you may end up with a biased sample. Of course, you can obtain a representative sample that's unbiased by design, but Statistics doesn't offer any out-of-the-box method for doing that. It makes you wonder whether the people behind this part of mathematics truly understand randomness at all. After all, they never provided a way to obtain a truly random set of numbers in an efficient and scalable way. Pseudo-random numbers don't count.
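As a rough illustration of what "representative by design" could look like, here is a small sketch of proportional stratified sampling: the population is split by a known grouping variable and each group is sampled at the same rate, so the group proportions in the sample cannot drift by chance. The function `stratified_sample`, the toy `population`, and the "urban"/"rural" labels are hypothetical and only there for the example. Note that it relies on Python's pseudo-random generator, and it only balances the one variable you stratify on.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(records, key, fraction, seed=None):
    """Draw the same fraction from every stratum defined by key(record)."""
    rng = random.Random(seed)  # pseudo-random; seeded for repeatability
    strata = defaultdict(list)
    for rec in records:
        strata[key(rec)].append(rec)

    sample = []
    for label in sorted(strata):  # sorted so the output order is stable
        members = strata[label]
        k = max(1, round(len(members) * fraction))  # at least one per stratum
        sample.extend(rng.sample(members, k))
    return sample


# Toy population: 90% "urban" records, 10% "rural" records.
population = [("urban", i) for i in range(900)] + [("rural", i) for i in range(100)]

sample = stratified_sample(population, key=lambda rec: rec[0], fraction=0.1, seed=7)
counts = Counter(label for label, _ in sample)
print(counts)  # group proportions match the population's 90/10 split
```

A plain random draw of 100 records would only match the 90/10 split on average; here the split is guaranteed for the chosen grouping variable.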

Relationship with Data-driven approaches

As for the coherence (or lack thereof) with data-driven approaches, most Stats methods fail miserably at this. The only known exception is Bayesian Stats, but that's more like the black sheep of the Stats family, since most people tend to go for Frequentist Stats. That's probably because Bayes wasn't as effective in branding his kind of Stats, even though he was probably more perceptive and more creative than all of his contemporaries in this field. Many people these days have tried bridging the gap between Machine Learning methods and Stats, but with little success, due to the fundamental differences between these two paradigms.

Final words

Of course, there are several advantages to Statistics, as it can be a useful tool, especially when it comes to understanding data (descriptive statistics). I'd rather not dwell on this too much, however, since every Statistics book out there makes this argument better than I ever could. Instead, I'll try to do something more original and perhaps more useful. Namely, I'll try to offer a different approach to Statistics, one that can help it integrate with data-driven frameworks effectively and efficiently. This topic, however, is one for another article. Cheers!

If you enjoy Stats, Machine Learning, and other goodies found under the Data Science and Artificial Intelligence umbrella, feel free to check out and bookmark my blog, foxydatascience.com. There I talk about topics like this one, as well as whatever else I encounter in the relentless journey through the dataverse (data universe).
