Zacharias 🐝 Voulgaris

4 months ago · 3 min. reading time · visibility ~10 ·


The Net of Knowledge in Data Science (and Beyond)


“Knowledge and Know-how in Data Science form a net of sorts. The fewer gaps in your knowledge, the finer the net, and the better your chances of catching insights in the data.” - Some data scientist


I was never into fishing, especially not the kind involving nets. I do enjoy a good fish though, especially a fresh one, cooked properly, alongside veggies and such. Of course, if the fish is the one in the analogy of the quote above, namely insights, I enjoy it even more. After all, I'm a data scientist, and as such, I care about the fruits of the data science process, not just the data wrangling and data modeling parts of it. If you are interested in this kind of "fish" too, this article is for you, even if you aren’t as big on data science as I am (most people aren’t!).

So, why is this knowledge a net of sorts? Well, knowledge is integrated information that has the potential to be applied. The specialized knowledge that is more technical and hands-on is labeled "know-how" and is an essential part of any science, especially a hands-on one such as data science. It is a net because it involves strings of information tied together in an organized way, forming a mesh of sorts. We often call this mesh of information a knowledge graph, and it's one of the best ways to organize information, especially information that is close to our understanding and reasoning. You can argue that knowledge graphs belong more to the domain of Logic than to the mathematical constructs that appeal to applied Math professionals. When we put together a good mind map, for example, we are essentially creating a knowledge graph, and a functional one at that. Even if you just doodle some notes graphically to make sense of something, you construct a knowledge graph, albeit a fairly rudimentary one. If you were to do this at scale, you'd be building a network of sorts, whose nodes would be pieces of knowledge, while its arcs would express the relationships among those pieces. A network (or graph, in the mathematical sense) is closely linked to the imagery of a net, even if that’s not something they teach in fisherman school!
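To make the nodes-and-arcs idea a bit more concrete, here is a minimal sketch of a knowledge graph in plain Python. The class name `KnowledgeGraph`, its methods, and the example facts are all made up for illustration; this is not how any particular knowledge graph tool works, just one simple way to store labeled arcs and follow them.

```python
from collections import deque


class KnowledgeGraph:
    """Toy directed graph: nodes are pieces of knowledge, arcs are labeled relationships."""

    def __init__(self):
        # Maps each node to a list of (relation, target) arcs leaving it.
        self.arcs = {}

    def add(self, source, relation, target):
        """Record the arc source --relation--> target, registering both nodes."""
        self.arcs.setdefault(source, []).append((relation, target))
        self.arcs.setdefault(target, [])

    def neighbors(self, node):
        """Return the (relation, target) arcs leaving a node."""
        return list(self.arcs.get(node, []))

    def reachable(self, start):
        """Return every node that can be reached from start by following arcs."""
        seen = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for _, target in self.arcs.get(node, []):
                if target not in seen:
                    seen.add(target)
                    queue.append(target)
        seen.discard(start)
        return seen


# Hypothetical facts, chosen only to mirror the topics of this article.
kg = KnowledgeGraph()
kg.add("data science", "uses", "machine learning")
kg.add("machine learning", "includes", "deep learning")
kg.add("deep learning", "built on", "artificial neural network")
kg.add("data science", "requires", "communication")

print(kg.neighbors("data science"))
print(kg.reachable("machine learning"))
```

The fewer arcs you are missing, the more nodes `reachable` can find from any starting point, which is one way of reading the "finer net" in the opening quote.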

As it happens, fishing for insights in the data is something we often do with networks these days. Not the high-level ones I've just talked about, but more mathematical ones that involve the data directly. These networks emulate the networks of neurons in our brains, since their most rudimentary parts, the artificial neurons, are loosely modeled after the neurons in our nervous system. Organize these artificial neurons in a network structure, couple them with specialized mathematical functions so that they can express the data at hand, and you have, in essence, an artificial neural network (ANN). The ANN is one of the most popular data models in data science today, and it's widely used in lots of areas due to its ability to model pretty much anything, just like a good fishing net can catch all kinds of fish. ANNs are part of artificial intelligence and find applications in other areas too, not just the automated learning of patterns directly from the data, aka machine learning. When ANNs with many layers are applied in machine learning, the approach is usually called deep learning, one of the most popular aspects of data science today.
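As a rough illustration of artificial neurons organized in a network and coupled with mathematical functions, here is a minimal sketch of a forward pass in plain Python. The sigmoid activation, the layer sizes, and all the weights are assumptions picked for the example; a real ANN would learn its weights from the data, and training is not shown here.

```python
import math


def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of the inputs plus a bias, then an activation."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)


def layer(inputs, weight_rows, biases):
    """A layer is a group of neurons that all read the same inputs."""
    return [neuron(inputs, weights, bias) for weights, bias in zip(weight_rows, biases)]


def forward(inputs, layers):
    """Pass the data through each layer in turn, up to the output layer."""
    activation = inputs
    for weight_rows, biases in layers:
        activation = layer(activation, weight_rows, biases)
    return activation


# Hypothetical, hand-picked weights: 2 inputs -> 2 hidden neurons -> 1 output neuron.
network = [
    ([[0.5, -0.5], [0.25, 0.75]], [0.0, 0.1]),  # hidden layer
    ([[1.0, -1.0]], [0.0]),                      # output layer
]

output = forward([1.0, 2.0], network)
print(output)
```

Stacking more hidden layers between the input and the output is what puts the "deep" in deep learning.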

Going back to the high-level network of knowledge, this is essentially a grounded understanding of ANNs, data, and methods for processing the data (both before and after the modeling part), along with the ability to express all that in a way the project stakeholders understand. After all, not everyone is so big on data science, nor do they understand data science the way a data scientist does. Still, that net of knowledge is something they can comprehend, and it can be a bridge of understanding between the technical and the non-technical. Isn't that what good communication is all about? And isn't communication an essential skill in data science work? Perhaps things tend to come together when viewed this way, just like the data streams traversing an ANN converge towards the nodes of the output layer at the end of the network.

Data science is vast, and it's through deepening and applying our understanding of it that we refine our net of knowledge and make it strong, able to handle even the most challenging data projects. The challenge isn't always in the modeling parts; often it's linked to the acquisition of the raw data, its engineering, or even the communication of the findings in a convincing manner. Laid out graphically, all these stages form the data science pipeline. This diagram of sorts is essential in both understanding and communicating data science work. If you wish to learn more about this subject, check out my book Data Science Mindset, Methodologies, and Misconceptions (Technics Publications) or any other book I've written on data science. Cheers!
