In Praise of the Data Engineer
This article is inspired partly by Ken Boddie's article about the various types of engineers and partly by one of the great quotes in Fay's latest article.
The data engineer is the unsung hero of data science. He (the role is usually manned by males) is the one who does all the low-level work, including extracting, transforming, and loading (ETL) the data, making it ready for the more refined processes of the data science pipeline. He is the plumber of that pipeline, making sure that the data gets from A to B and dealing with all the crap that it innately has. Well, nothing smelly or disgusting, but nothing the data scientist working on modeling would want to deal with either!
Also, the data engineer often liaises with one of the most relevant professionals in the data world: the data architect (or the database administrator in some cases). In many cases this role is more established and more prestigious, since everyone (even the less data-savvy individuals) understands the need for an architect. It could also be better branding, since architects have a good rep in general when it comes to buildings, even if it's the civil engineers who make these designs come to life and who also shoulder most of the responsibility for these structures. Still, data architects are essential, and most data scientists have no clue how important their role is. Data engineers, however, deal with them directly, as they often need to access the databases the data architects have designed and often maintain.
The data engineer is also someone who has to come up with hacks to make the data come to life. Even if the data is well-governed and doesn't have too many issues, someone has to prepare the data so that it's ready for the models data scientists use. These models require a more or less structured dataset, often in the form of a matrix or a data frame. Even though matrices are ubiquitous in math, they aren't what you expect them to be in a data science setting. It's not enough to group a bunch of arrays into a rectangle of values and give it to the model. The arrays (vectors) used need to be information-rich, and they often also need to be normalized. Whenever applicable, they also need to be aligned with the variable we have to predict (there can be more than one such variable, which makes the problem more complex). Usually, it's the data engineer who undertakes all these tasks.
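To make the preparation step a bit more concrete, here is a minimal sketch in plain Python of two of the tasks mentioned above: aligning the feature vectors with the target variable and normalizing each feature. Everything in it is an assumption made for illustration: the record IDs, the sample values, the function names `min_max_normalize` and `build_dataset`, and the choice of min-max scaling (one of several common normalization schemes). A real pipeline would likely rely on a data frame library instead.

```python
from typing import Dict, List, Tuple


def min_max_normalize(column: List[float]) -> List[float]:
    """Rescale a column to the [0, 1] range; a constant column maps to zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def build_dataset(
    features: Dict[str, List[float]],
    target: Dict[str, float],
) -> Tuple[List[str], List[List[float]], List[float]]:
    """Align feature rows with target values by record ID, then normalize."""
    # Keep only the records that have both features and a target value.
    ids = sorted(k for k in features if k in target)
    rows = [features[i] for i in ids]
    # Transpose to columns, normalize each one, then transpose back to rows.
    columns = [min_max_normalize(list(col)) for col in zip(*rows)]
    X = [list(row) for row in zip(*columns)]
    y = [target[i] for i in ids]
    return ids, X, y


# Hypothetical raw data: record "d" has no target and "e" has no features,
# so both are dropped during alignment.
raw_features = {
    "a": [10.0, 200.0],
    "b": [20.0, 400.0],
    "c": [30.0, 300.0],
    "d": [40.0, 100.0],
}
raw_target = {"a": 1.0, "b": 0.0, "c": 1.0, "e": 0.0}

ids, X, y = build_dataset(raw_features, raw_target)
print(ids)  # ['a', 'b', 'c']
print(X)    # [[0.0, 0.0], [0.5, 1.0], [1.0, 0.5]]
print(y)    # [1.0, 0.0, 1.0]
```

Note that the alignment happens before the normalization, so the scaling is computed only over the records that actually make it into the dataset.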
For me, the main advantage of the data engineer role is that it's mostly data-driven. The data engineer is a pragmatic fellow and usually relies on facts and evidence, rather than some fancy mathematical model some theorist came up with because he was too bored or incapable of doing anything more useful. There is a time and place for mathematical models, but as we now live in a data-driven world, they are more like cute artifacts in this field, soon to be forgotten as the newer generations of data scientists learn the ways of A.I. and true data science. Statistics may linger, since there are always hard-core individuals who are too rigid about this sort of thing. Still, the reality of the matter is that the nuts and bolts are always going to be there, even when the fancy mathematical models of Stats are buried alongside their most fervent champions.
As for what the future holds for data engineers, well, let's just say that the best is yet to come!