In Praise of the Data Engineer
This article is partly inspired by Ken Boddie's article about the various types of engineers and partly by one of the great quotes in Fay's latest article.
The data engineer is the unsung hero of data science. He (the role is usually filled by men) is the one who does all the low-level work, including extracting, transforming, and loading (ETL) the data, making it ready for the more refined processes of the data science pipeline. He is the plumber of that pipeline, making sure that the data gets from A to B and dealing with all the crap it inherently carries. Well, nothing smelly or disgusting, but nothing the data scientist working on modeling would want to deal with either!
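To make the extract, transform, and load steps a bit more concrete, here is a minimal sketch using only the Python standard library. The CSV contents, column names, and the `customers` table are all hypothetical. An in-memory SQLite database stands in for whatever warehouse or database a real pipeline would load into.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this would come from a file, an API, or a source database.
RAW_CSV = """customer_id,signup_date,monthly_spend
101,2021-03-04,120.50
102,,80
103,2021-05-17,n/a
101,2021-03-04,120.50
"""

def extract(text):
    """Extract: parse raw CSV text into a list of dicts (one per row)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop verbatim duplicates, coerce types, and skip unusable rows."""
    seen = set()
    clean = []
    for row in rows:
        key = tuple(row.values())
        if key in seen:
            continue  # exact duplicate of a row we already kept
        seen.add(key)
        try:
            spend = float(row["monthly_spend"])
        except ValueError:
            continue  # non-numeric spend (e.g. "n/a") is useless downstream
        # An empty signup date becomes None, which SQLite stores as NULL.
        clean.append((int(row["customer_id"]), row["signup_date"] or None, spend))
    return clean

def load(records, conn):
    """Load: write the cleaned records into a target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers "
        "(customer_id INTEGER, signup_date TEXT, monthly_spend REAL)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", records)
    conn.commit()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(transform(extract(RAW_CSV)), conn)
    print(conn.execute("SELECT * FROM customers").fetchall())
```

Run as a script, this prints `[(101, '2021-03-04', 120.5), (102, None, 80.0)]`. The duplicate row and the row with a non-numeric spend are discarded during the transform step. Real pipelines would add logging, schema validation, and incremental loads, but the three-stage shape stays the same.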
Also, the data engineer often liaises with one of the most relevant professionals in the data world: the data architect (or, in some cases, the database administrator). This role is more established and, in many cases, more prestigious, since everyone (even less data-savvy individuals) understands the need for an architect. It may also be a matter of branding: architects have a good reputation when it comes to buildings, even though it's the civil engineers who bring those designs to life and shoulder most of the responsibility for the structures. Still, data architects are essential, and most data scientists have no clue how important their role is. Data engineers, however, deal with them directly, as they often need to access the databases that data architects have designed and often maintain.
The data engineer is also someone who has to come up with hacks to make the data come to life. Even if the data is well-governed and doesn't have too many issues, someone has to prepare it so that it's ready for the models data scientists use. These models require a more or less structured dataset, often in the form of a matrix or a data frame. Even though matrices are ubiquitous in math, they aren't quite what you'd expect in a data science setting. It's not enough to group a bunch of arrays into a rectangle of values and hand it to the model. The arrays (vectors) used need to be information-rich, and they often need to be normalized too. They also need to be aligned with the variable we want to predict, whenever there is one (and there can be more than one, which makes the problem more complex). Usually, it's the data engineer who undertakes all these tasks.
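The alignment and normalization steps just described can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions: the `features` and `targets` dictionaries are hypothetical, records are assumed to be matched by a shared ID, and z-score standardization is used as one common normalization choice among several.

```python
from statistics import mean, pstdev

# Hypothetical per-customer feature vectors and target labels, keyed by customer ID.
features = {
    101: [120.5, 3.0],
    102: [80.0, 1.0],
    104: [200.0, 5.0],
}
targets = {101: 1, 102: 0, 103: 1}  # 103 has no features; 104 has no target

def align(features, targets):
    """Keep only IDs present in both sources, in a stable (sorted) order."""
    ids = sorted(features.keys() & targets.keys())
    X = [features[i] for i in ids]
    y = [targets[i] for i in ids]
    return ids, X, y

def zscore_columns(X):
    """Standardize each column to mean 0 and population standard deviation 1."""
    cols = list(zip(*X))
    stats = [(mean(c), pstdev(c)) for c in cols]
    return [
        # A constant column (std 0) carries no information, so map it to 0.0.
        [(v - m) / s if s > 0 else 0.0 for v, (m, s) in zip(row, stats)]
        for row in X
    ]

ids, X, y = align(features, targets)
X_norm = zscore_columns(X)
print(ids, y)
print(X_norm)
```

Only customers 101 and 102 appear in both sources, so the resulting matrix has two rows, each paired with its label. In a real modeling setup, the normalization statistics would typically be computed on the training data only and then reused for new data, so that no information leaks from the evaluation set.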
For me, the main advantage of the data engineer role is that it's mostly data-driven. The data engineer is a pragmatic fellow who usually relies on facts and evidence rather than on some fancy mathematical model a theorist came up with because he was too bored, or incapable of doing anything more useful. There is a time and place for mathematical models, but as we now live in a data-driven world, they are more like cute artifacts in this field, soon to be forgotten as newer generations of data scientists learn the ways of A.I. and true data science. Statistics may linger, since there are always hard-core individuals who are rigid about this sort of thing. Still, the reality is that the nuts and bolts will always be there, even when the fancy mathematical models of statistics are buried alongside their most fervent champions.
As for what the future holds for data engineers, well, let's just say that the best is yet to come!
Articles from Zacharias 🐝 Voulgaris
Comments
Zacharias 🐝 Voulgaris
2 years ago #2
Your team could be data engineers then since they manage to solve any problems that come their way 🙂
Javier Cámara-Rica 🐝🇪🇸
2 years ago #1
Data engineers are excellent problem solvers