Zacharias 🐝 Voulgaris

2 years ago · 2 min. reading time · ~100 ·

In Praise of the Data Engineer

Source: LibreOffice Draw & pixabay.com 

This article is partly inspired by Ken Boddie's article about the various types of engineers and partly from one of the great quotes in Fay's latest article.

 

The data engineer is the unsung hero of data science. He (the role is usually manned by males) is the one who does all the low-level work, including extracting, transforming, and loading (ETL) the data, making it ready for the more refined processes of the data science pipeline. He is the plumber of that pipeline, making sure that the data gets from A to B and dealing with all the crap that it innately has. Well, nothing smelly or disgusting, but nothing the data scientist dealing with modeling would want to deal with either!

Also, the data engineer often liaises with one of the most relevant professionals in the data world: the data architect (or the database administrator in some cases). This role is more established and, in many cases, more prestigious, since everyone (even the less data-savvy individuals) understands the need for an architect. It could also be a matter of better branding, since architects have a good rep in general when it comes to buildings, even if it's the civil engineers who make these designs come to life and shoulder most of the responsibility for these structures. Still, data architects are essential, and most data scientists have no clue how important their role is. Data engineers, however, deal with them directly, as they often need to access the databases the data architects have designed and often maintain.

The data engineer is also someone who has to come up with hacks to make the data come to life. Even if the data is well-governed and doesn't have too many issues, someone has to prepare the data so that it's ready for the data models data scientists use. These models require a more or less structured dataset, often in the form of a matrix or a data frame. Even though matrices are ubiquitous in mathematics, they aren't what you expect them to be in a data science setting. It's not enough to group a bunch of arrays into a rectangle of values and give it to the model. The arrays (vectors) used need to be information-rich, and they often also need to be normalized. Whenever applicable, they also need to be aligned with the variable we have to predict (there can be more than one, which makes the problem more complex). Usually, it's the data engineer who undertakes all these tasks.
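To make the preparation steps above a bit more concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions, not how any particular team does it: the function names `zscore` and `build_dataset`, the dictionary-based inputs, and the choice of z-score normalization are all hypothetical. It aligns feature records with the target variable by a shared record id, drops records with missing values, and normalizes each feature column.

```python
from statistics import mean, pstdev


def zscore(column):
    """Rescale numbers to zero mean and unit (population) standard deviation."""
    if not column:
        return []
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


def build_dataset(features, target):
    """Align features with the target by record id, then normalize each feature.

    features: dict of record id -> dict of feature name -> value (hypothetical layout)
    target:   dict of record id -> value to predict
    Returns (ids, names, X, y), where X is a list of rows (the "matrix").
    """
    # Keep only records present in both sources, in a stable order.
    ids = sorted(features.keys() & target.keys())
    names = sorted({name for rid in ids for name in features[rid]})
    # Drop records that have a missing value for any feature.
    ids = [rid for rid in ids
           if all(features[rid].get(n) is not None for n in names)]
    # Normalize column by column, then reassemble the rows.
    columns = {n: zscore([features[rid][n] for rid in ids]) for n in names}
    X = [[columns[n][i] for n in names] for i in range(len(ids))]
    y = [target[rid] for rid in ids]
    return ids, names, X, y


# Made-up example data: record "c" has a missing income, "e" has no features.
features = {
    "a": {"age": 30, "income": 40000},
    "b": {"age": 40, "income": 60000},
    "c": {"age": 50, "income": None},
    "d": {"age": 50, "income": 80000},
}
target = {"a": 0, "b": 1, "c": 0, "d": 1, "e": 0}

ids, names, X, y = build_dataset(features, target)
print(ids)    # record ids that survived alignment and cleaning
print(names)  # column order of the matrix
print(y)      # target values, row-aligned with X
```

In practice a data frame library would typically handle these steps, but the logic is the same: every row of the matrix must correspond to exactly one target value, and every column should be on a comparable scale.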

For me, the main advantage of the data engineer role is that it's mostly data-driven. The data engineer is a pragmatic fellow and usually relies on facts and evidence, rather than some fancy mathematical model some theorist came up with because he was too bored, or incapable of doing anything more useful. There is a time and place for mathematical models, but as we now live in a data-driven world, they are more like cute artifacts in this field, soon to be forgotten as the newer generations of data scientists learn the ways of A.I. and true data science. Statistics may linger, since there are always hard-core individuals who are too rigid about this sort of thing. Still, the reality of the matter is that the nuts and bolts are always going to be there, even when the fancy mathematical models of Stats are buried alongside their most fervent champions.

As for what the future holds for data engineers, well, let's just say that the best is yet to come!

Comments

Zacharias 🐝 Voulgaris

2 years ago #2

 Data engineers are excellent problem solvers
