Zacharias 🐝 Voulgaris

2 years ago · 2 min. reading time

In Praise of the Data Engineer

Source: LibreOffice Draw & pixabay.com

This article is partly inspired by Ken Boddie's article about the various types of engineers and partly by one of the great quotes in Fay's latest article.

The data engineer is the unsung hero of data science. He (that role is usually manned by males) is the one who does all the low-level work, including extracting, transforming, and loading (ETL) the data, making it ready for the more refined processes of the data science pipeline. He is the plumber of that pipeline, making sure that the data gets from A to B and dealing with all the crap that it innately has. Well, nothing smelly or disgusting, but nothing the data scientist who handles the modeling would want to deal with either!

Also, the data engineer often liaises with one of the most relevant professionals in the data world: the data architect (or the database administrator in some cases). This role is often the more established and more prestigious one, since everyone (even the less data-savvy individuals) understands the need for an architect. It could also be a matter of better branding: architects have a good rep in general when it comes to buildings, even though it's the civil engineers who make these designs come to life and who shoulder most of the responsibility for these structures. Still, data architects are essential, and most data scientists have no clue how important their role is. Data engineers, however, deal with them directly, as they often need to access the databases the data architects have designed and often maintain.

The data engineer is also someone who has to come up with hacks to make the data come to life. Even if the data is well-governed and doesn't have too many issues, someone has to prepare it so that it's ready for the models data scientists use. These models require a more or less structured dataset, often in the form of a matrix or a data frame. Even though matrices are ubiquitous in Math, in a data science setting they aren't quite what you'd expect. It's not enough to group a bunch of arrays into a rectangle of values and hand it to the model. The arrays (vectors) used need to be information-rich, and they often also need to be normalized. Whenever applicable, they also need to be aligned with the variable we have to predict (there can be more than one, which makes the problem more complex). Usually, it's the data engineer who undertakes all these tasks.
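To make the normalization and alignment steps a bit more concrete, here is a minimal sketch in plain Python. Everything in it is made up for illustration: the record ids, the values, and the helper names `align` and `standardize` are assumptions, not part of any particular tool or pipeline. Z-score standardization is just one common way to normalize.

```python
from statistics import mean, pstdev

# Hypothetical raw records: two feature values per record id.
raw_features = {
    "a": [1.0, 200.0],
    "b": [2.0, 400.0],
    "c": [3.0, 600.0],
    "d": [4.0, 800.0],
}
# Hypothetical target values; "d" has no label and "e" has no features.
raw_targets = {"a": 0, "b": 1, "c": 1, "e": 0}


def align(features, targets):
    """Keep only the ids present in both sources, in a stable (sorted) order."""
    ids = sorted(features.keys() & targets.keys())
    X = [features[i] for i in ids]
    y = [targets[i] for i in ids]
    return ids, X, y


def standardize(X):
    """Rescale each column to zero mean and unit (population) standard deviation."""
    columns = list(zip(*X))
    means = [mean(col) for col in columns]
    # A constant column has zero spread; divide by 1.0 instead to avoid an error.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


ids, X, y = align(raw_features, raw_targets)
X_norm = standardize(X)
```

After this, row `k` of `X_norm` and element `k` of `y` refer to the same record, and both columns are on a comparable scale even though the raw values differed by two orders of magnitude. In practice, a data frame library would likely handle both steps, but the logic is the same.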

For me, the main advantage of the data engineer role is that it's mostly data-driven. The data engineer is a pragmatic fellow who usually relies on facts and evidence rather than on some fancy mathematical model a theorist came up with because he was too bored, or unable, to do anything more useful. There is a time and place for mathematical models, but as we now live in a data-driven world, they are more like cute artifacts in this field, soon to be forgotten as the newer generations of data scientists learn the ways of A.I. and true data science. Statistics may linger, since there are always hard-core individuals who are too rigid about this sort of thing. Still, the reality of the matter is that the nuts and bolts are always going to be there, even when the fancy mathematical models of Stats are buried alongside their most fervent champions.

As for what the future holds for data engineers, well, let's just say that the best is yet to come!

#Ken Boddie #Math #Data Engineer #LibreOffice Draw #Fay

in Data Science, Data Analytics, and Data Professionals in General

Comments

Javier Cámara-Rica 🐝🇪🇸

2 years ago #1

Data engineers are excellent problem solvers
