Zacharias 🐝 Voulgaris

2 years ago · 2 min. reading time


A Modern Data Pipeline

Source: Semantix Brasil

I generally don't opt for fancy animations and such, but sometimes they are the only way to convey a process's complexity and sophistication. In this case, the process is the data science process, often referred to as a data pipeline. The one shown here is just one of many possible implementations of the concept. Although it may seem overwhelming, this is the day-to-day work of a data scientist or a data science team. Let's delve into it.

For starters, we have a collection of data sources, depicted as the circles on the left. More often than not, these are databases, SQL or otherwise. However, they can be anything containing data, depending on the application at hand. If, for example, the pipeline involves gathering data from sensors, as in an IoT system, the source is usually some form of file transferred via the internet. In other scenarios, it can be a web application, an API, or some other computer program. The data science process is quite flexible in that regard.
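To make the idea of heterogeneous sources a bit more concrete, here is a minimal sketch of a data loader that reads from two kinds of source and returns records in one common shape. Everything in it is an assumption made for illustration: the field names `sensor_id` and `value`, the `readings` table, and the function names do not come from the diagram, and a real loader would depend heavily on the application.

```python
import csv
import io
import sqlite3


def load_from_csv(text):
    """Read sensor readings from CSV text (stands in for a file sent by a device)."""
    return [
        {"sensor_id": row["sensor_id"], "value": float(row["value"])}
        for row in csv.DictReader(io.StringIO(text))
    ]


def load_from_sqlite(conn):
    """Read the same kind of readings from a SQL database (hypothetical table)."""
    cursor = conn.execute("SELECT sensor_id, value FROM readings ORDER BY sensor_id")
    return [{"sensor_id": sid, "value": float(val)} for sid, val in cursor]


def load_all(csv_text, conn):
    """Combine both sources into one list of records with a shared shape."""
    return load_from_csv(csv_text) + load_from_sqlite(conn)


def demo_sources():
    """Build two tiny made-up sources: a CSV payload and an in-memory database."""
    csv_text = "sensor_id,value\ns1,20.5\ns2,19.0\n"
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (sensor_id TEXT, value REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", [("s3", 21.25)])
    return csv_text, conn


if __name__ == "__main__":
    csv_text, conn = demo_sources()
    for record in load_all(csv_text, conn):
        print(record)
```

The point of the sketch is only that the rest of the pipeline sees one record format, whatever the source happens to be.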

Once the data source is configured, often through the invaluable help of a data engineer, its content becomes available via a data loader or a data acquisition process. Often, this data is combined with archived data, usually in the form of a data lake. This latter data storage location is well within the domain of data architects or data modelers, professionals who work closely with data scientists and who are responsible for organizing the data and securing it in the most appropriate location. Note that this location can be in the cloud or in an organization’s private data center. During this phase of the pipeline, often referred to as data engineering, the data is extracted, transformed, and loaded (ETL) into the right places. Usually, a lot of data exploration also takes place at this stage to understand the data better.
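The extract, transform, and load steps can be sketched in a few lines. This is a toy illustration under stated assumptions, not how any particular team does it: the temperature fields `temp_f` and `temp_c`, the `readings` table, and the use of SQLite as the destination are all hypothetical choices made so the example runs with the standard library alone.

```python
import sqlite3


def extract(raw_rows):
    """Extract: a real pipeline would read from a source; here we just copy the rows."""
    return list(raw_rows)


def transform(rows):
    """Transform: drop rows with a missing value and convert Fahrenheit to Celsius."""
    cleaned = []
    for row in rows:
        if row.get("temp_f") is None:
            continue  # skip incomplete readings
        celsius = round((row["temp_f"] - 32) * 5 / 9, 2)
        cleaned.append((row["sensor_id"], celsius))
    return cleaned


def load(conn, rows):
    """Load: write the cleaned rows into the destination table."""
    conn.execute("CREATE TABLE IF NOT EXISTS readings (sensor_id TEXT, temp_c REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    conn.commit()


def run_etl(conn, new_rows, archived_rows):
    """Combine fresh and archived data, run it through ETL, and return a quick summary."""
    combined = extract(new_rows) + extract(archived_rows)
    load(conn, transform(combined))
    # A very small piece of "data exploration": row count and average temperature.
    return conn.execute("SELECT COUNT(*), AVG(temp_c) FROM readings").fetchone()


if __name__ == "__main__":
    new_rows = [{"sensor_id": "s1", "temp_f": 212.0}, {"sensor_id": "s2", "temp_f": None}]
    archived_rows = [{"sensor_id": "s1", "temp_f": 32.0}]
    print(run_etl(sqlite3.connect(":memory:"), new_rows, archived_rows))
```

Note how the archived rows are merged with the new ones before the transform step, mirroring the way fresh data is combined with data already sitting in the lake.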

Following that, the data can be shared with other teams (e.g., developers who wish to use it as input for their applications, or some other database), it can be visualized (so that management has an idea of what it can do with it), and it can be utilized through machine learning models. The latter are usually predictive and, more often than not, involve some form of A.I.

Whenever the machine learning option is leveraged, data scientists are more heavily involved, though depending on the team they often handle data visualization too. Visualization, however, is a task that can also be done by a data analyst or a data visualization specialist.
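For a feel of what "predictive" means at its simplest, here is a sketch of fitting a straight line to a handful of points and using it to predict a new value. It is deliberately the most basic model possible (ordinary least squares with one input), chosen for illustration; the models in an actual pipeline are typically far more elaborate, and the function names and data here are made up.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Use the fitted line to estimate y for a new x."""
    slope, intercept = model
    return slope * x + intercept


if __name__ == "__main__":
    # Made-up training data that happens to lie on the line y = 2x + 1.
    model = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
    print(model, predict(model, 5))
```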

Beyond this simple diagram, there is plenty more that's related to the data models built. However, this can get quite specialized, and it's better suited for a technical book or video. Note that even though it's not explicitly shown, certain Cybersecurity protocols come into play throughout this process. They matter most when the data is in transit or stored in a location that other people can access, for example, in a database. So, even if it’s tacit, encryption and PII-protecting processes are present. Fortunately, these are often handled by specialized professionals, though in smaller organizations, a data scientist may need to deal with them too.
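One common PII-protecting step is pseudonymization: replacing identifying fields with keyed hashes before the data moves further down the pipeline. The sketch below shows the idea using Python's standard `hmac` and `hashlib` modules. It is an illustration under assumptions, not a complete security design: the field names and the key are hypothetical, keyed hashing is not encryption (the original values cannot be recovered from the output), and in practice the key would live in a secrets manager rather than in code.

```python
import hashlib
import hmac

# Hypothetical list of fields treated as PII in this example.
PII_FIELDS = ("name", "email")


def pseudonymize(value, key):
    """Replace a string value with a keyed SHA-256 digest (hex encoded)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def protect_record(record, key, pii_fields=PII_FIELDS):
    """Return a copy of the record with PII fields pseudonymized (assumes they are strings)."""
    return {
        field: pseudonymize(value, key) if field in pii_fields else value
        for field, value in record.items()
    }


if __name__ == "__main__":
    key = b"demo-key-for-illustration-only"
    record = {"name": "Ada", "email": "ada@example.com", "purchase": 42}
    print(protect_record(record, key))
```

Because the same input and key always give the same digest, records belonging to one person can still be joined across tables without exposing who that person is.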

So, next time someone tells you that a data scientist is just a Stats professional who also knows some programming or a programmer who knows some Stats, do what I do and roll your eyes in contempt!

If you enjoy this sort of article, where I explore technical topics from a level that's easier to comprehend, abstaining from too much jargon and Math, you'd definitely like my blog, foxydatascience.com. There I explore various topics related to data science, A.I., and Cybersecurity. Check it out when you have a moment. Cheers!



Comments

Jerry Fletcher

2 years ago #1

Zacharias, Nice view from 30,000 feet. Completely understandable.
