Caroline Smith

6 days ago · 2 min. reading time

Here’s Why Machine Learning Data Catalogs are Becoming Popular


In part one of this blog post we discussed what data catalogs are and why enterprises have been using them more over the last two years. In this second and final part, we look at how artificial intelligence (AI), specifically machine learning (ML), has given rise to a new type of data catalog: the machine learning data catalog, or MLDC.

But before going there, a quick recap: data catalogs are meant to replace manual tagging of your data with automatic labeling. A catalog is an inventory of your organization’s data assets. It tells you where your data is stored, which reduces the time taken to identify data and make it accessible for analytics. Manual data cataloging is no longer viable in the data-intensive world we live in today, where thousands of petabytes of data are generated every day.

MLDCs take the science of data cataloging even further. They help with two things: (a) tracking data lineage and (b) analyzing how data is being consumed internally. The first is very important for meeting data governance requirements, especially since the passage of privacy protection laws such as the GDPR.

MLDCs are especially helpful to enterprises operating in the Internet of Things (IoT) world, where data streams largely in real time and analytics often runs at the device edge. In this kind of scenario time is a critical factor, so MLDCs help agile organizations that do not have the time to go through standard profiling processes.

In addition, enterprises can use AI capabilities to uncover metadata from existing and new data sets, then tag those data sets according to business rules, which helps users locate and use documentation when needed.
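To make the idea of tagging data sets by business rules a little more concrete, here is a minimal sketch in Python. It is only an illustration, not how any particular MLDC product works: a real MLDC would typically learn such patterns with ML models, whereas this sketch uses hand-written regular expressions as a stand-in. The rule names, the `tag_column` and `tag_dataset` functions, and the 0.8 threshold are all hypothetical choices made for this example.

```python
import re

# Hypothetical business rules: tag name -> pattern a column's values should match.
# These patterns are deliberately loose (e.g. the IPv4 rule does not check
# that each octet is <= 255); they only illustrate the idea.
RULES = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "iso_date": re.compile(r"^\d{4}-\d{2}-\d{2}$"),
    "ipv4_like": re.compile(r"^\d{1,3}(\.\d{1,3}){3}$"),
}


def tag_column(values, threshold=0.8):
    """Return the sorted tags whose pattern matches at least
    `threshold` of the non-empty values in a column."""
    non_empty = [v.strip() for v in values if v and v.strip()]
    if not non_empty:
        return []
    tags = []
    for tag, pattern in RULES.items():
        hits = sum(1 for v in non_empty if pattern.match(v))
        if hits / len(non_empty) >= threshold:
            tags.append(tag)
    return sorted(tags)


def tag_dataset(columns):
    """Map each column name to the tags inferred from its values."""
    return {name: tag_column(vals) for name, vals in columns.items()}


# Made-up sample data, assumed purely for illustration.
sample = {
    "contact": ["a@example.com", "b@example.org", ""],
    "signup": ["2020-01-05", "2020-02-11"],
    "notes": ["hello", "world"],
}
print(tag_dataset(sample))
```

The threshold lets a column keep its tag even when a few values are dirty, which is the kind of tolerance a catalog needs when profiling real-world data.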

In a report on the early adoption of MLDCs, commissioned by Waterline Data, Forrester said that AI-driven data catalogs were overcoming the most difficult challenges in this field.

What can Machine Learning Data Catalogs achieve? The positives

Forrester’s survey showed that nearly half of MLDC adopters had attained, or hoped to achieve, benefits such as:

  1. Better control over data management and data governance;
  2. Improved understanding of data utilization and behavior for data security and support;
  3. A better understanding of the data to drive insights and actions;
  4. The ability to automate a significant number of developmental, administrative, and governance tasks.

Here’s why Machine Learning Data Catalogs (MLDCs) are selling like hot cakes

Market demand for MLDCs is growing for three reasons:

  1. The volume and variety of data have exploded beyond what typical data catalogs can tag.
  2. Enterprises face regulatory demands around data privacy.
  3. Cheaper technology is becoming available, which is leading to the democratization of data.