- Build high-throughput bulk ingestion workflows to integrate datasets from multiple external providers.
- Design and implement scalable entity-resolution solutions, including record linking, deduplication, clustering, and conflict arbitration.
- Create and refine matching logic, decision rules, and similarity functions to align datasets with high accuracy and strong coverage.
- Define and track data quality indicators, such as overlap metrics, match precision/recall, duplicate rates, and completeness.
- Prepare training-ready datasets in formats such as TFRecords, and structure data to meet ML research requirements.
- Develop processing components using Dataflow (Beam) and manage large analytical workloads in BigQuery.
- Leverage frameworks like Ray to accelerate large-scale experiments, feature extraction, and research-oriented data preparation.
- Collaborate with ML researchers to anticipate downstream requirements and evolve linkage strategies as new sources and use cases emerge.
- Experience working with large, heterogeneous datasets from multiple providers or domains.
- Strong background in entity resolution, deduplication, data unification, or related large-scale data integration techniques.
- Proficiency in Python, with an emphasis on efficient, scalable data processing.
- Experience with BigQuery, Google Dataflow/Apache Beam, or similar batch-processing frameworks.
- Familiarity with data validation, normalization, reconciliation, and building consistent views across diverse data sources.
- Ability to craft well-structured matching and decision strategies that balance accuracy, completeness, and computational efficiency.
- Comfortable iterating quickly on pragmatic solutions, balancing correctness with time-to-delivery.
- Clear communication skills and the ability to collaborate closely with ML and research teams.
- Knowledge of architecting Google Cloud Platform systems at scale.
- Experience with distributed compute frameworks such as Ray, Spark, or Flink.
- Understanding of JAX-based ML pipelines, multihost training setups, or large-scale data preparation for accelerator-backed workflows.
- Familiarity with TFRecords or other high-volume training data formats.
- Exposure to ranking, clustering, or statistical similarity modeling.
- Experience with Go, Next.js, and/or React Native to contribute to full-stack development.
- You will design the core dataset that underpins our research, product development, and generative audio models.
- You'll work on large-scale data challenges that require creativity, algorithmic thinking, and engineering excellence.
- You'll join a small, fast-moving team where your decisions shape the direction of our data and research capabilities.
- Highly competitive salary and equity
- Quarterly productivity budget
- Flexible time off
- Fantastic office location in Manhattan
- Productivity package, including ChatGPT Plus, Claude Code, and Copilot
- Top-notch private health, dental, and vision insurance for you and your dependents
- 401(k) plan options with employer matching
- Concierge medical/primary care through One Medical and Rightway
- Mental health support from Spring Health
- Personalized life insurance, travel assistance, and many other perks
Senior Backend Engineer, Data Modeling and Ingestion Platform - New York - Udio
Description
About the Role
We are looking for a Senior Backend Engineer to lead the unification of large, highly rich, and heterogeneous datasets sourced from a wide range of external providers. These datasets are used to power our generative audio models.
Your work will create the foundational dataset that powers our research by building robust, scalable systems for linking, deduplicating, reconciling, and enriching data at massive scale. This role centers on high-impact bulk ingestion and advanced data linkage. You will design the logic, algorithms, and strategies that transform many independent datasets into a unified, high-quality canonical asset used throughout the company.
You will collaborate closely with ML researchers and product teams, working with tools such as BigQuery, Dataflow/Beam, TFRecords, and, where beneficial, distributed systems frameworks like Ray. Familiarity with ML workflows using JAX or multihost training is a plus, as the datasets you produce will directly support that ecosystem.
What You'll Do
Udio provides equal employment opportunities (EEO) to all employees and applicants for employment without regard to race, color, religion, sex, national origin, age, disability, genetics, sexual orientation, gender identity, or gender expression. We are committed to a diverse and inclusive workforce and welcome people from all backgrounds, experiences, perspectives, and abilities.