Technical Program Manager, ML Developer Experience and Infrastructure Reliability - Mountain View, CA, USA
13 hours ago

Similar jobs
Waymo is an autonomous driving technology company with the mission to be the world's most trusted driver. · ...
2 months ago
We are still early. The playbook is still being written. A single exceptional engineer can reshape how the company operates. · About Luma AI · A new class of intelligence is emerging: systems that understand and generate the world across video, images, audio, and language. · Buil ...
2 weeks ago
We are hiring a leader who will define the direction of our Infrastructure Engineering team. As a technical authority, you will be responsible for turning deep systems knowledge into repeatable, scalable reliability for the entire company. · Architect and operate large, heterogen ...
2 weeks ago
We are still early. The playbook is still being written. A single exceptional engineer can reshape how the company operates. · At Luma AI, we operate rapidly scaling 10k+ GPU fleets, pushing utilization, throughput, and reliability hard enough that yesterday's solutions break regul ...
2 weeks ago
We seek a highly skilled and driven Infrastructure Reliability Engineer, Bare Metal to join our team and report to our Senior Director, Customer Experience. · Provide expert-level technical support and in-depth troubleshooting for a wide spectrum of hardware and associated software i ...
1 month ago
Alibaba Cloud Native Message Middleware Team is responsible for message products, including RocketMQ and other messaging products. We are committed to creating a more stable, user-friendly, streaming, and large-scale messaging platform for the future. · Cloud Product Operations & ...
1 week ago
Alibaba Cloud Native Message Middleware Team is responsible for message products, including RocketMQ and other messaging products. We are committed to creating a more stable, user-friendly, streaming, and large-scale messaging platform for the future. · Cloud Product Operations & ...
6 days ago
We seek a highly skilled and driven Infrastructure Reliability Engineer, Bare Metal to join our team and report to our Senior Director, Customer Experience. · Provide expert-level technical support and in-depth troubleshooting for a wide spectrum of hardware and associated softwa ...
1 month ago
We are committed to creating a more stable, streaming messaging platform for the future. · Familiar with high-availability architecture design and proficient in at least one of Python, Go, or Java. · ...
1 month ago
We are committed to creating a more stable streaming messaging platform for future needs. · ...
1 month ago
Tesla's Supercomputing/AI infrastructure team works directly with the high-performance computing and machine learning infrastructure on which our ML algorithms run; this includes virtual simulations, Autopilot hardware & silicon design. · ...
2 months ago
Technical Program Manager, ML Developer Experience and Infrastructure Reliability
We are looking for a Technical Program Manager to lead cross-functional execution and drive the "Golden Path" for ML development in our autonomous driving technology company. · Key responsibilities include managing reliability operations, implementing infrastructure stability pro ...
1 month ago
Technical Program Manager, ML Developer Experience and Infrastructure Reliability
We are looking for a Technical Program Manager to lead cross-functional execution and drive the development of machine learning infrastructure. The ideal candidate will have experience in software engineering, large-scale infrastructure environments, and managing complex technica ...
1 month ago
NVIDIA is seeking a Senior System Architect: Heterogeneous EDA Systems to solve a complex challenge in accelerated computing: Failure Attribution at Scale. As EDA (or equivalent) workloads scale across thousands of heterogeneous nodes, a single failure can cause massive ...
1 week ago
Responsibilities · Team Introduction · Site Reliability Engineering (SRE) at TikTok combines software and systems engineering to build and run large-scale, massively distributed, and fault-tolerant systems. In our team, you'll have the opportunity to manage the complex challenges ...
5 days ago
Responsibilities · Team Introduction · Site Reliability Engineering (SRE) at TikTok combines software and systems engineering to build and run large-scale, massively distributed, and fault-tolerant systems. In our team, you'll have the opportunity to manage the complex challenges ...
1 week ago
We're looking for a Senior Site Reliability Engineer to help us run one of the most exciting GPU clusters around—our Toronto datacenter packed with NVIDIA H100 and A100 GPUs, over 20PB of Ceph storage, terabit networking, · Manage and optimize HPC cluster operations · ...
1 month ago
We're looking for a Senior Site Reliability Engineer to help us run one of the most exciting GPU clusters around—our Toronto datacenter packed with NVIDIA H100 and A100 GPUs, · Manage and optimize HPC cluster operations · Deploy and maintain infrastructure-as-code solutions · Sup ...
1 month ago
Site Reliability Engineer, Recommendation Infrastructure
Responsibilities · About the Team · The USDS TikTok Recommendations Infra SRE team works with engineering and product teams to build and run large-scale, globally distributed, observable, fault-tolerant systems. SREs on this team will deliver on production ownership and be respon ...
1 week ago
Site Reliability Engineer, Recommendation Infrastructure
Responsibilities · About the Team · The USDS TikTok Recommendations Infra SRE team works with engineering and product teams to build and run large-scale, globally distributed, observable, fault-tolerant systems. SREs on this team will deliver on production ownership and be respon ...
5 days ago