Work in company

Senior Software Engineer, Reliability Infrastructure

Waymo is an autonomous driving technology company with the mission to be the world's most trusted driver. Since its start as the Google Self-Driving Car Project in 2009, Waymo has focused on building the Waymo Driver, The World's Most Experienced Driver, to improve access to mob ...

Mountain View, CA, US

1 hour ago

Work in company

Senior Software Engineer, Reliability Infrastructure

Waymo is an autonomous driving technology company with the mission to be the world's most trusted driver. Since its start as the Google Self-Driving Car Project in 2009, Waymo has focused on building the Waymo Driver—The World's Most Experienced Driver—to improve access to mobili ...

Mountain View

1 day ago

Work in company Remote job

Lead Infrastructure and Reliability Engineer

We are still early. The playbook is still being written. A single exceptional engineer can reshape how the company operates. · About Luma AI · A new class of intelligence is emerging, systems that understand and generate the world across video, images, audio, and language. · Buil ...

Palo Alto, CA

3 weeks ago

Work in company

Lead Infrastructure and Reliability Engineer

We are still early. The playbook is still being written. A single exceptional engineer can reshape how the company operates. · At Luma AI, we operate rapidly scaling 10k+ GPU fleets, pushing utilization, throughput, and reliability hard enough that yesterday's solutions break regul ...

Palo Alto

3 weeks ago

Work in company

Lead Infrastructure and Reliability Engineer

We are hiring a leader who will define the direction of our Infrastructure Engineering team. As a technical authority, you will be responsible for turning deep systems knowledge into repeatable, scalable reliability for the entire company. · Architect and operate large, heterogen ...

Palo Alto Full time

2 weeks ago

Work in company

Infrastructure Reliability Engineer, Bare Metal

We seek a highly skilled and driven Infrastructure Reliability Engineer, Bare Metal to join our team and report to our Senior Director, Customer Experience. · Provide expert-level technical support and in-depth troubleshooting for a wide spectrum of hardware and associated software i ...

Sunnyvale $122,000 - $163,000 (USD)

1 month ago

Work in company

Cloud Infrastructure – Site Reliability Engineer

Alibaba Cloud Native Message Middleware Team is responsible for message products, including RocketMQ and other messaging products. We are committed to creating a more stable, user-friendly, streaming, and large-scale messaging platform for the future. · Cloud Product Operations & ...

Sunnyvale $104,400 - $171,000 (USD)

1 week ago

Work in company

Infrastructure Reliability Engineer, Bare Metal

We seek a highly skilled and driven Infrastructure Reliability Engineer, Bare Metal to join our team and report to our Senior Director, Customer Experience. · Provide expert-level technical support and in-depth troubleshooting for a wide spectrum of hardware and associated softwa ...

Sunnyvale, CA

1 month ago

Work in company

Cloud Infrastructure – Site Reliability Engineer

We are committed to creating a more stable streaming messaging platform for the future. · Familiar with high-availability architecture design and proficient in at least one of Python, Go, or Java. · ...

Sunnyvale $104,400 - $171,000 (USD)

1 month ago

Work in company

Cloud Infrastructure – Site Reliability Engineer

We are committed to creating a more stable streaming messaging platform for future needs. · ...

Sunnyvale, CA

1 month ago

Work in company

Cloud Infrastructure – Site Reliability Engineer

Alibaba Cloud Native Message Middleware Team is responsible for message products, including RocketMQ and other messaging products. We are committed to creating a more stable, user-friendly, streaming, and large-scale messaging platform for the future. · Cloud Product Operations & ...

Sunnyvale, CA $125,000 - $210,000 (USD) per year

1 week ago

Work in company

Technical Program Manager, ML Developer Experience and Infrastructure Reliability

Waymo is an autonomous driving technology company with the mission to be the world's most trusted driver. Since its start as the Google Self-Driving Car Project in 2009, Waymo has focused on building the Waymo Driver, The World's Most Experienced Driver, to improve access to mob ...

Mountain View, CA, USA

2 hours ago

Work in company

Technical Program Manager, ML Developer Experience and Infrastructure Reliability

We are looking for a Technical Program Manager to lead cross-functional execution and drive the "Golden Path" for ML development in our autonomous driving technology company. · Key responsibilities include managing reliability operations, implementing infrastructure stability pro ...

Mountain View $230,000 - $292,000 (USD) Full time

1 month ago

Work in company

Senior System Architect, Infrastructure Reliability

NVIDIA is seeking a Senior System Architect: Heterogeneous EDA Systems to solve a complex challenge in accelerated computing: Failure Attribution at Scale. As EDA (or equivalent) workloads scale across thousands of heterogeneous nodes, a single failure can cause massive ...

Santa Clara, CA

1 week ago

Work in company

Senior System Architect, Infrastructure Reliability

NVIDIA is seeking a Senior System Architect: Heterogeneous EDA Systems to solve a complex challenge in accelerated computing: Failure Attribution at Scale. As EDA (or equivalent) workloads scale across thousands of heterogeneous nodes, a single failure can cause massive ...

US, CA, Santa Clara

2 hours ago

Work in company

Infrastructure Site Reliability Engineer

Responsibilities · Team Introduction · Site Reliability Engineering (SRE) at TikTok combines software and systems engineering to build and run large-scale, massively distributed, and fault-tolerant systems. In our team, you'll have the opportunity to manage the complex challenges ...

San Jose, CA

1 week ago

Work in company

Infrastructure Site Reliability Engineer

Responsibilities · Team Introduction · Site Reliability Engineering (SRE) at TikTok combines software and systems engineering to build and run large-scale, massively distributed, and fault-tolerant systems. In our team, you'll have the opportunity to manage the complex challenges ...

San Jose $118,657 - $187,200 (USD)

1 week ago

Work in company

Site Reliability Engineer, AI/ML Infrastructure

We're looking for a Senior Site Reliability Engineer to help us run one of the most exciting GPU clusters around: our Toronto datacenter packed with NVIDIA H100 and A100 GPUs, over 20PB of Ceph storage, and terabit networking. · Manage and optimize HPC cluster operations · ...

Santa Clara

1 month ago

Work in company

Site Reliability Engineer, AI/ML Infrastructure

We're looking for a Senior Site Reliability Engineer to help us run one of the most exciting GPU clusters around: our Toronto datacenter packed with NVIDIA H100 and A100 GPUs. · Manage and optimize HPC cluster operations · Deploy and maintain infrastructure-as-code solutions · Sup ...

Santa Clara

1 month ago

Work in company

Site Reliability Engineer, Recommendation Infrastructure

Responsibilities · About the Team · The USDS TikTok Recommendations Infra SRE team works with engineering and product teams to build and run large-scale, globally distributed, observable, fault-tolerant systems. SREs on this team will deliver on production ownership and be respon ...

San Jose $187,040 - $359,720 (USD)

1 week ago

Technical Program Manager, ML Developer Experience and Infrastructure Reliability - Mountain View, CA
