Site Reliability Engineer, ML Compute SRE - Durham
2 days ago

Similar jobs
Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. · Experience driving progress, solving problems, and mentoring more junior team members; · Bachelor's degree in Computer Scien ...
21 hours ago
Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. · ...
1 day ago
Job summary · Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. · Design/develop new features to help us support ML operations across Technical Infrastructure (TI) and Google ...
2 days ago
Senior Software Engineer, Storage, Site Reliability Engineering
Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. · Design, plan, and execute on software engineering projects that help products operate efficiently and reliably inside of Goo ...
1 month ago
Senior Software Engineer, Storage, Site Reliability Engineering
Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. · ...
1 month ago
Senior Software Engineer, Storage, Site Reliability Engineering
Durham, NC, USA; Raleigh, NC, USA. · Design software engineering projects that help products operate efficiently and reliably inside of Google's data center. · ...
1 month ago
We are looking for a Systems Thinking, SRE Leader who has helped teams scale through production insights, operational automation, developer guidance, real-time metrics, automation, automation, automation. · Bachelor's degree or higher in a technology related field required. · 1 ...
3 weeks ago
This position will be managing & leading SREs & Production Support Engineers. The team comes from diverse technical backgrounds, and the responsibilities provide the opportunity for a variety of challenges. · ...
4 weeks ago
Job summary · Toshiba Global Commerce Solutions is seeking a Manager to serve as the platform owner for TGCS's enterprise data lakehouse and AI/analytics environment on Oracle Cloud Infrastructure. · Responsibilities: Define and own the AIDP platform roadmap service catalog archi ...
1 week ago
The System Support Engineer will provide application support, cloud enablement, and critical production support. You will have the opportunity to demonstrate all your skills in engineering, modernizing, and testing applications. · ...
2 weeks ago
Toshiba Global Commerce Solutions seeks a Manager for its AI Data Platform (AIDP) on Oracle Cloud Infrastructure. This role oversees the platform's strategy, governance, operations and cost management to ensure secure and scalable data services supporting analytics and enterprise ...
1 week ago
Aspida is seeking a Site Reliability & Infrastructure Automation Engineer to support and modernize our environment across both traditional infrastructure and modern cloud/SRE practices. · ...
1 month ago
This is a production engineering leadership role focused on distributed system correctness, resilience, and performance engineering. You will partner closely with existing SRE and Operations leaders, but your charter is to engineer prevention by building production readiness ...
2 weeks ago
This is a production engineering leadership role focused on distributed system correctness, resilience, and performance engineering. As AI accelerates development velocity, the bottleneck shifts from writing code to verifying correctness, performance, and safe behavior under fail ...
1 week ago
We're seeking a Site Reliability & Infrastructure Automation Engineer to support and modernize our environment across both traditional infrastructure and modern cloud/SRE practices. · ...
4 days ago
Aspida is seeking a Site Reliability & Infrastructure Automation Engineer to support and modernize our environment across both traditional infrastructure and modern cloud/SRE practices. · Design and maintain synthetic monitoring for critical applications, services, and APIs. · Bu ...
1 week ago
This is a production engineering leadership role focused on distributed system correctness, resilience, and performance engineering. · Bachelor's degree in Computer Science, Engineering, or equivalent practical experience. · ...
2 weeks ago
The application window is expected to close on: 02/15/2026 · Job posting may be removed earlier if the position is filled or if a sufficient number of applications are received. · This role is required to be hybrid in Raleigh, North Carolina (Research Triangle Park). Days onsite in t ...
1 week ago
Senior Backend Software Engineer, OpenShift Managed Services
We're looking for a Senior Backend Software Engineer to join the ROSA Fleet Manager team. You will develop new features and maintain our Red Hat OpenShift Container Platform customer services. · Play an active part in developing various projects around OpenShift Customer services ...
2 weeks ago
Fidelity's Site Reliability Engineering group combines operations excellence with development experience to deliver services at high scale and availability using automation and infrastructure code. · ...
3 weeks ago