Site Reliability Engineer - San Francisco
2 weeks ago

Job summary
We're looking for an experienced SRE to join our engineering team. You'll be at the intersection of software engineering and systems operations, ensuring our distributed infrastructure is highly available, performant, and scalable while enabling our engineers to move quickly and confidently.
- Design, build, and maintain cloud infrastructure for our distributed build acceleration platform
- Automate everything: from deployment pipelines to monitoring and recovery
Similar jobs
Cisco Silicon One ASICs are transforming the Future of the Internet. · Owning reliability test plans for new products. · Supporting High power Burn In, biased HAST and ESD/LU bring-up and debug for reliability qualification and evaluation. · ...
5 days ago
We're growing incredibly fast and need someone who thrives in a dynamic, high-pressure environment to work side-by-side with our engineering and leadership team. · Write tests, monitor, and evaluate the health of our platform. · Design, write, and maintain automated test ...
1 month ago
We are making sure that when businesses build AI agents the experience of doing so doesn't suck. Our team is a group of ex-athletes, founders, and builders with low egos and a high belief that life is not about taking the easy road but challenging ourselves to find the most we can be. ...
5 days ago
We are making sure that when businesses build AI agents the experience of doing so doesn't suck. We're growing fast and need someone who thrives in a dynamic environment to work side-by-side with our engineering team. Your job is to write tests, monitor, and evaluate the heal ...
1 month ago
+ Reliability expert to maintain and enhance the stability and scalability of our rapidly evolving infrastructure. · + Design and implement solutions to ensure the scalability of our infrastructure. · + Build and maintain load, chaos and synthetic testing software. · Job summary: ...
6 days ago
We're a fully distributed team with employees across North American time zones. · We build the systems and practices that keep everything running smoothly, handling hundreds of millions of requests, minimizing downtime, and continuously improving service performance. The Site Rel ...
1 month ago
We are looking for a Senior Site Reliability Engineer (SRE) to build the reliability foundation for a mission-critical healthcare platform. · This is not a "keep the lights on" SRE role. You'll own reliability end-to-end, · define what good looks like: SLIs, SLOs, incident respon ...
2 weeks ago
We are creating a new category of work where expertise powers AI advancement. · Ambitious team that works alongside researchers, operators, · and AI companies shaping systems redefining society. · ...
1 month ago
We are seeking a highly skilled cross-stack engineer with deep expertise in making ML systems reliable at scale. · This hands-on individual contributor will sit within our hardware team and work closely with chip design, platform design, hardware health, and the broader industry ...
6 days ago
We are seeking a highly skilled cross-stack engineer with deep expertise in making ML systems reliable at scale. · ...
1 month ago
We are seeking a Site Reliability Engineer (SRE) with strong expertise in Identity and Access Management (IAM) and cloud platforms. · Design and implement IAM/IGA solutions using Okta (OAuth, SAML, OIDC, MFA, FIDO, Zero Trust). · Manage and configure Microsoft Entra ID (Azure AD) ...
1 day ago
About CodeRabbit · CodeRabbit is an innovative research and development company focused on building extraordinarily productive human-machine collaboration systems. · The Role · We are seeking an experienced Site Reliability Engineer to join our Platform Engineering team in the Ba ...
1 week ago
We are building the infrastructure for abundant intelligence at Fluidstack. We partner with top AI labs, governments, and enterprises - including Mistral, Poolside, Black Forest Labs, Meta, ... · Fluidstack seeks a Network Engineer to champion and build process reliability metrics f ...
1 week ago
We're looking for engineers who are excited to improve the reliability of complex systems and enjoy digging into how things work. · Bring a generalist mindset and be comfortable working across infrastructure layers, from compute and networking to storage, databases, and app runti ...
1 month ago
We are looking for people who are passionate about a more sustainable future and want to make that vision a reality. · Collaborate with mechanical and electrical engineers during hardware design to establish and refine component and system level reliability targets for our power ...
1 month ago
We are growing quickly and recently raised our $300M Series E, backed by investors including BOND, IVP, Spark Capital, Greylock, and Conviction. Join us and help build the platform engineers turn to to ship AI products. · ...
3 days ago
We are partnering with a technology-driven organization to modernize its infrastructure and operations. · ...
1 day ago
We believe in thinking bigger—and moving faster. We're a family-founded company on a mission to create the world's first AI-powered Personal & Entrepreneurial Resource Planner (PRP), and we need your passion and ambition to help us change how people plan, work, and live. · Here, ...
1 month ago
Mercor is creating a new category of work where expertise powers AI advancement. · ...
1 week ago
As a Site Reliability Engineer (SRE) at Together AI you are responsible for keeping all user-facing services and production systems running smoothly. You are a blend of a pragmatic operator and a software engineer that applies sound engineering principles, operational discipline, an ...
1 month ago
Amperesand is disrupting industrial power with the first commercialized Solid State Transformer systems. We are looking for mission driven team members passionate about making amazing products for worldwide electrification at maximum acceleration. · ...
1 month ago