Software Engineer, Site Reliability - San Mateo
2 weeks ago

Job summary
We're building the future of generative AI infrastructure. Our platform delivers the highest-quality models with the fastest and most scalable inference in the industry.
Responsibilities
- Ensuring System Reliability: Ensure systems are designed and implemented with high availability, scalability, and performance.
- Incident Management & Response: Lead efforts in incident detection, response, and resolution for critical production issues.
- Observability & Monitoring: Develop, implement, and maintain comprehensive monitoring, alerting, logging, and tracing solutions to provide deep insights into system health and performance.
- Automation & Toil Reduction: Identify and automate repetitive operational tasks to reduce toil and improve operational efficiency.
- Capacity Planning & Performance Tuning: Work proactively on capacity planning to ensure our infrastructure can gracefully handle growth and peak loads. Optimize system performance and resource utilization.
- Reliability Best Practices: Collaborate with software engineers to embed reliability principles into the development lifecycle, promoting a culture of operational excellence.
Similar jobs
Site Reliability Engineer
1 month ago
We are actively looking for a talented Site Reliability Engineer to join the Infrastructure team. As a member of the infrastructure team, your role will be to manage this infrastructure and continue to make it easier for our team to monitor and scale it. · ...
Site Reliability Engineer
3 weeks ago
We are actively looking for a talented Site Reliability Engineer to join the Infrastructure team. · To manage this infrastructure and continue to make it easier for our team to monitor and scale it, be it by adopting third-party tools or designing your own. · Keep our infrastructure u ...
Site Reliability Engineer
5 days ago
Verkada is transforming how organizations protect their people and places with an integrated, AI-powered platform. A leader in cloud physical security, Verkada helps organizations strengthen safety and efficiency through one connected software platform that include ...
Site Reliability Engineering
1 week ago
We are seeking a Site Reliability Engineer to join our team in San Mateo, CA. The ideal candidate will have 6+ years of experience in an SRE role for online services in a multi-region, multi-cloud environment with specific experience in reliability and resiliency. · Serve as a men ...
Site Reliability Engineer
4 weeks ago
Zoox is seeking a Site Reliability Engineer to help ensure the availability, performance, and resilience of the services that power the development and operation of our autonomous vehicles. · ...
Site Reliability Engineer
1 week ago
Zoox is seeking a Site Reliability Engineer to help ensure the availability, performance, and resilience of the services that power the development and operation of our autonomous vehicles. · ...
Site Reliability Engineer
1 month ago
We need our team to be representative of the world. · Join our Site Reliability Engineering team. · ...
Senior Site Reliability Engineer
1 month ago
We're looking for a Senior Site Reliability Engineer to join our SRE team, the group responsible for keeping our systems fast, reliable and secure. This is more than just keeping the lights on. · You'll be engineering the future of a platform trusted by developers and companies ar ...
Software Engineer, Site Reliability
2 weeks ago
As a Site Reliability Engineer at Fireworks AI, you will play a critical role in making our world-scale virtual AI cloud reliable, performant, and efficient. · Ensuring System Reliability: Ensure systems are designed and implemented with high availability, scalability, and perfor ...
Senior Site Reliability Engineer
5 days ago
IXL Learning is seeking a Senior Site Reliability Engineer to join our team and help maintain the reliability and optimal performance of our products. We are seeking engineers with a passion for problem solving and optimization. · ...
Senior Site Reliability Engineer, Compute
2 weeks ago
At Roblox, we're building the tools and platform that empower our community to bring any experience that they can imagine to life. · ...
Site Reliability Engineer
1 month ago
We believe consumers and businesses can coexist. Our platform allows consumers to access savings tools, earned wages and rewards without cost or hidden fees. · Write Terraform modules for deploying infrastructure resources via our GitLab pipelines · Develop Helm charts for deploy ...
Software Engineer, Site Reliability Engineering
1 month ago
Backend/infrastructure engineer to join our founding team. Working on building ML and data pipelines. · Owning the observability and deployments of Wisdom stack · ...
SRE ensures that Google's services have reliability, uptime appropriate to users' needs and a fast rate of improvement. SREs will keep an ever-watchful eye on our systems' capacity and performance. · ...
Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. · ...
Senior Site Reliability Engineer,
1 week ago
· You won't just be 'managing' systems; you will be the architect of their health ... . · ...
Site Reliability Engineer Intern
1 month ago
You will be joining the OCSC (Oracle Cloud Service Centre) as an SRD (site reliability developer). Your job role will be helping Oracle ensure the availability of cloud services 24x7x365. · As a Cloud Service Centre Site Reliability Developer Intern you will be involved with: · A ...
Site Reliability Engineer Intern
1 month ago
You will be joining the OCSC (Oracle Cloud Service Centre) as an SRD (site reliability developer). Your job role will be helping Oracle ensure the availability of cloud services 24x7x365. · The Oracle Cloud Service Centre monitors and responds to Service Events that are impacting ...