Site Reliability Engineer - Los Angeles, California
6 hours ago

Job description
At BuildOps, we're building a software platform that empowers today's commercial contractors. From service management to project execution, we're reimagining how our customers operate. Our team thrives on ambition, innovation, and collaboration – qualities we look for in every new hire.
You will join our cloud infrastructure and reliability engineering team as a Site Reliability Engineer (SRE). Your primary responsibility will be to improve and protect the reliability, performance, and operability of our production systems while helping evolve our AWS-based infrastructure. We're looking for someone with a strong SRE mindset, solid software engineering fundamentals, and deep observability expertise who can work effectively in a distributed team environment.
Reporting to the DevOps and SRE Manager, this is a hands-on role where you will influence reliability strategy, build tooling and automation, and contribute directly to day-to-day operations in a fast-moving, industry-defining company.
What You'll Do
- Drive and refine modern SRE practices across services, including SLIs/SLOs, error budgets, and reliability reviews
- Design and maintain end-to-end observability (metrics, logs, traces, dashboards, and alerts) so teams can quickly detect, debug, and prevent issues
- Partner with product and engineering teams to design reliable services—reviewing architectures, failure modes, rollout strategies, and capacity/latency considerations
- Help evolve and operate our AWS infrastructure (networking, compute, data stores) using Infrastructure as Code (Terraform)
- Contribute code to services, tooling, and automation (for example, reliability libraries, deployment and incident tooling, health checks)
- Define, implement, and iterate on SLIs, SLOs, and error budgets with service owners, and use them to guide reliability work and release decisions
- Participate in incident response for infrastructure-related production issues, including learning-focused post-incident reviews and follow-through on action items
- Develop runbooks, safeguards, and automation that reduce manual work, improve time-to-diagnosis, and standardize responses to recurring scenarios
- Advocate for and implement security and compliance best practices in production environments
- Document standards, playbooks, and best practices so reliability improvements scale across teams
- Collaborate closely with software engineers, product managers, and other stakeholders to plan and deliver reliability-focused initiatives
What We Look For
- 3+ years of professional experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering, working on production systems and reliability-focused initiatives
- Thorough understanding of and hands-on experience with modern SRE practices, such as:
  - Defining and implementing SLIs/SLOs and error budgets
  - Reducing toil through automation
  - Safe deployment and rollout patterns
  - Structured post-incident reviews and continuous improvement
- Some software engineering experience required: you've written and maintained production-quality code and can work comfortably in at least one modern language (for example, Python)
- Interest in using LLMs to assist your work, with at least some experience doing so
- Strong observability skills:
  - Designing metrics, logging, and tracing for multi-service systems
  - Building actionable dashboards and alerts with clear runbooks
  - Correlating metrics, logs, and traces to debug complex issues
- Experience with tools such as Datadog, Prometheus, Grafana, Honeycomb, or New Relic (we use Datadog, but vendor-agnostic experience is welcome)
- Experience working with AWS in production and with core platform primitives such as Terraform-based Infrastructure as Code and container/orchestration platforms (for example, Docker with ECS, EKS, or Kubernetes)
- Incident management experience is a strong plus, including:
  - Participating in or coordinating incident response
  - Working within an incident management tool (for example, PagerDuty, OpsGenie, or similar)
  - Helping teams implement durable, high-leverage follow-ups
- Strong communication skills and the ability to explain complex technical topics to both technical and non-technical audiences
- CS degree or equivalent experience running production systems; we are equally interested in people from non-traditional backgrounds who have spent time operating real-world environments
- Ability to work a hybrid schedule – Monday/Friday WFH; Tuesday–Thursday in-office
Compensation
- $120,000 - $150,000 base salary range + annual bonus
- Generous equity grant: become an owner in our company
- MacBook computer provided
- A comprehensive benefits package
- Flexible PTO and hybrid work schedules
- Work from home stipend
- Hubs in Los Angeles, San Francisco, Toronto, and Raleigh with hybrid work schedules and lunch provided for in-office days
- Company events like BBQs and team-building activities, both in-person and virtual
- Fast-paced, collaborative, and dynamic work environment
- Opportunities for growth and career advancement
- Chance to work with cutting-edge technology and innovative solutions
- The chance to get in on the ground floor and build something truly groundbreaking for ourselves and our amazing customers
We welcome applicants from across the U.S. where we are registered to do business and able to support employment. Currently, this excludes the following states: Alabama, Alaska, Connecticut, Hawaii, Kentucky, Mississippi, Nebraska, New Mexico, North Dakota, Rhode Island, South Dakota, West Virginia, and Wyoming. This list is based solely on operational and compliance considerations and is reviewed from time to time as our footprint grows.
About BuildOps
Join BuildOps, the largest commercial trade platform in the country, as we transform the multi-billion dollar commercial contracting industry.
We're not just talking incremental improvements; we're talking a full-scale revolution, empowering the hardworking heroes who build and maintain the infrastructure that keeps our world running.
This is your chance to be part of a rocketship. We're fresh off a $1 billion valuation and a $127M Series C funding round (part of over $275M raised to date) led by industry-leading investors like Meritech Capital, BOND, and SE Ventures, backed by Schneider Electric (Reuters, TechCrunch, LA Business Journal). Our latest investors join our team of industry heavyweights like Next47, former Twitter CEO Dick Costolo, former Salesforce President Gavin Patterson, and Boost Mobile CEO Stephen Stokols. Their investment is fueling our aggressive growth and our commitment to equipping contractors with AI-driven tools to conquer chaos, boost efficiency, skyrocket profitability, and ultimately, deliver exceptional service.
At BuildOps, we're changing the game and doing the best work of our careers. You'll be a key player in a company that's truly making a difference for the backbone of our economy. If you're ready to tackle big challenges, work with a passionate team, and build something extraordinary, BuildOps is the place for you.