Site Reliability Engineering - Atlanta
1 day ago

Job description
About the Job
This role focuses on building and operating highly reliable infrastructure and automation supporting
physical security systems. The Infra Automation Engineer applies principles such as SLIs, SLOs, error budgets,
and toil reduction to improve system resilience and operational efficiency. The role works closely with
Google leadership to deliver secure, scalable, and automated infrastructure.
Key Responsibilities
- Lead, mentor, and manage a team of Automation Engineers, fostering a culture of ownership,
- Partner with client IT leadership to define, implement, and track Service Level Indicators (SLIs)
- Manage error budgets to maintain optimal balance between feature development velocity and
- Act as the primary escalation point for severe incidents (Sev 1/2) and ensure effective incident
- Facilitate blameless post-mortem analysis for all major incidents and drive systemic
- Manage the team's project backlog, prioritize work, and ensure balance between reliability
- Drive automation strategies to reduce manual operational tasks with measurable targets (e.g., 50%
- Oversee Infrastructure-as-Code implementations using Ansible, Terraform, Puppet, or Chef for
- Ensure robust observability through standardized monitoring, alerting, and centralized logging
- Manage 24x5 on-call rotations and ensure adequate team coverage for incident response and
- Collaborate with cross-functional stakeholders to gather requirements, define project scope, and
- Drive Mean Time To Repair (MTTR) reduction through automation, improved runbooks, and
Required Skills & Experience
- Experience: 8+ years in Site Reliability Engineering, or Infrastructure Engineering with
- Technical Proficiency: Strong hands-on experience with Linux/Windows server administration,
Bash, PowerShell)
- SRE Practices: Deep understanding of principles including SLIs, SLOs, error budgets, toil
Datadog)
- Incident Management: Proven track record in leading incident response, conducting blameless
environments
- Networking & Security: Solid understanding of networking fundamentals, Cisco device
security compliance frameworks
- Leadership & Communication: Excellent communication and stakeholder management skills
fast-paced environments
Similar jobs
The role of a Reliability Engineer is vital to the organization's operations and success as they are responsible for ensuring the reliability and maintainability of equipment and systems. · ...
6 days ago
Reliability engineer responsible for leading global technical initiatives across customer fulfillment to improve reliability and quality at scale. · ...
1 month ago
MHE Reliability Engineer, MHE Reliability Engineering Team
We are looking for a Material Handling Equipment (MHE) Reliability Engineer to join our team. · As a primary point of contact for internal customers and vendors, · provides structured maintenance strategies for material handling equipment (MHE), · with a focus on slat and cro ...
1 month ago
Job summary · JLL is seeking a Senior Reliability Engineer to join our team. · Responsibilities · This exciting opportunity is responsible for providing reliability engineering support for operations and maintenance of buildings, infrastructure, and equipment assets. · In coordina ...
1 month ago
We're working with Cox Automotive on this exciting opportunity. Join a groundbreaking Reliability Engineering team at Cox Automotive, where you'll build intelligent automation platforms that prevent incidents, leverage AI for rapid response, and empower engineers to create more re ...
2 weeks ago
This Corporate Reliability Engineer at Arclin will be responsible for developing and implementing strategies to enhance the reliability and performance of our manufacturing processes and systems. · Analyzing data · ...
3 weeks ago
A highly skilled Site Reliability Engineer (SRE) with a strong background in monitoring and alerting systems is sought to reduce onscreen monitoring by implementing proactive alerting mechanisms. · ...
1 week ago
JLL empowers you to shape a brighter way by combining world-class services, advisory and technology for our clients. · ...
1 month ago
We are seeking a highly experienced Site Reliability Engineering (SRE) Architect to lead the strategic design, development, · and maturity of our reliability engineering practices. · Architect scalable, highly available, · secure, and cost-effective solutions on AWS. · ...
1 month ago
The Wormhole Foundation is seeking an experienced Site Reliability Engineer (SRE) to improve the reliability, security, and operational excellence of Wormhole's production infrastructure. · Act as first responder and incident commander during production incidents · Lead incident tri ...
1 month ago
We are seeking an experienced SRE / DevOps Engineer with strong expertise in monitoring, cloud infrastructure, incident management, and application architecture. · Monitor and analyze system performance using tools like Splunk and Grafana · Lead incident troubleshooting and manag ...
3 weeks ago
We're looking for a Site Reliability Engineer (SRE) with the right zeal to contribute to systems engineering, software development and passion for quality to envision, design, · and deliver Office 365 (O365) Enterprise Cloud service offerings. · We are in the early stages of inte ...
1 month ago
Be part of a team that pushes boundaries, developing custom silicon solutions that power the future of Google's direct-to-consumer products. · ...
2 weeks ago
As an SRE Architect, you will be a pivotal technical leader responsible for designing, building, and evolving the foundational systems and practices that ensure the reliability, scalability, performance, and efficiency of our critical services. · ...
1 month ago
Job summary · A senior reliability engineer position is available at JLL in Atlanta, GA. · Responsibilities · Provide reliability engineering support for operations and maintenance of buildings, infrastructure, and equipment assets. · ...
1 month ago
We are looking for a highly skilled and experienced Site Reliability Engineer (SRE) with a strong background in monitoring and alerting systems, · particularly using Splunk. The primary focus of this role will be to reduce onscreen monitoring, ensuring only actionable alerts · are ...
2 weeks ago
Bright Vision Technologies is seeking a Site Reliability Engineer to join its dynamic team and contribute to its mission of transforming business processes through technology. · ...
1 week ago
Be a foundational member of a new Reliability Engineering team building intelligent automation platforms that revolutionize incident prevention and response. · ...
1 week ago
We are seeking a skilled Site Reliability Engineer III to join our Platform Engineering team. · Ensure that microservices and cloud infrastructure are consistently reliable. · Work closely with cross-functional teams to provide deep insights into system health. · ...
1 week ago
We're looking for a Site Reliability Engineer with the right zeal to contribute to systems engineering, · software development and passion for quality to envision, design and deliver Office 365 (O365) Enterprise Cloud service offerings. · Leverages technical expertise... · ...
1 month ago
As a Site Reliability Engineer at Now100, you will be responsible for building and supporting the platform/application infrastructure of one of the largest retailers in the world. You will maintain high site uptime/availability while embracing rapid change and growth using a stro ...
1 week ago