- Lead the management and monitoring of highly available replicated cloud systems.
- Oversee 24/7 Network Operations Center (NOC) operations, maintaining a minimum 99.9% annual uptime.
- Define golden signals for all services in our core SaaS application.
- Manage NOC engineer teams, including scheduling and responsibilities.
- Design PagerDuty escalation policies across various teams.
- Expertise in AWS technologies and building dashboards with leading observability platforms.
- Automate monitors and dashboards using modern programmatic methods.
- Provide regular reports to Engineering leadership and executive teams for continuous improvement.
- Minimum B.S. or B.A. in Computer Science.
- Stellar communication and interpersonal skills for effective collaboration with Development & Product teams.
- Proficiency in monitoring the networking stack using distributed tracing and profiling tools.
- Working knowledge of profiling tools for measuring CPU, memory, I/O, and disk usage, and for capturing process thread dumps.
- Experience in managing, integrating, and automating alerting and escalation tools.
- Working knowledge of monitoring RESTful microservices and basic HTTP protocols.
- Must be able to work a hybrid schedule (3 days in office, 2 days from home) at Avetta's Lehi office, located at 3300 N. Triumph Blvd., Lehi, UT 84043.
- Troubleshooting experience with modern container and networking technologies (Kubernetes, HAProxy, ALB).
- Familiarity with scripting languages like Bash, Python, and Go.
- Ability to administer and tune load balancer technologies.
- Experience in managing, monitoring, and benchmarking distributed file systems.
- Proficiency in configuration management tools (SaltStack, Ansible, Terraform).
- System Monitoring: Create and automate system monitors and escalation policies.
- System Management: Respond to and resolve internal requests within business hours.
- High Availability & Resilience: Maintain 99.95% uptime and be the first responder in emergency situations.
- Full-Stack Observability: Build dashboards for end-to-end detection of system anomalies.
- Innovation: Propose new ideas and improvements to the team regularly.
Site Reliability Engineer - Lehi, United States - Avetta
Description
Join Avetta as a Site Reliability Engineer
Site Reliability Engineers are pioneers of the production systems. We believe in proactive discovery and analysis of our entire stack, continually optimizing, tuning, and scaling the system for the best possible end-user experience on a globally distributed cloud-based SaaS platform. Downtime is not in the SRE's vocabulary. The ability to maintain highly resilient, distributed systems while integrating uptime monitors through programmatic APIs and developing intelligent scaling algorithms is an important skill for the SRE. In addition, the SRE needs to communicate effectively with both development and product teams to drive technical discovery and help prioritize features that meet and exceed uptime and end-user experience goals.
Essential Duties and Responsibilities:
Minimum Qualifications:
Nice to Haves:
Metrics That Matter:
Join us at Avetta and be at the forefront of driving technical excellence and ensuring a seamless experience for our users across the globe.