- Lead the management and monitoring of highly available replicated cloud systems.
- Oversee 24/7 Network Operations Center (NOC) operations to maintain a minimum 99.9% annual uptime.
- Define golden signals for all services in the core SaaS application.
- Manage NOC engineer teams, including scheduling and tasks.
- Design effective escalation policies using PagerDuty.
- Apply expertise in AWS technologies and build dashboards with leading observability platforms.
- Automate monitors and dashboards using modern programmatic methods.
- Regularly provide reports to Engineering leadership and executive teams for ongoing improvement.
- B.S. or B.A. in Computer Science.
- Minimum of 5 years of experience in Site Reliability Engineering, with team management experience.
- Strong communication and interpersonal skills for collaboration with Development & Product teams.
- Proficiency in monitoring the networking stack using tracing and profiling tools.
- Experience with dashboard tools such as New Relic, Kibana, Grafana, and Prometheus.
- Knowledge of AWS technologies.
- Proficient in monitoring RESTful microservices and HTTP protocols.
- Ability to automate monitors and dashboards using modern methods.
- Experience in managing and automating alerting and escalation tools.
- Troubleshooting experience with container and networking technologies.
- Familiarity with scripting languages like Bash, Python, and Go.
- Load balancer administration and tuning skills.
- Experience dealing with distributed file systems.
- Proficiency in configuration management tools such as SaltStack, Ansible, and Terraform.
- System Monitoring: Create and automate system monitoring and escalation policies.
- System Management: Respond to and resolve internal requests within business hours.
- High Availability & Resilience: Maintain 99.95% uptime and act as the first responder in emergencies.
- Full-Stack Observability: Develop dashboards for end-to-end system anomaly detection.
- Innovation: Regularly propose new ideas and improvements to the team.
Site Reliability Engineer - Lehi, United States - Avetta
Description
Avetta is looking for a Site Reliability Engineer to join their team!
Site Reliability Engineers play a crucial role in maintaining production systems. Their focus is on continuously optimizing and scaling the system to provide the best possible end-user experience on a global cloud-based SaaS platform. Downtime is not acceptable in the world of Site Reliability Engineering. Key skills for this role include managing resilient, distributed systems, implementing uptime monitors using APIs, and developing scaling algorithms. It is also vital to communicate effectively with development and product teams to drive technical advancements and prioritize features that enhance uptime and user experience.
Key Responsibilities:
Requirements:
Nice to Have:
Metrics:
Join the Avetta team to drive technical excellence and deliver a seamless user experience globally.