    Site Reliability Engineer - Lehi, United States - Avetta

    Avetta
    Description

    Avetta is looking for a Site Reliability Engineer to join their team!

    Site Reliability Engineers play a crucial role in maintaining production systems. Their focus is on continuously optimizing and scaling the system to provide the best possible end-user experience on a global cloud-based SaaS platform. Downtime is not acceptable in the world of Site Reliability Engineering. Key skills for this role include managing resilient, distributed systems, implementing uptime monitors using APIs, and developing scaling algorithms. Effective communication with development and product teams is vital to drive technical advancements and prioritize features that enhance uptime and user experience.

    Key Responsibilities:

    • Lead the management and monitoring of highly available replicated cloud systems.
    • Oversee 24/7 Network Operations Center (NOC) operations to maintain a minimum 99.9% annual uptime.
    • Define golden signals for all services in the core SaaS application.
    • Manage NOC engineer teams, including scheduling and tasks.
    • Design effective escalation policies using PagerDuty.
    • Apply expertise in AWS technologies and build dashboards with leading observability platforms.
    • Automate monitors and dashboards using modern programmatic methods.
    • Regularly provide reports to Engineering leadership and executive teams for ongoing improvement.

    Requirements:

    • B.S. or B.A. in Computer Science.
    • Minimum of 5 years of experience in Site Reliability Engineering, with team management experience.
    • Strong communication and interpersonal skills for collaboration with Development & Product teams.
    • Proficiency in monitoring the networking stack using tracing and profiling tools.
    • Experience with dashboard tools like New Relic, Kibana, Grafana, and Prometheus.
    • Knowledge of AWS technologies.
    • Proficiency in monitoring RESTful microservices and HTTP protocols.
    • Ability to automate monitors and dashboards using modern methods.
    • Experience in managing and automating alerting and escalation tools.

    Nice to Have:

    • Troubleshooting experience with container and networking technologies.
    • Familiarity with scripting languages like Bash, Python, and Go.
    • Load balancer administration and tuning skills.
    • Experience dealing with distributed file systems.
    • Proficiency in configuration management tools like SaltStack, Ansible, and Terraform.

    Metrics:

    • System Monitoring: Create and automate system monitoring and escalation policies.
    • System Management: Respond to and resolve internal requests within business hours.
    • High Availability & Resilience: Maintain 99.95% uptime and act as the first responder in emergencies.
    • Full-Stack Observability: Develop dashboards for end-to-end system anomaly detection.
    • Innovation: Regularly propose new ideas and improvements to the team.
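
This posting quotes two availability figures: a minimum of 99.9% annual uptime under Key Responsibilities and 99.95% uptime under Metrics. As a rough illustration of what each target allows, the sketch below converts an uptime percentage into a yearly downtime budget. The function name `downtime_budget_hours` and the 365-day year are assumptions made for this example, not part of the posting.

```python
# Hypothetical helper: converts an uptime target into allowed downtime.
HOURS_PER_YEAR = 365 * 24  # assumption: a non-leap year (8,760 hours)


def downtime_budget_hours(uptime_percent: float,
                          period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the hours of downtime allowed in a period at a given uptime target."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    # The downtime budget is the fraction of the period not covered by the target.
    return period_hours * (100 - uptime_percent) / 100


if __name__ == "__main__":
    # The two targets named in the posting.
    for target in (99.9, 99.95):
        hours = downtime_budget_hours(target)
        print(f"{target}% uptime allows about {hours:.2f} hours of downtime per year")
```

Under these assumptions, 99.9% allows roughly 8.76 hours of downtime per year, while 99.95% allows roughly 4.38 hours, so the stricter target halves the budget.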

    Join the Avetta team to drive technical excellence and deliver a seamless user experience globally.



  • Avetta Lehi, United States

    Join Avetta as a Site Reliability Engineer · Site Reliability Engineers are pioneers of the production systems, we believe in proactive discovery and analysis of our entire stack, continually optimizing, tuning, and scaling the system for maximal end-user experience on a globall ...


  • Adobe Lehi, United States

    Our Company · Changing the world through digital experiences is what Adobe's all about. We give everyone-from emerging artists to global brands-everything they need to design and deliver exceptional digital experiences We're passionate about empowering people to create beautiful ...


  • Collective Health Lehi, United States

    What you'll do: · Establish service level indicators and data-driven objectives, and develop SRE standards and processes to uphold and improve uptime, latency, and system health. · Define and execute initiatives to continuously improve our deployed cloud footprint in areas such a ...


  • Vivint Lehi, United States Full time

    Job Description · Responsibilities · Improve and maintain infrastructure for containerized microservice environments · Troubleshoot and debug issues with a focus on resolving problems quickly with minimal impact to customers and developers · Manage processes, systems, and infr ...


  • Vivint Lehi, United States

    Job Description · Welcome to the intersection of energy and home services. At NRG, we're driven by the idea of a smarter, cleaner, more connected future-and the possibilities that will bring to the world and to the 7.3 million customers we serve. · Vivint Smart Home, an NRG-owne ...


  • Vivint Lehi, United States

    Welcome to the intersection of energy and home services. At NRG, we're driven by the idea of a smarter, cleaner, more connected future-and the possibilities that will bring to the world and to the 7.3 million customers we serve. Vivint Smart Home, an Reliability Engineer, Liability ...


  • Goldman Sachs Salt Lake City, United States

    MORE ABOUT THIS JOB: · Site Reliability Engineering (SRE) is an engineering discipline that combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. At Goldman Sachs, SRE is responsible for the availability and reliab ...


  • Breeze Airways Midvale, United States

    Working at Breeze Airways is an exciting endeavor and a serious commitment to bring "The World's Nicest Airline" to life. We work cross-functionally with truly awesome Team Members to deliver on our mission: · "To make the world of travel simple, affordable, and convenient. Impr ...


  • Big West Oil North Salt Lake, United States

    Experienced Fixed Equipment Reliability engineer to develop and support a developing reliability system. Position is responsible for daily support activities for the refinery asset, as well as developing philosophies, work processes, and special emphasis programs. Responsible for ...


  • The Goldman Sachs Group, Inc Salt Lake City, United States

    Job Description · What We Do: · At Goldman Sachs, our Engineers don't just make things - we make things possible. Change the world by connecting people and capital with ideas. Solve the most challenging and pressing engineering problems for our clients. Join our engineering team ...


  • Recursion Pharmaceuticals Salt Lake City, United States

    Your work will change lives. Including your own. · Recursion has built a unique dataset (over 1 million unique biological perturbations profiled in a high-dimensional biological assay, making up nearly 20 PB of highly-relatable data) and a unique approach to phenotypic drug disc ...


  • General Electric Company Salt Lake City, United States

    Job Description Summary · At GE HealthCare, our passionate people are creating the products, solutions and services our customers need to deliver the best patient care possible. · As part of the Engineering Organization, the Reliability Architect is primarily responsible for su ...


  • GE Healthcare Salt Lake City, United States

    Job Description Summary · At GE HealthCare, our passionate people are creating the products, solutions and services our customers need to deliver the best patient care possible. · As part of the Engineering Organization, the Reliability Architect is primarily responsible for sup ...


  • ARCS Salt Lake City, United States

    Join our client's vibrant team in Cape Town as an Intermediate Site Reliability Engineer (SRE II). Operating mostly remotely, their team occasionally collaborates in the office for direct engagement. Your role involves achieving operational excellence through automation tooling ( ...


  • Battelle Applied Solutions, LLC Salt Lake City, United States

    Requisition Id 11976 · Overview: · Are you looking for a way to use your hard-earned SRE skills in a more ambitious environment where you can also help protect national security? The National Center for Computational Sciences (NCCS) at Oak Ridge National Lab (ORNL), which hosts ...


  • Medifast, Inc Lehi, United States Full time

    I. Job Summary · About The Opportunity · At Medifast, our team members are relentless in our mission of driving Lifelong Transformation, One Healthy Habit at a Time. When you join Medifast, you become part of a dynamic, fast-growing community of highly motivated, like-hearted p ...


  • Shipito Lehi, United States

    Shipito drives global eCommerce through our fast, reliable, and efficient, package forwarding service. With customers in over 200 countries and territories, Shipito makes it easy for global consumers and businesses to shop at U.S. and international websites, and have those purcha ...
