Principal DevOps Engineer - Salt Lake City, United States - NICE

    Description


    The CXone Expert product is a multi-tenant SaaS platform, designed to handle millions of requests with high performance and reliability.

    Each Expert site can easily host a complex hierarchy of tens of thousands of pages (articles), with layers of fine-grained permissioning, server- and client-side customizations and branding, and other complex business logic.

    Our enterprise customers have a global presence, and they expect their content to be delivered worldwide with low latency and near-zero downtime.

    CXone Expert is an agile engineering organization, and QA is fully automated. We release new versions of our platform every week through our CI/CD pipeline. Our application infrastructure runs on AWS and is almost entirely containerized and orchestrated by Kubernetes.

    We are looking for a Principal DevOps Engineer to round out our Site Reliability / DevOps team. This engineer will be the go-to person for researching and developing architectural changes from the infrastructure up.

    Another important part of this role is helping other engineers on the team design and implement software that scales well and is highly reliable.

    You will get your hands dirty and refactor existing system / application code yourself (this is a hands-on role).

    Responsibilities

    Analyze system reliability and performance to address and prevent issues.


    Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning and launch reviews.

    Maintain services once they are live by measuring and monitoring availability, latency and overall system health.


    Scale systems sustainably through mechanisms like automation and evolve systems by making code and configuration changes that improve reliability and velocity.

    Participate in on-call rotation for service disruptions

    Identify and diagnose infrastructure issues in a live production environment

    Engage in and improve the whole lifecycle of services—from inception and design, through deployment, operation and refinement.

    Practice sustainable incident response, blameless postmortems, and root cause analysis.

    Define and develop continuous integration and deployment pipelines

    Build Infrastructure as Code

    Coordinate build and release activities with other stakeholders

    Train and mentor other DevOps engineers

    Work with teams to develop code quality metrics and meters

    Identify, research, and prototype new technologies to improve DevOps processes

    Troubleshoot and respond to downtime, performance degradation, and outside attacks

    Prepare documentation and diagrams for informational and compliance purposes

    Requirements

    BS degree in Computer Science or related technical field involving coding (e.g., physics or mathematics), or equivalent practical experience.

    8+ years of experience designing, analyzing, and troubleshooting large-scale distributed systems

    Sustained track record of creating major improvements in large business-critical systems around stability, security, performance, and scalability.

    Experience in one or more of the following: Java, Python, C#, or JavaScript.

    Excellent communication, analytical, and troubleshooting skills

    Ability to work independently, as well as part of a team, on multiple competing projects

    Ability to debug, profile, and optimize code and automate routine tasks.

    Ability to facilitate cross-team work effectively and to exert influence well beyond your own group.

    Strong sense of ownership.

    Lifelong learner able to quickly pick up new frameworks, architectures, and languages

    Desired Skills

    Experience running production systems on AWS

    A deep understanding of REST and network programming

    Experience scaling high-traffic SaaS applications

    Deep knowledge of Kubernetes

    Experience with application monitoring and metrics tools (AWS X-Ray, CloudWatch, Datadog, etc.)
