Site Reliability Engineer - Cleveland, United States - M3USA

    Job Description

    Want to apply? Read all the information about this position below, then hit the apply button.

    Design and implement improvements to NAS' system infrastructure to meet performance, availability, resilience, security, and compliance objectives.

    Monitor and improve system performance, identifying potential enhancements and troubleshooting issues as necessary.

    Collaborate with application developers to reduce and mitigate errors and improve quality of service for users and customers.

    Develop automated alerting and response systems to manage reliability risks.

    Deploy and maintain cloud infrastructure, particularly on Microsoft Azure, using Infrastructure-as-Code and automated scripts whenever possible.

    Work alongside developers to ensure that systems are reliable and performant.

    Lead scalability and reliability enhancement projects.

    Document system architecture and maintenance procedures.

    Create runbooks for common fault scenarios and lead incident postmortems.

    Monitor critical third-party services and aid in the selection of new services as needed.

    Proactively work to improve cost efficiency while meeting service level objectives.

    Write scripts and integrate services to automate repetitive work and reduce toil.

    Remote work and work-from-home options are available for this role.