System Reliability Engineer - St Louis, United States - Fulcrum Digital Inc

    Description
    Who are we

    Fulcrum Digital is an agile, next-generation digital acceleration company providing digital transformation and technology services from ideation to implementation.

    These services have applicability across a variety of industries, including banking & financial services, insurance, retail, higher education, food, health care, and manufacturing.

    The Role
    Provide L2 support for production systems such as applications, databases, middleware, infrastructure, and network components
    Manage production incidents end-to-end within defined SLAs, focusing on resolution rather than on who caused the incident
    Interact with stakeholders such as release managers, program leads, service managers, and development and test leads
    Review operational readiness requirements such as monitoring and alerting, log rotation and resilience of the components, and report the gaps
    Provide pre-implementation support with activities such as release notes review and implementation dry runs.
    Protect production components by running health checks and monitoring latency and memory utilization
    Automate day-to-day activities and propose changes that improve reliability
    Participate in CAB and provide feedback on change requests
    Support the DevOps team in testing the promoted pipelines and suggest automation of configuration items.
    Follow incident management best practices and perform root cause analysis (RCA)
    Participate in disaster recovery tests and operational acceptance tests
    Analyze the technology stack that makes up the product and optimize the recovery time objective.
    Work with team members spread across time zones
    Share knowledge, document improvements, and mentor junior resources

    Requirements
    Deployments MTF/Prod
    Maintenance items (including stop/start, Disaster Recovery-related activities, etc.)
    Monitoring
    Support TRTs
    Incident creation
    CR for changes in MTF/Prod

    Skills
    Linux & Shell Scripting
    ITIL / ITSM
    PL/SQL
    SQL
    Application Troubleshooting
    Ticketing incident/problem management tool - Remedy
    Monitoring Tool - Splunk (preferred), Dynatrace (preferred), or any other monitoring tool
    Jenkins (CI/CD) - good to have
    Groovy - good to have
    Any Cloud - AWS / Azure / PCF - good to have
    Git basics / Bitbucket - good to have
    Event framework architecture - good to have
    Ansible / Chef - good to have
    DevOps basics - CI/CD basics, overview of Git, Bitbucket, SonarQube, Fortify, CI (Jenkins), ARA, SaltStack, Chef, Artifactory, MC DevOps Toolchain
