Sr System Reliability Engineer - Saint Louis, United States - Fulcrum Digital

Description

Who we are

Fulcrum Digital is an agile, next-generation digital acceleration company that provides digital transformation and technology services from ideation to implementation. These services apply across a variety of industries, including banking and financial services, insurance, retail, higher education, food, health care, and manufacturing.

The Role

Provide L2 support for production systems, including applications, databases, middleware, infrastructure, and network components.
Manage production incidents end-to-end within defined SLAs, with a focus on resolution rather than on who caused them.
Interact with various stakeholders such as release managers, program leads, service managers, and development and test leads.
Review operational readiness requirements such as monitoring and alerting, log rotation and resilience of the components and report the gaps
Provide pre-implementation support with activities such as release notes review and implementation dry runs.
Protect production components by running health checks, monitoring latency and memory utilization.
Automate day-to-day activities and propose changes that improve reliability
Participate in CAB and provide feedback on change requests
Support the DevOps team in testing the promote pipelines and suggest automation of configuration items.
Practice incident management best practices and perform RCA.
Participate in disaster recovery tests and operational acceptance tests
Analyze the technology stack that makes up the product and optimize the recovery time objective (RTO).
Work with team members spread across time zones
Share knowledge, document improvements and mentor junior resources

Requirements

Responsibility Matrix

Deployments MTF/Prod
Maintenance items (including stop/start, Disaster Recovery-related activities, etc.)
Monitoring
Support TRTs
Incident creation
CR for changes in MTF/Prod

Tools

Log Monitoring Tool - Splunk
Application Monitoring Tool - Dynatrace
Ticketing and Incident/Problem Management Tool - Remedy
Linux
SQL
DevOps basics - CI/CD basics, overview of Git, Bitbucket, SonarQube, Fortify, CI (Jenkins), ARA, SaltStack, Chef, Artifactory, MC DevOps toolchain

Skills

Linux & Shell Scripting

ITIL / ITSM

PL/SQL

Application Troubleshooting

Monitoring Tool - Splunk (preferred), Dynatrace (preferred) or any other monitoring tool

Jenkins - CI/CD

Groovy

Any Cloud - AWS / Azure / PCF

Git basics / Bitbucket

Event framework architecture - good to have

Ansible/Chef
