System Reliability Engineer - St Louis, United States - Fulcrum Digital Inc
Description
Who are we
Fulcrum Digital is an agile, next-generation digital accelerating company that provides digital transformation and technology services from ideation to implementation.
These services apply across a variety of industries, including banking and financial services, insurance, retail, higher education, food, health care, and manufacturing.
The Role
Provide L2 support to production systems such as applications, databases, middleware components, infrastructure, and network components
Manage production incidents end to end within defined SLAs, focusing on resolution rather than on who caused the incident.
Interact with various stakeholders such as release managers, program leads, service managers, and development and test leads
Review operational readiness requirements such as monitoring and alerting, log rotation and resilience of the components, and report the gaps
Provide pre-implementation support with activities such as release notes review and implementation dry runs.
Protect production components by running health checks, and monitoring latency and memory utilization.
Automate day-to-day activities and propose changes that improve reliability
Participate in CAB and provide feedback on change requests
Support the DevOps team in testing the promoted pipelines and suggest automation of configuration items.
Follow incident management best practices and perform root cause analysis (RCA).
Participate in disaster recovery tests and operational acceptance tests
Analyze the technology stack that makes up the product and optimize the recovery time objective.
Work with team members spread across time zones
Share knowledge, document improvements, and mentor junior team members
Requirements
Deployments MTF/Prod
Maintenance items (including stop/start, Disaster Recovery-related activities, etc.)
Monitoring
Support TRTs
Incident creation
CR for changes in MTF/Prod
Skills
Linux & Shell Scripting
ITIL / ITSM
PL/SQL
SQL
Application Troubleshooting
Ticketing and incident/problem management tool - Remedy
Monitoring Tool - Splunk (preferred), Dynatrace (preferred), or any other monitoring tool
Jenkins - CI/CD - good to have
Groovy - good to have
Any Cloud - AWS / Azure / PCF - good to have
Git basics / Bitbucket - good to have
Event Framework architecture - good to have
Ansible/Chef - good to have
DevOps basics - CI/CD basics; overview of Git, Bitbucket, SonarQube, Fortify, CI (Jenkins), ARA, SaltStack, Chef, Artifactory, MC DevOps Toolchain