System Reliability Engineer - St Louis, United States - Fulcrum Digital Inc

Description

Who we are

Fulcrum Digital is an agile, next-generation digital acceleration company that provides digital transformation and technology services from ideation to implementation. These services apply across a variety of industries, including banking and financial services, insurance, retail, higher education, food, health care, and manufacturing.

Is this the next step in your career? Find out if you are the right candidate by reading through the complete overview below.

The Role

Provide L2 support for production systems such as applications, databases, middleware components, infrastructure, and network components
Manage production incidents end-to-end within defined SLAs, focusing on resolution rather than on who caused them
Interact with various stakeholders such as release managers, program leads, service managers, and development and test leads
Review operational readiness requirements such as monitoring and alerting, log rotation, and component resilience, and report any gaps
Provide pre-implementation support with activities such as release notes review and implementation dry runs
Protect production components by running health checks and monitoring latency and memory utilization
Automate day-to-day activities and propose changes that improve reliability
Participate in the change advisory board (CAB) and provide feedback on change requests
Support the DevOps team in testing promoted pipelines and suggest automation of configuration items
Follow incident management best practices and perform root cause analysis (RCA)
Participate in disaster recovery tests and operational acceptance tests
Analyze the technology stack that makes up the product and optimize the recovery time objective
Work with team members spread across time zones
Share knowledge, document improvements, and mentor junior team members

Requirements

Deployments MTF/Prod
Maintenance items (including stop/start, Disaster Recovery-related activities, etc.)
Monitoring
Support TRTs
Incident creation
Change requests (CRs) for changes in MTF/Prod

Skills

Linux & Shell Scripting
ITIL / ITSM
PL/SQL
SQL
Application Troubleshooting
Ticketing and incident/problem management tool - Remedy
Monitoring Tool - Splunk (preferred), Dynatrace (preferred), or any other monitoring tool
Jenkins (CI/CD) - good to have
Groovy - good to have
Any Cloud - AWS / Azure / PCF - good to have
Git basics / Bitbucket - good to have
Event Framework architecture - good to have
Ansible/Chef - good to have
DevOps basics - CI/CD basics, overview of Git, Bitbucket, SonarQube, Fortify, CI (Jenkins), ARA, SaltStack, Chef, Artifactory, MC DevOps Toolchain
