- Provide L2 support for production systems, including application, database, and middleware components as well as infrastructure and network components
- Manage production incidents end-to-end within defined SLAs, with a focus on resolution rather than on who caused them.
- Interact with various stakeholders such as release managers, program leads, service managers, and development and test leads
- Review operational readiness requirements such as monitoring and alerting, log rotation and resilience of the components and report the gaps
- Provide pre-implementation support with activities such as release notes review and implementation dry runs.
- Protect production components by running health checks, monitoring latency and memory utilization.
- Automate day-to-day activities and propose changes that improve reliability
- Participate in CAB and provide feedback on change requests
- Support the DevOps team in testing the promote pipelines and suggest automation of configuration items.
- Practice incident management best practices and perform RCA.
- Participate in disaster recovery tests and operational acceptance tests
- Analyze the technology stack that makes up the product and optimize the recovery time objective.
- Work with team members spread across time zones
- Share knowledge, document improvements and mentor junior resources
- Deployments MTF/Prod
- Maintenance items (including stop/start, Disaster Recovery-related activities, etc.)
- Monitoring
- Support TRTs
- Incident creation
- CR for changes in MTF/Prod
- Log Monitoring Tool - Splunk
- Application Monitoring tool - Dynatrace
- Ticketing incident/problem management tool - Remedy
- Linux
- SQL
- DevOps basics - CI/CD basics, overview of Git, Bitbucket, SonarQube, Fortify, CI (Jenkins), ARA, SaltStack, Chef, Artifactory, MC DevOps toolchain
- Linux & Shell Scripting
- ITIL / ITSM
- PL/SQL
- Application Troubleshooting
- Monitoring Tool - Splunk (preferred), Dynatrace (preferred) or any other monitoring tool
- Jenkins - CI/CD
- Groovy
- Any Cloud - AWS / Azure / PCF
- Git basics / Bitbucket
- Event framework architecture - good to have
- Ansible/Chef
Sr System Reliability Engineer - Saint Louis, United States - Fulcrum Digital
Description
Who are we
Fulcrum Digital is an agile and next-generation digital accelerating company providing digital transformation and technology services right from ideation to implementation. These services have applicability across a variety of industries, including banking & financial services, insurance, retail, higher education, food, health care, and manufacturing.
The Role
Requirements
Responsibility Matrix