No more applications are being accepted for this job

Senior Site Reliability Engineer - Des Moines, United States - Workforce Connections

1 month ago

Description

Job Title

Senior Site Reliability Engineer

Contract Duration

6+ Months with possible contract to hire

Location

Remote - Must reside in U.S.

Prefer EST or CST time zone

Work Hours

Business Hours

Qualifications/Skills Needed

A Bachelor's degree in a quantitative or business field (e.g., statistics, mathematics, engineering, computer science).
Requires 4-6 years of related experience.

AWS
Route 53
Lambda
MongoDB
Kafka
Kubernetes
Load Balancing / Load Redirecting / Load Restricting strategies
Rancher, Axway API Gateway
Monitoring and observability tools such as Prometheus, Grafana, Dynatrace, Splunk, and ELK

Common Responsibilities Will Include

Building, reviewing and maintaining application design and architecture documents.

Ensuring that disaster recovery (DR) capabilities are built into each system.

Working with development teams to implement and maintain the DR capabilities.

Participating in DR testing exercises and evaluating the results for continuous improvement.

Helps lead projects that are focused on managing and maintaining optimum platform infrastructure performance, reliability, and security using SRE practices, observability tools, manual and automated procedures, documentation, people and processes, and continuous delivery (CI/CD) tools, processes, and designs.

Develops complex services to automate monitoring activities and provide critical information to facilitate response and resolution of performance and availability issues and incidents.

Understands and advocates for standardized and scalable software tools to ensure that systems operate without interruption at optimum performance, and leads project teams throughout the deployment process.

Troubleshoots and analyzes service disruptions to determine the root cause of issues and develop solutions for improved reliability.

Essential Functions

Troubleshoots and resolves more complex problems with systems and services and initiates regular deployment of new versions of the systems and their subcomponents

Leads more complex projects focused on building and maintaining observability/monitoring for the application, monitoring key performance indicators, maintaining alerting, and continuously improving visibility.

Helps make decisions around periodic system validation and testing, service monitoring, and standing up new services/tools

Uses knowledge and experience to identify strategies that increase system reliability and performance through on-call rotation and process optimization

Identifies and implements necessary manual and automated procedures for improved collaborative response in real-time

Leads lower-level engineers in stress, security, and performance testing

Resolves issues that come up through support escalation

Keeps documentation and runbooks up to date to effectively deal with new incidents that might arise

Leads post-incident reviews and documents findings to inform future decision making

Reviews proposals to optimize Software Development Life Cycle (SDLC) to boost service reliability and makes decisions around which proposals should move forward.

Communicates complex topics to development teams to investigate and document issues, and leads the internal team in developing solutions to mitigate them

CLIENT does not discriminate in employment on the basis of race, color, religion, sex (including pregnancy and gender identity), national origin, political affiliation, sexual orientation, marital status, disability, genetic information, age, membership in an employee organization, retaliation, parental status, military service, or other non-merit factor.
