Senior Site Reliability Engineer - Des Moines, United States - Workforce Connections

    Description
    Job Title

    Senior Site Reliability Engineer

    Contract Duration

    6+ Months with possible contract to hire


    Location
    Remote - Must reside in U.S.

    Prefer EST or CST time zone

    Work Hours

    Business Hours

    Qualifications/Skills Needed

    A Bachelor's degree in a quantitative or business field (e.g., statistics, mathematics, engineering, computer science).
    Requires 4 to 6 years of related experience.

    AWS
    Route 53
    Lambda
    MongoDB
    Kafka
    Kubernetes
    Load Balancing / Load Redirecting / Load Restricting strategies
    Rancher, Axway API Gateway
    Monitoring and observability tools such as Prometheus, Grafana, Dynatrace, Splunk, ELK

    Common Responsibilities Will Include

    Building, reviewing and maintaining application design and architecture documents.

    Ensuring the DR capabilities are built into each system.

    Working with development teams to implement and maintain the DR capabilities.

    Participating in DR testing exercises and evaluating the results for continuous improvement.


    Helps lead projects focused on managing and maintaining optimum platform infrastructure performance, reliability, and security using SRE practices, observability tools, manual and automated procedures, documentation, people and processes, and continuous delivery (CI/CD) tools, processes, and designs.


    Develops complex services to automate monitoring activities and provide critical information to facilitate response and resolution of performance and availability issues and incidents.


    Understands and advocates for standardized and scalable software tools to ensure that systems operate without interruption at optimum performance, and leads project teams throughout the deployment process.

    Troubleshoots and analyzes service disruptions to determine the root cause of issues and develop solutions for improved reliability.

    Essential Functions

    Troubleshoots and resolves more complex problems with systems and services and initiates regular deployment of new versions of the systems and their subcomponents


    Leads more complex projects focused on building and maintaining observability/monitoring for the application, monitoring key performance indicators, maintaining alerting, and continuously improving visibility.

    Helps make decisions around periodic system validation and testing, service monitoring, and standing up new services/tools

    Uses knowledge and experience to identify strategies that increase system reliability and performance through on-call rotation and process optimization

    Identifies and implements necessary manual and automated procedures for improved collaborative response in real-time

    Leads lower-level engineers in stress, security, and performance testing

    Resolves issues that come up through support escalation

    Keeps documentation and runbooks up to date to effectively deal with new incidents that might arise

    Leads post-incident reviews and documents findings to inform future decision making


    Reviews proposals to optimize Software Development Life Cycle (SDLC) to boost service reliability and makes decisions around which proposals should move forward.

    Communicates complex topics with development teams to investigate and document issues, and leads the internal team in developing solutions to mitigate them


    CLIENT does not discriminate in employment on the basis of race, color, religion, sex (including pregnancy and gender identity), national origin, political affiliation, sexual orientation, marital status, disability, genetic information, age, membership in an employee organization, retaliation, parental status, military service, or other non-merit factor.
