Site Reliability Engineer - Atlanta, United States - AppLab Systems Inc

    Description
    Please do not submit candidates without hands-on Prometheus experience. Solid experience is required.

    Location:
    Atlanta / Dallas / Irvine / NJ (Day 1 onsite)


    Job Description:
    Position
    Dynamic engineer with an understanding of application performance management and experience building monitoring and alerting solutions.
    Troubleshoot incidents, identify root causes, fix and document problems, and deploy preventative solutions.

    Required Experience
    5+ years of recent experience building automation and monitoring for observability (Prometheus/Grafana/ELK).
    5+ years of experience working on support projects, including rotational on-call duty to address failures.
    5+ years of recent experience with Kubernetes, Docker, Helm, and end-to-end support of applications in this environment.
    5+ years of recent experience working in AWS and/or GCP.
    3+ years of full-stack Python development.
    Strong communication skills, with the ability to communicate effectively with team members and management.


    Skills Preferred:
    MLOps experience
    MLE experience