SRE Lead / L3 Support - Boston, United States - eTeam


    1 month ago

    Description
    Java Development, Jenkins
    Experience implementing logging, monitoring, and alerting solutions.
    Experience in service-level observability.
    Experience in any scripting language, such as Bash, Python, Groovy, or GoLang.

    Must Have
    • 8+ years of experience in operations for production systems.
    • Implement scalable solutions for applications, infrastructure, and databases
    • Be responsible for the maintenance, configuration, and reliable operation of VMs, network servers, and applications
    • Identify, install, and configure tools for managing, monitoring, and reporting on the performance of servers and applications (on-premises, cloud, and hybrid)
    • Perform hardware and software upgrades
    • Troubleshoot hardware and software problems and incidents by working with application team members, running diagnostics, and assessing the impact of issues
    • Perform capacity planning for hardware and software
    • Experience implementing logging, monitoring, and alerting solutions
    • Experience in service-level observability
    • Experience with performance-improvement solutions across applications and infrastructure
    • Understanding of reliability metrics such as MTTR, MTBR, and MTD, along with SLOs, SLIs, and error budgets
    • Knowledge in Chaos Engineering tools and practices
    • Implement scalable solutions for applications and databases
    • Good experience with scripting languages such as Bash, Python, Groovy, or GoLang
    • Implementation experience with SRE tools and accelerators
    • Understanding of Infrastructure as Code practices and the ability to create and modify runbooks and SOPs
    • Good understanding of containers and container platforms

    Good to Have
    • At least 3 years of application architecture and development experience with Java or .NET technologies for APIs, databases, microservices, integration layers, and mobile
    • Implementation experience with CI/CD tools and pipelines
    • Good understanding of ITSM processes, basic metrics for support quality and reliability
    • Internal or external SRE certification