Reliability Engineering Coach - Hartford, United States - The Hartford

    Description
    Staff Reliability Engineer




    IE07KE

    We're determined to make a difference and are proud to be an insurance company that goes well beyond coverages and policies.
    Working here means having every opportunity to achieve your goals – and to help others accomplish theirs, too.

    Join our team as we help shape the future.

    The Central Reliability and Automation team is looking for a driven and highly motivated Staff Reliability Engineer Coach to join the team.


    In this role you will be responsible for designing and maintaining a given IT solution: the CI/CD pipeline, the observability suite (monitoring, alerting and logging tools and processes) or the automation suite consumed by REs and Software Engineers.


    The Site Reliability Engineer will work with the consumers and stakeholders of the solution to define functional and non-functional requirements for the service.


    Leveraging Open Source or Commercial Off-the-Shelf (COTS) products, they will design, build and maintain the solution to meet current and future demand.

    They will apply key SRE tenets across the life-cycle of the solution.

    A prerequisite to the role will be a "build-to-manage", problem-solving and innovative mindset applied to the design, build, test, deploy, change and maintenance of services drawing from deep engineering expertise.


    Key measures of success will include service stability, effective delivery and environment instrumentation, deployment quality, technical debt reduction, asset resiliency, risk/security compliance, cost efficiency, proactive and preventative maintenance mechanisms, and top-quartile operating norms.

    The Senior Site Reliability Engineer will actively contribute to sustained advancement of the SRE practice within and beyond a given area of responsibility.

    Responsibilities:

    • Influence and design architecture, infrastructure, standards and methods for large-scale cloud systems
    • Engage in and improve the software development life-cycle through CI/CD; improve the build-to-deployment process to establish greater reliability and a sustainable release process; oversee release gating; establish deployment metrics (DORA)
    • Monitor and develop SLOs and SLIs through the customer user journey; advise on SLAs; establish error budgets
    • Build observability and custom monitoring tool integrations; introduce telemetry to support SLOs
    • Automate system scalability and continually work to improve system resiliency, performance and efficiency; make recommendations for design changes for improved reliability
    • Deploy software through highly available practices: rolling, blue-green or canary
    • Provide mentorship to reliability engineering squads under a consistent framework for the Development, Testing and Alerting processes
    • Practice sustainable incident response through blameless RCA and postmortems
    • Advise on performance testing and capacity planning
    • Communicate proactively with colleagues and formally present work product outcomes and risk analysis to the product team and management
    • Follow the Agile/Scrum working methodologies
    • Establish dashboarding for monitoring capabilities and metrics

    Qualifications:

    • 7+ years of experience in a related field
    • 3+ years of experience in languages such as Python, Ruby, Bash, Perl
    • BS degree in Engineering, Computer Science, or equivalent practical experience
    • Experience in monitoring infrastructure and application service level objectives to ensure functional and performance objectives
    • Experience in implementing service dashboards for monitoring, objectives and metrics
    • Experience developing and/or administering software in AWS cloud infrastructure
    • System administration skills, including automation and orchestration of environments using Terraform or CloudFormation and configuration management
    • Demonstrable cross-functional knowledge of systems, storage, networking, security and databases
    • Experience with container orchestration tools and container management (Docker, Kubernetes, etc.)
    • Proficiency with continuous integration and continuous delivery tooling and practices
    • Strong analytical and troubleshooting skills; experience with runbooks

    Preferred Qualifications:
    • Expertise designing, analyzing and troubleshooting large-scale distributed systems
    • Systematic problem-solving approach coupled with strong communication skills and a sense of ownership and drive
    • Experience in implementing Infrastructure as code
    • Experience building software and maintaining systems in a highly secure, regulated or compliant industry
    • Experience and passion for working within a DevSecOps team culture

    Additional Details:
    Must be authorized to work in the US without company sponsorship.
    This role can have a Hybrid or Remote work arrangement.

    Candidates who live near one of our office locations will have the expectation of working in an office 3 days a week (Tuesday through Thursday).

    Candidates who do not live near an office will have a remote work arrangement, with the expectation of coming into an office as business needs arise.

    Compensation:

    The listed annualized base pay range is primarily based on analysis of similar positions in the external market.


    Actual base pay could vary and may be above or below the listed range based on factors including but not limited to performance, proficiency and demonstration of competencies required for the role.

    The base pay is just one component of The Hartford's total compensation package for employees.

    Other rewards may include short-term or annual bonuses, long-term incentives, and on-the-spot recognition.

    The annualized base pay range for this role is: $126,160 - $189,240

    Equal Opportunity Employer/Females/Minorities/Veterans/Disability/Sexual Orientation/Gender Identity or Expression/Religion/Age

    Summary
    Location: Hartford, CT; Chicago, IL
    Type: Full time
    Remote working/work at home options are available for this role.