Software Development Engineer, EC2 Health Analytics Team - Seattle, United States - Amazon Development Center U.S., Inc.

    Full time
    Description

    The EC2 Health Analytics (HA) team is responsible for the classification, measurement, and analysis of failure events across the EC2 fleet, with the goal of improving AWS fleet reliability and the customer experience.


    As part of the EC2 HA team, you will work on highly scalable tools and software services to measure fleet health, identify failure patterns and generate automated health reports.

    You will work with partner teams to improve existing failure classifications and create new ones. You will use data science techniques to identify spikes in failures across the fleet.

    You will work to ensure that failure patterns are root-caused and fixed, keeping the AWS fleet healthy.

    You will drive innovation and development of new tools and services to cover new operational and health metrics.

    Key job responsibilities
    Designing and developing cutting-edge, highly reliable, and scalable distributed systems.
    Delivering quality features on time and on budget, and executing against project plans and delivery commitments.
    Working with team members to manage day-to-day development activities and participating in design, design reviews, code reviews, and implementation.
    Engaging and working with customers and dependencies to ensure a quality delivery.
    Mentoring other engineers.
    Maintaining current technical knowledge to support rapidly changing technology, staying on the lookout for new technologies, and working with the team to adopt them.

    A day in the life
    You will use data analytics and various large data sets to efficiently detect and root-cause EC2 server and instance failures.
    You will exercise the highest bar for security in both code and operations.
    Our customers rely on the timely availability of quality and reliability data. You will generate and deliver reliability reports to our customers in a timely manner and incorporate their feedback to improve our reports and dashboards.

    About the team

    We are looking for top engineers to join a talented, innovative team and help us monitor and drive improvements to one of the largest server fleets in the world.

    The team focuses on intelligent monitoring, forecasting, and machine learning models to improve EC2 reliability, availability, and flexibility.

    We are open to hiring candidates to work out of one of the following locations:

    Seattle, WA, USA

    BASIC QUALIFICATIONS

    • 3+ years of non-internship professional software development experience
    • 2+ years of non-internship experience in the design or architecture (design patterns, reliability, and scaling) of new and existing systems
    • Experience programming with at least one software programming language
    • Computer Science fundamentals in data structures, object-oriented design, algorithm design, problem solving, and complexity analysis

    PREFERRED QUALIFICATIONS

    • Experience with PowerShell (preferred), Python, Ruby, or Java
    • Experience working in an Agile environment using the Scrum methodology
    • Master's degree in Computer Science or a related field
    • Experience working with Linux operating systems
    • Experience architecting high-scale systems
    • Experience with developing cloud technologies
    • Ability to work with a minimum of technical supervision and supplemental engineering support while responding efficiently to multiple program priorities