AWS Infrastructure consultant with Incident Management - Princeton, United States - Diversity Nexus


    3 weeks ago

    Description
    AWS Infrastructure consultant with Incident Management

    Duration: 12 Months
    Location: Remote
    We can connect over the phone for further details.

    KEY JOB FUNCTIONS
    Perform analysis of data, evaluating multiple application protocols including web, database, storage, and supporting infrastructure such as AWS, UNIX, DNS, LDAP, SSL, SMTP, and FTP.
    Troubleshoot and resolve incidents on the AWS cloud infrastructure.

    Hands-on experience managing and monitoring applications deployed on Amazon Web Services (AWS) using services such as EC2, ELB, RDS, Redshift, DynamoDB, Aurora, Route53, ECS, Lambda, S3, Batch, CloudWatch, CloudTrail, WAF, etc.

    Experience building tools for monitoring and troubleshooting system resources in an AWS environment. Ability to triage AWS-related incidents using monitoring tools on AWS Cloud.
    Experience with performance engineering of AWS Cloud applications.
    Hands-on experience with transaction-level monitoring using Dynatrace, OpenTelemetry, and Splunk.
    Ability to perform transaction-level monitoring and troubleshooting on the AWS cloud platform.
    Eyes-on-glass monitoring of the health of applications as well as the underlying infrastructure.
    Monitoring experience with tools like ExtraHop, SolarWinds, Netcool suite, Catchpoint, Moogsoft.
    Ability to analyze dashboards and reporting/monitoring tools to look at trends and patterns in application health and performance.
    Proactively look for hardware, software, and environmental alerts or malfunctions.
    Influence other technical teams on the calls and articulate troubleshooting steps effectively.
    Lead required technical follow-up calls for critical incidents.

    Assist with documentation of Root Cause Analysis (RCA) or Correction of Errors (COE) and data quality for all ECC communicated incidents.

    Ensure appropriate functional and management escalation takes place as per the standards and procedures.

    Follow up on items that could negatively impact production operations, assist with postmortem-related activities, and support various efforts related to operational improvements.

    Based on recommendations from management, implement new and improved processes, change processes, perform new tasks, create reports and address ad-hoc requests.

    Participate in on-call rotation. Ability to work any shift as needed, including weekends and nights.
    Ability to report incident details and metrics to senior leadership.