Site Reliability Engineer - McLean, United States - Inadev Corporation

    2 weeks ago

    Description

    Formed in 2011, Inadev is focused on its founding principle: to build innovative, customer-centric solutions quickly, securely, and at scale.

    We deliver world-class digital experiences to some of the largest federal agencies and commercial companies.

    Our technical expertise and innovations include codeless automation, identity intelligence, immersive technology, artificial intelligence/machine learning (AI/ML), virtualization, and digital transformation.


    POSITION DESCRIPTION:

    INADEV is seeking a strong, cross-functional Site Reliability Engineer who will keep cloud-based systems available and running smoothly.

    You will work closely with our development teams to design and implement scalable solutions that can handle high traffic loads, and you will monitor the infrastructure to identify and resolve any issues quickly.

    This role requires strong problem-solving skills and a passion for improving system performance.

    If you are a proactive, detail-oriented engineer with a desire to build and maintain cutting-edge technology infrastructure, we want to hear from you.

    The ideal candidate is passionate about this field, has a technical background, and has experience within the federal government, either in a full-time capacity or at a government contractor.

    The candidate will work with multiple DevSecOps teams to deliver the required services and ensure that development standards are adopted, so that products are self-healing and the code created adheres to the standards needed to improve and maintain operations.


    RESPONSIBILITIES:

    • Design and implement reliable and scalable systems to ensure high availability of web applications.
    • Collaborate with development teams to define and implement monitoring, alerting, and self-healing capabilities.
    • Ensure that all systems and applications are up-to-date with the latest security patches and configurations.
    • Identify, troubleshoot, and resolve issues related to application performance, availability, and scalability.
    • Develop and maintain documentation related to system architecture, configuration, and processes.
    • Conduct regular performance and capacity planning exercises to ensure that systems can handle anticipated loads.
    • Maintain and improve the reliability and resilience of the infrastructure by implementing best practices, automation, and testing.
    • Participate in on-call rotations to provide rapid response to incidents and ensure continuity of service.
    • Collaborate with other teams to ensure that incident response and communication processes are effective and efficient.
    • Apply software engineering principles to systems administration tasks.

    NON-TECHNICAL REQUIREMENTS:

    • Must be a US Citizen
    • Ability to pass a 7-year background check and be able to obtain a US Public Trust
    • Must be open to working a hybrid schedule (2-3 days a week on site) in McLean, VA
    • Strong interpersonal, communication, and presentation skills
    • Must be able to comfortably present information to C-Level Executives and Client/Customer Leadership
    • Must have critical thinking and analytical skills.
    • Excellent time management skills and the ability to work within aggressive delivery schedules

    MANDATORY REQUIREMENTS/SKILLS:

    • Bachelor's degree in Computer Science, Information Technology Management, or Engineering (or another STEM-related field) and at least 10 years of combined Software Engineering and DevOps experience. In lieu of a degree, 2 additional years of experience can be substituted.
    • Strong experience in managing containerized workloads in Kubernetes
    • Experience setting up observability in systems and instrumenting them with New Relic, Splunk, etc.
    • Knowledge of defining and monitoring system quality measures, including SLOs and SLAs
    • Experience building tooling to improve system reliability, automate remediation of issues, or improve scalability
    • Experience maintaining five-nines (99.999%) or higher availability of containerized workloads on AWS
    • Experience in designing and implementing end-to-end continuous delivery pipelines
    • Deep AWS cloud experience in a production environment (e.g., network, security, deployment, automation, serverless technologies)
    • Experience with and understanding of SRE principles for highly scalable and reliable systems
    • Extensive experience with Configuration Management and Infrastructure as Code
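
    For a sense of scale, the availability requirement above can be turned into a downtime budget with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, it assumes a 365-day year, and it reads "5-9s" as 99.999% availability.

```python
# Illustrative sketch: convert an availability target into allowed downtime.
# Function names are hypothetical; a 365-day (non-leap) year is assumed.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def nines(n: int) -> float:
    """Return the availability fraction for n nines, e.g. 3 -> 0.999."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1 - 10 ** -n


def downtime_budget_minutes(availability: float,
                            period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Return the downtime (in minutes) permitted by an availability target."""
    if not 0 < availability <= 1:
        raise ValueError("availability must be in the interval (0, 1]")
    return (1 - availability) * period_minutes


if __name__ == "__main__":
    for n in (3, 4, 5):
        budget = downtime_budget_minutes(nines(n))
        print(f"{n} nines: {budget:.2f} minutes of downtime per year")
```

    Under these assumptions, five nines leaves roughly 5.26 minutes of total downtime per year, which is why automated remediation and self-healing feature so heavily in the requirements.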

    PHYSICAL DEMANDS:

    • Reasonable accommodations may be made to enable individuals with disabilities to perform the essential functions.

    INADEV Corporation does not discriminate against qualified individuals based on their status as protected veterans or individuals with disabilities and prohibits discrimination against all individuals based on their race, color, religion, sex, sexual orientation/gender identity, or national origin.