Site Reliability Engineer - San Diego - Via Logic LLC

    Via Logic LLC
    Via Logic LLC San Diego

    4 days ago

    $115,000 - $185,000 (USD) per year *
    Description


    The U.S. Navy's Service Management, Integration, and Transport (SMIT) program has an opening for a Site Reliability Automation and Orchestration Engineer on a high-visibility DoD program that provides engineering support to the Navy Marine Corps Intranet (NMCI), the largest information technology (IT) network in the world. This position will provide many opportunities to challenge and grow your skills.

    The AI Reliability Engineer (AI-SRE) is responsible for integrating artificial intelligence and machine learning capabilities into Site Reliability Engineering (SRE) operations to improve system reliability, availability, performance, and operational efficiency. This role serves as a horizontal enabler across SRE pods, leveraging AI-driven insights to reduce operational toil, accelerate incident response, enhance observability, and enable predictive reliability engineering. The AI-SRE partners closely with infrastructure, network, application, cyber, and platform SRE teams to transform operational data into actionable intelligence while ensuring AI solutions are safe, explainable, auditable, and aligned with SRE principles.

    Key Responsibilities


    AIOps & Observability Intelligence

    • Design, develop, and maintain AI/ML models for anomaly detection, trend analysis, and signal correlation across metrics, logs, traces, and events.
    • Reduce alert noise through intelligent alert grouping, suppression, and prioritization.
    • Enhance observability platforms with AI-generated insights supporting SLO and error-budget management.

    AI-Assisted Incident Management

    • Implement AI-driven incident classification, enrichment, and summarization.
    • Provide probable root-cause analysis recommendations based on historical and real-time telemetry.
    • Support on-call and incident response teams with AI-guided remediation suggestions.
    • Contribute AI insights to post-incident reviews and reliability improvement plans.

    Automation & Ops-as-Code Enablement

    • Apply AI techniques to identify repetitive operational tasks and automation opportunities.
    • Assist in generating, validating, and optimizing automation playbooks and workflows.
    • Analyze automation execution data to improve success rates, resiliency, and reuse.

    Knowledge Management & Runbook Intelligence

    • Build and maintain AI-searchable knowledge repositories containing runbooks, SOPs, lessons learned, and historical incident data.
    • Enable natural-language access to operational knowledge for SREs and operations staff.
    • Reduce dependency on tribal knowledge through intelligent documentation and discovery.

    Predictive Reliability Engineering

    • Develop predictive models for capacity planning, failure forecasting, configuration risk, and reliability debt identification.
    • Support proactive remediation strategies to prevent incidents before customer impact.
    • Assist SRE leadership in data-driven prioritization of reliability investments.

    Governance, Security & Trust

    • Ensure AI solutions adhere to organizational security, compliance, and data-handling policies.
    • Establish guardrails for AI recommendations, human-in-the-loop decision making, and automation execution.
    • Promote transparency, explainability, and auditability of AI-driven operational decisions.

    Required Qualifications

    • Bachelor's degree in Computer Science, Engineering, Information Systems, Data Science, or related discipline
    • 5+ years in Site Reliability Engineering, DevOps, IT Operations, or Systems Engineering
    • 2+ years applying AI/ML techniques in operational, analytics, or automation contexts
    • Demonstrated experience supporting production systems in high-availability environments
    • Must have an active Secret Clearance to be considered for the position

    Technical Skills

    • Proficiency in data analysis tooling
    • Experience with machine learning fundamentals (anomaly detection, clustering, time-series analysis, NLP)
    • Familiarity with observability platforms (metrics, logs, traces, events)
    • Experience with automation frameworks and infrastructure-as-code concepts
    • Strong understanding of distributed systems and operational telemetry

    Remote

    Commitment to Non-Discrimination


    All qualified applicants will receive consideration for employment without regard to sex, race, ethnicity, age, national origin, citizenship, religion, physical or mental disability, medical condition, genetic information, pregnancy, family structure, marital status, ancestry, domestic partner status, sexual orientation, gender identity or expression, veteran or military status, or any other basis prohibited by law. Leidos will also consider for employment qualified applicants with criminal histories consistent with relevant laws.


    * This salary range is an estimation made by beBee
  • Work in company

    Reliability Engineer

    This role offers an exciting opportunity to drive operational excellence in facility management as Reliability Engineer providing comprehensive engineering support for the planning, construction, operation and maintenance of buildings and utilities. · ...

    San Diego, CA

    2 months ago

  • Work in company

    Staff Reliability Engineer

    The Staff Reliability Engineer will lead reliability strategy for data center products. · ...

    San Diego, CA

    2 weeks ago

  • Work in company

    Module Reliability Engineer

    Are you a technologist looking for a career choice that provides ever-evolving challenges? This position is offered in Apple's Hardware Module Reliability Group. · We guide development teams toward generating reliable designs for Apple's new technology components. We achieve this o ...

    San Diego $139,500 - $258,100 (USD)

    1 month ago

  • Work in company

    Module Reliability Engineer

    We guide development teams toward generating reliable designs for Apple's new technology components. · This position is offered in Apple's Hardware Module Reliability Group and we achieve this objective by identifying high risk failure modes early in the design life cycle and by ...

    San Diego $139,500 - $258,100 (USD)

    1 month ago

  • Work in company

    Product Reliability Engineer

    The primary responsibility of the PRE or Product Reliability Engineer is the measurement, evaluation, estimation, and reporting of the field reliability performance of printer products through the product life cycle. · Responsibilities: The primary responsibility of the PRE or Pro ...

    San Diego

    1 month ago

  • Work in company

    Site Reliability Engineer

    The U.S. Navy's Service Management, Integration, and Transport (SMIT) program has an opening for a Site Reliability Automation and Orchestration Engineer on a high-visibility DoD program that provides engineering support to the Navy Marine Corps Intranet (NMCI), the largest infor ...

    San Diego $131,300 - $237,350 (USD) Full time

    2 weeks ago

  • Work in company

    Site Reliability Engineer

    The Video Computer Vision organization is working on exciting technologies for future Apple products. · We have contributed to the FaceID and FaceKit project in the past and more recently the new LIDAR iPad sensor. · We are looking for the right Site Reliability Engineer to help ...

    San Diego $139,500 - $258,100 (USD)

    1 month ago

  • Work in company

    Senior Reliability Engineer

    Company · Qualcomm Technologies, Inc. · Job Area · Engineering Group, Engineering Group > Reliability Development Engineering · General Summary · The Reliability Engineer will evaluate product design, package, materials, and process from a reliability standpoint. This role involv ...

    San Diego, CA $104,000 - $156,000 (USD) per year

    4 days ago

  • Work in company

    Hardware Reliability Engineer

    We guide development teams toward generating reliable designs for Apple's new technology Displays. · As a Module Reliability Engineer, you will set test specs and drive procurement / qualification of equipment that is critical to the validation of Design Reliability. · The base p ...

    San Diego $139,500 - $258,100 (USD)

    1 week ago

  • Work in company

    Module Reliability Engineer

    We guide development teams toward generating reliable designs for Apple's new technology components. · This position offers an exciting chance to build your skills as you learn about a wide variety of technologies. · You will assess Reliability by executing stress testing and ana ...

    San Diego Full time

    1 month ago

  • Work in company

    Site Reliability Engineer

    The Video Computer Vision organization is working on exciting technologies for future Apple products. Our focus is on ML based solution around real time image and video. · We have contributed to the FaceID and FaceKit project in the past and more recently the new LIDAR iPad senso ...

    San Diego $139,500 - $258,100 (USD)

    1 month ago

  • Work in company

    Site Reliability Engineer

    At Apple, our Data Analytics team focuses on improving user experience by operating system stability, gathering feature usage telemetry, evaluating device performance. This requires capturing data from customers who have given consent, utilizing strong privacy-preserving techniques ag ...

    San Diego $139,500 - $258,100 (USD)

    1 month ago

  • Work in company

    Module Reliability Engineer

    Job summary · Are you a technologist looking for a career choice that provides ever-evolving challenges? · Drive & update FMEA · Establish specifications for component/module reliability · Set up reliability test infrastructure at Apple suppliers · ...

    San Diego $139,500 - $258,100 (USD)

    3 weeks ago

  • Work in company

    Site Reliability Engineer

    The U.S. Navy's Service Management, Integration, and Transport (SMIT) program has an opening for a Site Reliability Automation and Orchestration Engineer on a high-visibility DoD program that provides engineering support to the Navy Marine Corps Intranet (NMCI), the largest infor ...

    San Diego $131,300 - $237,350 (USD)

    1 week ago

  • Work in company

    Hardware Reliability Engineer

    We are looking for a Hardware Reliability Engineer to plan and execute hardware reliability tasks for Oura's wearable products. · ...

    San Diego $151,300 - $178,000 (USD) Full time

    1 month ago

  • Work in company

    Site Reliability Engineer

    Join Apple's Mission-Critical Analytics Team as a Site Reliability Engineer · We're seeking a Site Reliability Engineer to join our data analytics team, where your work will be essential to keeping Apple's most important systems running smoothly. · Comprehensive medical and dental ...

    San Diego $139,500 - $258,100 (USD)

    1 month ago

  • Work in company

    Site Reliability Engineer

    Job summary · The Video Computer Vision organization is working on exciting technologies for future Apple products. Our focus is on ML based solution around real time image and video. We have contributed to the FaceID and FaceKit project in the past and more recently the new LIDAR i ...

    San Diego, CA

    1 month ago

  • Work in company

    Site Reliability Engineer

    Job summary · The Video Computer Vision organization is working on exciting technologies for future Apple products. · We have contributed to the FaceID and FaceKit project in the past and more recently the new LIDAR iPad sensor. · ...

    San Diego $139,500 - $258,100 (USD)

    1 month ago

  • Work in company

    Module Reliability Engineer

    This position is offered in Apple's Hardware Module Reliability Group. We guide development teams toward generating reliable designs for Apple's new technology components. · We achieve this objective by identifying high risk failure modes early in the design life cycle and by usi ...

    San Diego, CA

    1 month ago

  • Work in company

    Site Reliability Engineer

    Job Summary · General Atomics (GA), and its affiliated companies, is one of the world's leading resources for high-technology systems development ranging from the nuclear fuel cycle to remotely piloted aircraft, airborne sensors, and advanced electric, electronic, wireless and la ...

    San Diego $81,080 - $141,650 (USD)

    6 days ago

  • Work in company

    Site Reliability Engineer

    Job summary · Join Apple's Mission-Critical Analytics Team as a Site Reliability Engineer. · Qualifications · Bachelor's degree or equivalent experience. · Familiarity with one or more cloud services environments (AWS, GCP, etc.). · Experience working with big data technologies and ...

    San Diego, CA

    1 month ago
