- Design, develop, and maintain AI/ML models for anomaly detection, trend analysis, and signal correlation across metrics, logs, traces, and events.
- Reduce alert noise through intelligent alert grouping, suppression, and prioritization.
- Enhance observability platforms with AI-generated insights supporting SLO and error-budget management.
- Implement AI-driven incident classification, enrichment, and summarization.
- Provide probable root-cause analysis recommendations based on historical and real-time telemetry.
- Support on-call and incident response teams with AI-guided remediation suggestions.
- Contribute AI insights to post-incident reviews and reliability improvement plans.
- Apply AI techniques to identify repetitive operational tasks and automation opportunities.
- Assist in generating, validating, and optimizing automation playbooks and workflows.
- Analyze automation execution data to improve success rates, resiliency, and reuse.
- Build and maintain AI-searchable knowledge repositories containing runbooks, SOPs, lessons learned, and historical incident data.
- Enable natural-language access to operational knowledge for SREs and operations staff.
- Reduce dependency on tribal knowledge through intelligent documentation and discovery.
- Develop predictive models for capacity planning, failure forecasting, configuration risk, and reliability debt identification.
- Support proactive remediation strategies to prevent incidents before customer impact.
- Assist SRE leadership in data-driven prioritization of reliability investments.
- Ensure AI solutions adhere to organizational security, compliance, and data-handling policies.
- Establish guardrails for AI recommendations, human-in-the-loop decision making, and automation execution.
- Promote transparency, explainability, and auditability of AI-driven operational decisions.
- Bachelor's degree in Computer Science, Engineering, Information Systems, Data Science, or a related discipline
- 5+ years in Site Reliability Engineering, DevOps, IT Operations, or Systems Engineering
- 2+ years applying AI/ML techniques in operational, analytics, or automation contexts
- Demonstrated experience supporting production systems in high-availability environments
- Must have an active Secret clearance to be considered for the position
- Proficiency in data analysis tooling
- Experience with machine learning fundamentals (anomaly detection, clustering, time-series analysis, NLP)
- Familiarity with observability platforms (metrics, logs, traces, events)
- Experience with automation frameworks and infrastructure-as-code concepts
- Strong understanding of distributed systems and operational telemetry
Site Reliability Engineer - San Diego - Via Logic LLC
Description
The U.S. Navy's Service Management, Integration, and Transport (SMIT) program has an opening for a Site Reliability Automation and Orchestration Engineer on a high-visibility DoD program that provides engineering support to the Navy Marine Corps Intranet (NMCI), the largest information technology (IT) network in the world. This position will provide many opportunities to challenge and grow your skills.
The AI Reliability Engineer (AI-SRE) is responsible for integrating artificial intelligence and machine learning capabilities into Site Reliability Engineering (SRE) operations to improve system reliability, availability, performance, and operational efficiency. This role serves as a horizontal enabler across SRE pods, leveraging AI-driven insights to reduce operational toil, accelerate incident response, enhance observability, and enable predictive reliability engineering. The AI-SRE partners closely with infrastructure, network, application, cyber, and platform SRE teams to transform operational data into actionable intelligence while ensuring AI solutions are safe, explainable, auditable, and aligned with SRE principles.
Key Responsibilities
AIOps & Observability Intelligence
AI-Assisted Incident Management
Automation & Ops-as-Code Enablement
Knowledge Management & Runbook Intelligence
Predictive Reliability Engineering
Governance, Security & Trust
Required Qualifications
Technical Skills
Remote
Commitment to Non-Discrimination
All qualified applicants will receive consideration for employment without regard to sex, race, ethnicity, age, national origin, citizenship, religion, physical or mental disability, medical condition, genetic information, pregnancy, family structure, marital status, ancestry, domestic partner status, sexual orientation, gender identity or expression, veteran or military status, or any other basis prohibited by law. Leidos will also consider for employment qualified applicants with criminal histories consistent with relevant laws.