- Design, develop, and maintain AI/ML models for anomaly detection, trend analysis, and signal correlation across metrics, logs, traces, and events.
- Reduce alert noise through intelligent alert grouping, suppression, and prioritization.
- Enhance observability platforms with AI-generated insights supporting SLO and error-budget management.
- Implement AI-driven incident classification, enrichment, and summarization.
- Provide probable root-cause analysis recommendations based on historical and real-time telemetry.
- Support on-call and incident response teams with AI-guided remediation suggestions.
- Contribute AI insights to post-incident reviews and reliability improvement plans.
- Apply AI techniques to identify repetitive operational tasks and automation opportunities.
- Assist in generating, validating, and optimizing automation playbooks and workflows.
- Analyze automation execution data to improve success rates, resiliency, and reuse.
- Build and maintain AI-searchable knowledge repositories containing runbooks, SOPs, lessons learned, and historical incident data.
- Enable natural-language access to operational knowledge for SREs and operations staff.
- Reduce dependency on tribal knowledge through intelligent documentation and discovery.
- Develop predictive models for capacity planning, failure forecasting, configuration risk, and reliability debt identification.
- Support proactive remediation strategies to prevent incidents before customer impact.
- Assist SRE leadership in data-driven prioritization of reliability investments.
- Ensure AI solutions adhere to organizational security, compliance, and data-handling policies.
- Establish guardrails for AI recommendations, human-in-the-loop decision making, and automation execution.
- Promote transparency, explainability, and auditability of AI-driven operational decisions.
- Bachelor's degree in Computer Science, Engineering, Information Systems, Data Science, or a related discipline
- 5+ years in Site Reliability Engineering, DevOps, IT Operations, or Systems Engineering
- 2+ years applying AI/ML techniques in operational, analytics, or automation contexts
- Demonstrated experience supporting production systems in high-availability environments
- An active Secret clearance is required to be considered for the position
- Proficiency in data analysis tooling
- Experience with machine learning fundamentals (anomaly detection, clustering, time-series analysis, NLP)
- Familiarity with observability platforms (metrics, logs, traces, events)
- Experience with automation frameworks and infrastructure-as-code concepts
- Strong understanding of distributed systems and operational telemetry
Site Reliability Engineer - Bremerton - Via Logic LLC
Description
The U.S. Navy's Service Management, Integration, and Transport (SMIT) program has an opening for a Site Reliability Automation and Orchestration Engineer on a high-visibility DoD program that provides engineering support to the Navy Marine Corps Intranet (NMCI), the largest information technology (IT) network in the world. This position will provide many opportunities to challenge and grow your skills.
The AI Reliability Engineer (AI-SRE) is responsible for integrating artificial intelligence and machine learning capabilities into Site Reliability Engineering (SRE) operations to improve system reliability, availability, performance, and operational efficiency. This role serves as a horizontal enabler across SRE pods, leveraging AI-driven insights to reduce operational toil, accelerate incident response, enhance observability, and enable predictive reliability engineering. The AI-SRE partners closely with infrastructure, network, application, cyber, and platform SRE teams to transform operational data into actionable intelligence while ensuring AI solutions are safe, explainable, auditable, and aligned with SRE principles.
Key Responsibilities
AIOps & Observability Intelligence
AI-Assisted Incident Management
Automation & Ops-as-Code Enablement
Knowledge Management & Runbook Intelligence
Predictive Reliability Engineering
Governance, Security & Trust
Required Qualifications
Technical Skills
Remote
Commitment to Non-Discrimination
All qualified applicants will receive consideration for employment without regard to sex, race, ethnicity, age, national origin, citizenship, religion, physical or mental disability, medical condition, genetic information, pregnancy, family structure, marital status, ancestry, domestic partner status, sexual orientation, gender identity or expression, veteran or military status, or any other basis prohibited by law. Leidos will also consider for employment qualified applicants with criminal histories consistent with relevant laws.