- Deep understanding of Linux systems and performance (I/O schedulers, RAID, caching, NUMA, kernel parameters).
- Hands-on experience designing and managing on-premise servers, storage arrays, or HPC clusters.
- Comfort with automation and software development (Python, Go, Bash, or similar).
- Strong diagnostic and analytical skills: ability to decompose performance problems across multiple layers.
- Proven track record of improving system reliability, throughput, and maintainability in a fast-paced environment.
- Excellent written and verbal communication skills for cross-disciplinary collaboration.
- Self-driven, curious, and motivated by understanding systems deeply rather than just maintaining them.
- 5-10 years of relevant industry experience in systems engineering, SRE, or infrastructure software roles.
- Experience tuning Linux filesystems (ext4, btrfs) and software RAID (mdadm).
- Familiarity with containerization and orchestration (Docker, Compose, Kubernetes).
- Knowledge of networking fundamentals (VLANs, bonding, LACP, 10 GbE/40 GbE).
- Experience supporting data-heavy scientific or ML workloads.
- Demonstrated technical leadership - mentoring others in debugging, reliability, or performance analysis.
Senior Site Reliability Engineer - Los Angeles - Mango
About Mango, Inc.
Mango is a new type of microscope for rapid bioburden testing.
Description
We are seeking a Senior Site Reliability Engineer to own and evolve the infrastructure that supports our on-premise instruments, data systems, and machine learning pipelines. This role combines systems-level engineering with software craftsmanship, requiring deep understanding of how compute, storage, and networking layers interact under real workloads.
You will be the go-to expert for diagnosing performance issues in our on-prem systems, from kernel-level I/O bottlenecks to distributed service latency. You will also build robust automation that keeps our systems consistent and observable.
Key Responsibilities
Infrastructure Design & Reliability
Design, deploy, and maintain our on-premise and hybrid infrastructure, which includes Dell PowerEdge and PowerVault servers, prosumer NAS units, and high-throughput data processing clusters. Implement fault-tolerant systems with reproducible deployments and clear observability.
Performance & Systems Analysis
Investigate complex performance issues across hardware, OS, and software boundaries. You will use Linux tooling in addition to in-house application-level metrics to uncover root causes in filesystems, caching layers, or I/O scheduling.
Automation & Tooling
Build automation for system provisioning, configuration management, and software deployment using Python, Go, Ansible, or similar frameworks. Develop lightweight services and tools that make reliability visible and maintainable.
Collaboration
Work closely with our software and hardware teams to co-design systems that meet the needs of high-resolution imaging and ML inference workloads. Translate hardware realities into software reliability guarantees.
Observability & Incident Response
Develop and maintain monitoring, alerting, and logging systems to ensure early detection of issues. Lead incident response and post-mortem efforts with a focus on learning and prevention.
Documentation & Communication
Produce clear documentation and communicate findings effectively to the broader team - from network topology diagrams to kernel tuning rationales.
General Qualifications