- Cloud Environments: Manage and support systems running in AWS, Azure, and GCP, ensuring seamless integration and optimal performance across different cloud platforms.
- Monitoring & Observability: Lead the development of advanced monitoring strategies, leveraging tools like Dynatrace, LogScale, Prometheus, Grafana, Splunk, New Relic, Datadog, AWS Lambda, AWS Autoscaling, Kubernetes, OpenShift, Istio, AppDynamics, Nagios, Zabbix, and Elastic Stack (ELK) to proactively detect and address potential issues before they impact production.
- Chaos Engineering: Implement chaos engineering practices using tools like IBM Storage Defender Data Resiliency Service, Gremlin, Chaos Monkey, Litmus, Simian Army, AWS Resilience Hub, Azure Chaos Studio, Chaos Mesh, and other frameworks (both commercial and open source) to inject failure and test system resilience in real-world conditions, driving continuous improvements in fault tolerance.
- Ecosystem Management: Oversee and manage a complex ecosystem that spans cloud services, on-prem systems, virtualization, containers, and mainframe, ensuring all components work cohesively to support highly available and scalable environments.
- Capacity Planning & Scalability: Plan for and manage the scaling of systems to accommodate increasing traffic, data, and user demands, ensuring consistent performance without sacrificing availability. Tools like Kubernetes, Helm, Docker, Terraform, and CloudFormation are key.
- Incident Response & Root Cause Analysis (RCA): Lead incident response efforts, including root cause analysis, post-mortems, and implementation of corrective actions. Collaborate across teams to ensure that resiliency improvements are fully integrated into future system deployments.
- Automation: Develop and implement automation for system management, including infrastructure-as-code using tools like Terraform, Ansible, Chef, Puppet, SaltStack, CloudFormation, and Kubernetes Operators. Ensure consistent deployment and maintenance of infrastructure across environments.
- Security & Compliance: Work with security teams to ensure that all systems meet security standards and are compliant with industry regulations, proactively mitigating risks associated with system vulnerabilities. Familiarity with security tools like HashiCorp Vault, Open Policy Agent, Aqua Security, and Twistlock is a plus.
- Client Collaboration: Work closely with internal teams and our financial services client to define resiliency requirements, address client needs, and ensure the operational excellence of mission-critical systems.
- Experience: At least 5 years of hands-on experience in Systems Reliability Engineering, Resiliency Engineering, or related roles, particularly in highly regulated, mission-critical environments like financial services.
- Chaos Engineering: Proven experience implementing chaos engineering practices, using tools like IBM Storage Defender Data Resiliency Service, Gremlin, Chaos Monkey, Litmus, Simian Army, AWS Resilience Hub, Azure Chaos Studio, Chaos Mesh, and other frameworks to test and validate system resilience.
- Automation & Infrastructure as Code: Strong background in infrastructure-as-code tools such as Terraform, Ansible, Chef, Puppet, SaltStack, and CloudFormation. Proficiency in automating cloud infrastructure, system provisioning, and configuration management.
- Incident Management: Proven track record in managing high-severity incidents, performing root cause analysis (RCA), and implementing long-term solutions to improve system reliability and prevent recurrence.
- Security & Compliance: Knowledge of industry-specific security and compliance standards, particularly in financial services, and experience implementing and maintaining secure systems. Familiarity with HashiCorp Vault, Aqua Security, and Twistlock.
- Programming & Scripting: Proficiency in Go, Python, Bash, or similar programming/scripting languages for automation, system management, and performance tuning.
- Collaboration Skills: Ability to effectively collaborate with clients, development teams, and operations teams to drive operational improvements and ensure a shared understanding of resiliency requirements.
- Mainframe Systems: Familiarity with mainframe environments and integration with modern technologies.
- Certifications: Industry certifications such as AWS Certified Solutions Architect, Certified Kubernetes Administrator (CKA), Linux Professional Institute Certification, or other relevant credentials are desirable.
- Passionate Technologist: You stay up to date with the latest developments in both commercial and open-source tools, and often spend your evenings or weekends experimenting with new technologies, sharing knowledge, and contributing to the broader technical community.
- Innovative Mindset: You bring fresh ideas to the table, are always looking for better ways to do things, and have a proven track record of driving improvements in system design and resiliency.
- Tool Expertise: Hands-on experience with cutting-edge tools and technologies that improve automation, monitoring, and system performance, including Helm, Envoy, Consul, Istio, Traefik, Prometheus Operator, Service Mesh, and Kong.
- Addison, a small but prosperous city located north of Dallas, is known for its thriving business environment and high-quality living. It boasts a dynamic restaurant scene, particularly along the Belt Line, which is famous for its many dining and entertainment options. Addison hosts the annual Kaboom Town fireworks display, drawing thousands of visitors each year. The city is also home to major corporate offices, making it an attractive location for professionals. With its small-town charm and big-city amenities, Addison offers an excellent balance of work and play.
Systems Reliability Engineer - Addison - Ellofant Consulting

Description
Please read before applying:
This position is open to U.S. Citizens and Permanent Residents (Green Card holders). We are unable to provide visa sponsorship for this role.
Location:
USA, in person / travel to client site (not remote)
Job Type:
Full-time
Industry:
Leading Financial Services Organization / Technology Strategy Firm
Candidates are strongly encouraged to read the full job description before applying, as this role will involve detailed technical interviews to assess deep system knowledge and hands-on expertise.
Additionally, candidates who tailor their resume to highlight specific experiences related to the job description will stand out.
Generic resumes do not effectively represent the skills required for this role and are not helpful for either the candidate or the employer.
Overview:
We are seeking a highly skilled and proactive Systems Reliability Engineer (SRE) / Resiliency Engineer to join our team in support of a leading financial services client.
As a leading technology strategy firm, we are dedicated to helping our client improve their product and platform resiliency, ensuring their mission-critical systems remain reliable and scalable.
This role will focus on designing, enhancing, and maintaining resilient and highly available infrastructures that support complex financial services platforms.
The ideal candidate will have extensive experience working across a variety of operating systems and cloud platforms, including Linux, Windows, OpenShift, AWS, Azure, GCP, and even legacy systems such as mainframe.
You'll be part of a collaborative team responsible for ensuring that systems can withstand failures, scale seamlessly with demand, and recover quickly from disruptions.
Additionally, this role may require on-call work to provide critical support based on client needs, ensuring operational continuity at all times.
Key Responsibilities:
Resiliency & Fault Tolerance:
Design, implement, and continuously improve the resiliency of production systems, platforms, and products across cloud and on-prem environments, ensuring maximum uptime and operational continuity.
Cross-Platform Support:
Maintain and optimize environments across Linux, Windows, VMs, and mainframe systems. Familiarity with OpenShift and other container orchestration platforms is expected for robust, scalable deployments. The candidate will work in a diverse ecosystem and be required to take ownership across multiple technologies.
On-Call Support:
Participate in an on-call rotation to provide support outside regular working hours, ensuring that client-facing services remain available and performant as needed.
Key Qualifications:
Cross-Platform Management:
Deep expertise with Linux, Windows, BSD VMs, and mainframe systems. Familiarity with container orchestration platforms like OpenShift, Kubernetes, and Docker; experience managing large-scale cloud environments (AWS, Azure, GCP).
Monitoring & Observability Tools:
Proficiency with monitoring and observability platforms such as Prometheus, Grafana, Splunk, New Relic, Datadog, AppDynamics, Nagios, Zabbix, and Elastic Stack (ELK).
Strong understanding of how to leverage logs, metrics, and traces for proactive issue resolution.
Cloud & Hybrid Infrastructure:
Extensive experience with AWS, Azure, and GCP, managing hybrid infrastructures that span cloud and on-prem environments. Experience with cloud-native services and containerized applications is a must.
Desired Skills:
Disaster Recovery & Business Continuity:
Experience in designing and testing disaster recovery (DR) strategies to ensure that systems can rapidly recover from outages.
Nice to Haves:
Demo & Showcase Skills:
Candidates who can set up live demos or showcase their expertise with practical demonstrations during the interview process will be strongly preferred.
This demonstrates hands-on experience and technical proficiency in real-world scenarios.
Why Join Us?
As a leading technology strategy firm, we are proud to support our financial services client in enhancing their platform and product resiliency.
Join our team to work on cutting-edge technologies, drive innovation in infrastructure management, and make a real impact on the availability and scalability of critical financial services.
You'll be part of a collaborative team that values continuous learning and offers ample opportunities for professional growth.
City Information:
Addison, TX