Site Reliability Engineer - Cleveland - Ellofant

    Ellofant
    Ellofant Cleveland

    1 week ago

    Description
    Contract: 6-month initial term with conversion potential

    Work Authorization:
    Must be authorized to work in the U.S. without sponsorship requirements

    Ellofant is a modern consulting firm built for those who want to do work that actually moves the needle. We help companies navigate change, complexity, and scale through a blend of strategic thinking, trusted technology, and hands-on execution.

    Our clients rely on us not just for advice, but for building systems, launching products, and driving outcomes that matter.

    At Ellofant, we value clarity over jargon, momentum over perfection, and people over process. We're looking for curious, driven individuals who want to solve real problems with real impact.

    If you're excited about challenging what's possible and delivering meaningful change while you're at it, Ellofant might be your next move.

    About the Role
    We're seeking an experienced Site Reliability Engineer to join our infrastructure resiliency team.

    In this role, you'll be responsible for ensuring the stability, performance, and reliability of critical systems across diverse technology stacks including mainframe, Windows, and cloud environments (OpenShift, AWS, Azure).

    You'll work at the intersection of software engineering and operations, driving automation, implementing resiliency patterns, and responding to critical events to maintain exceptional service availability.

    What You'll Be Doing

    • Coordinate responses to critical events with application support teams and the Site Reliability Center
    • Triage and respond to alerts generated through the BigPanda event correlation platform
    • Assess cross-domain impacts and engage appropriate support teams or escalate as needed
    • Participate in on-call rotations to provide 24x7 coverage for critical systems
    • Conduct blameless post-mortems and root cause analysis to drive continuous improvement
    • Design and implement automated monitoring and alerting systems using Dynatrace, Grafana, Logscale, CrowdStrike, Prometheus, Splunk, Moogsoft, and Datadog
    • Create robust dashboards and implement SLAs/SLOs through comprehensive monitoring
    • Analyze metrics from operating systems and applications to assist in performance tuning and fault detection
    • Develop and implement chaos engineering practices using Litmus, Gremlin, Azure Chaos Studio, and Chaos Mesh
    • Design fault injection experiments to validate system resilience using AWS Resilience Hub
    • Build self-healing capabilities and automated remediation workflows
    • Implement health checks and autoscaling solutions using AWS Lambda, Kubernetes, OpenShift, and Istio service mesh
    • Manage infrastructure across mainframe systems, Windows, RHEL, and cloud platforms (AWS, Azure, OpenShift)
    • Work with containerized environments, event streaming platforms (Kafka), and database systems (Oracle, SQL)
    • Maintain virtualization infrastructure (VMware) and storage systems (NAS)
    • Leverage ServiceNow for incident management, Jira for issue tracking, and CA7 for job scheduling
    • Identify opportunities to improve application stability and evangelize SRE best practices
    • Maintain comprehensive knowledge bases and runbooks in Confluence
    • Mentor junior team members on resiliency patterns and operational excellence

    What We're Looking For

    • 3-5 years of relevant experience in site reliability, infrastructure, or DevOps engineering
    • Strong expertise in monitoring and observability tools (Dynatrace, Grafana, Prometheus, Splunk, or similar)
    • Experience with incident management and event correlation platforms (BigPanda, ServiceNow, Moogsoft)
    • Proficiency with Linux/Unix systems (RHEL) and Windows Server environments
    • Hands-on experience with cloud platforms: AWS, Azure, or OpenShift
    • Strong knowledge of containerization and orchestration: Kubernetes, Docker, OpenShift
    • Experience with chaos engineering and fault injection frameworks (Litmus, Gremlin, AWS FIS, Azure Chaos Studio)
    • Solid understanding of networking, database systems (Oracle, SQL), and distributed architectures
    • Experience with event streaming platforms (Kafka) and service mesh technologies (Istio)
    • Familiarity with mainframe systems and legacy infrastructure
    • Experience with infrastructure as code and automation tools
    • Knowledge of job scheduling systems (CA7 or similar) and middleware technologies
    • Proficiency with Jira, Confluence, and ITSM tools
    • Experience working in financial services or other highly regulated industries preferred

    Relevant certifications valued:
    AWS/Azure architecture, RHCE, VCP, Kubernetes (CKA/CKAD)

    • Strong analytical thinking, problem-solving abilities, and troubleshooting skills
    • Excellent written and verbal communication skills for cross-functional collaboration

    Our Commitment
    We're committed to building resilient systems that deliver exceptional reliability and performance at scale.

    You'll work with cutting-edge resiliency engineering tools and practices, collaborate with talented engineers across domains, and have opportunities to mentor junior team members while continuously advancing your own skills in this evolving field.

    Equal Employment Opportunity & Inclusivity
    Ellofant is proud to be an Equal Employment Opportunity Employer.

    We do not discriminate in employment on the basis of race, color, religion, sex, sexual orientation, gender identity or expression, national origin, disability, age, protected veteran status, genetic information, or any other characteristic protected by law.

    This policy applies to all aspects of employment, including recruitment, hiring, placement, promotion, compensation, benefits, training, termination, and other conditions of employment.

    Commitment to Diversity, Equity & Inclusion
    We believe diversity, equity, and inclusion are fundamental to innovation, thoughtful client service, and a thriving workplace culture.

    We welcome and value different perspectives, experiences, and backgrounds, including but not limited to race, gender, ethnicity, sexual orientation, disability status, veteran status, and neurodiversity.

    Accommodations are available upon request during the application and interview process.

    Pay Transparency & Benefits
    Compensation will be commensurate with your experience, skill set, and job location.

    Ellofant offers a competitive benefits package, which may include medical and dental coverage, retirement savings plans, paid time off, and professional development support.

    Salary range will be disclosed to candidates as part of the interview process where permitted.

    Application Privacy & Process

    Your personal information, including resume, interview feedback, and any background checks, will be collected and used solely for recruitment purposes and handled in accordance with applicable privacy laws.

    Employment is contingent upon successful eligibility verification and reference checks. You must be authorized to work in the U.S.

    Fraud Awareness
    Please be aware of potential recruitment fraud. Ellofant will never request payment or sensitive financial information at any stage of the hiring process. All legitimate communication will come from ellofant. If you receive suspicious outreach claiming to be from Ellofant, please contact us directly.


    We appreciate your interest in Ellofant and encourage candidates from all backgrounds who are excited about challenging what's possible to apply.


    Cleveland has reinvented itself as a technology and innovation center, leveraging its strong industrial heritage and world-class healthcare institutions to create diverse opportunities for technology professionals.

    The city's affordability is exceptional among major metropolitan areas, with housing costs that allow engineers to purchase substantial homes, invest in their futures, and enjoy financial flexibility rarely possible in coastal tech hubs.

    Located on the shores of Lake Erie, Cleveland offers unique recreational opportunities including boating, beaches, and waterfront trails that provide year-round outdoor activities and stunning natural beauty.

    The city's cultural institutions, including the Cleveland Museum of Art, Rock and Roll Hall of Fame, and Playhouse Square theater district, rival those of much larger cities while remaining accessible and affordable.

    Cleveland's food scene has experienced a renaissance, with innovative restaurants, craft breweries, and historic neighborhoods like Little Italy and Tremont offering diverse dining and entertainment options.

    The city's location in the Eastern time zone and proximity to major business centers make it ideal for professionals working with teams and clients across North America and Europe.

