- Ensure the reliability, availability, and performance of applications and services by minimizing downtime and optimizing response times for users.
- Lead incident response, including identification, triage, resolution, and post-incident analysis, to prevent recurrence and improve system resilience.
- Develop and maintain monitoring solutions and alerting mechanisms for infrastructure, application performance, and user experience metrics, enabling proactive issue detection and mitigation.
- Implement tools and processes that automate routine tasks, scale infrastructure, and ensure seamless deployments, updates, and rollbacks with minimal user impact.
- Conduct capacity planning, performance tuning, and resource optimization across environments, collaborating with development and operations teams to meet scalability and performance goals.
- Collaborate with security teams to implement security best practices, perform vulnerability assessments, and ensure compliance with security standards and regulatory requirements for applications.
- Manage deployment pipelines, release processes, and configuration management for app deployments, ensuring consistency, reliability, and version control across environments.
- Identify areas for improvement in reliability, performance, and efficiency through data analysis, root cause analysis, and trend analysis, and drive initiatives to enhance system reliability and operational efficiency.
- Create and maintain documentation, runbooks, and knowledge base articles for operational procedures, troubleshooting guides, and best practices, and promote knowledge sharing within the team.
- Develop and test disaster recovery plans, backup strategies, and failover mechanisms for app services, ensuring business continuity and data integrity in case of failures or disasters.
- Collaborate with development, QA, DevOps, and product teams to ensure alignment on reliability goals, performance metrics, release schedules, and incident response processes.
- Participate in on-call rotations and provide 24/7 support for critical incidents, troubleshoot issues, and coordinate with teams for resolution, escalation, and follow-up actions as per defined SLAs.
- Proficiency in development technologies, architectures, and platforms (web, API) to understand system complexities and performance considerations.
- Experience in cloud platforms (e.g., AWS, Azure, Google Cloud) and infrastructure as code (IaC) tools for managing app infrastructure and deployments.
- Knowledge of monitoring tools (e.g., Prometheus, Grafana, New Relic) and logging frameworks (e.g., ELK Stack) for real-time visibility into system health, performance metrics, and user experience.
- Experience in incident management, including incident response, triage, root cause analysis (RCA), and post-mortem reviews to prevent recurring issues.
- Strong troubleshooting skills to diagnose complex technical issues in app environments, infrastructure, networking, and performance bottlenecks.
- Proficiency in scripting languages (e.g., Python, Bash) and automation tools (e.g., Ansible, Terraform) for automating routine tasks, deployments, and infrastructure management.
- Experience in implementing continuous integration/continuous deployment (CI/CD) pipelines for apps using tools like Jenkins, GitLab CI/CD, or Azure DevOps.
- Expertise in setting up monitoring solutions, configuring alerts, and creating dashboards to monitor system performance, application metrics, and user experience.
- Familiarity with APM (Application Performance Monitoring) tools to analyze app performance, identify bottlenecks, and optimize resource utilization.
- Commitment to continuous learning, staying updated with industry trends, new technologies, and best practices in app reliability, performance, and operations.
- Adaptability to evolving requirements, technologies, and business needs, with a focus on driving continuous improvement and operational excellence.
Digital Site Reliability Engineer - Charleston, United States - Ampcus Incorporated
Description
Title: Digital Site Reliability Engineer
Location: Remote
Duration: 12-month contract
Role and Responsibilities
Reporting to the Head of DevOps Engineering, the Site Reliability Engineer will play a critical role in driving innovation and growth for the Banking Solutions business. In this role, the candidate will have the opportunity to make a lasting impact on the company's digital transformation journey, drive customer-centric innovation and automation, and position the organization as a leader in the competitive digital banking landscape. Specifically, the Site Reliability Engineer will be responsible for the following: