- Must have experience serving as an SRE
- Prior leadership experience, including leading a team
- Deep understanding of SRE principles for highly scalable and reliable systems.
- Configuration Management and Infrastructure as Code expertise
- Responsible for incident response, monitoring, alerting, triaging and closing of real problems
- Ensure platform stability and availability
- Responsible for metrics reporting and tracking, evaluating proper function, and supporting teams to enhance performance
- Design and implement end-to-end continuous delivery pipelines.
- Leverage extensive AWS cloud experience in a production environment (e.g., network, security, deployment, automation, serverless technologies).
- Utilize a deep understanding of SRE principles for highly scalable and reliable systems.
- Leverage extensive experience with Configuration Management and Infrastructure as Code.
- Works with application teams to document application internal/external interface requirements for Development, Testing, Staging and Production environments
- Works with application teams to ensure compliance with concepts of operations for High Availability and Disaster Recovery.
- Build service level requirements for SLAs
- Implements middleware application-specific requirements as needed
- Implements migration efforts with application teams, including data migration
- Serve as a thought leader for agile development teams.
- Establish clarity of direction and a shared vision of success that is championed by team members, stakeholders, and product owners.
- Build relationships, and work in collaboration with team members, stakeholders, product owners, and technical team leads.
- Help enhance processes, communication, and delivery through new norms that improve how work is done - from discovery to delivery.
- Provides technical guidance to application teams to take advantage of cloud technologies, and implement cloud infrastructure, as needed.
- 10+ years of software engineering and DevOps experience
- Bachelor's degree or higher required
- Must be able to obtain and maintain a Public Trust security clearance
- Must have experience implementing and maintaining processes and tools for highly scalable and reliable systems
- Incident response, monitoring performance and releases, alerting, and triaging expertise
- ServiceNow, AWS Insight, Splunk, VictorOps, CloudWatch, New Relic, and Confluence expertise preferred
- Experience in designing and implementing end-to-end continuous delivery pipelines.
- Deep AWS cloud experience in a production environment (e.g., network, security, deployment, automation, serverless technologies).
- Experience with and understanding of SRE principles for highly scalable and reliable systems.
- Strong experience with Configuration Management and Infrastructure as Code.
- Experience designing and implementing end-to-end CI/CD pipelines
- AWS Cloud experience in a production environment (e.g., network, security, deployment, automation, serverless technologies)
- Experience designing and building web application environments on AWS, including services such as EC2, S3, Lambda, ELB, and ECS.
- Experience deploying cloud resources using IaC tools such as Terraform.
- Experience with monitoring and logging tools such as CloudWatch, AppDynamics, and Splunk, including creating CloudWatch rules to capture application alerts and send notifications
- Previous experience migrating application teams from on-prem to cloud infrastructure (AWS, Azure) preferred.
- Experience with CI/CD frameworks (e.g., Jenkins, Docker, Ansible, Chef, Puppet, Git)
- Experience with at least one automation and scripting tool (e.g., Bash, Python, Shell, Perl)
- Experience designing and building migrations of on-premises CIFS and NFS file shares to AWS storage services (S3, EFS, or FSx) using AWS DataSync and VPC endpoints.
- Experience creating build plans for AWS deployments by listing compute resources, security groups, load balancers, target groups, NACLs, and all other components for various environments (Dev, TQA, Prod, etc.)
- Experience maintaining and administering configuration management systems such as Enterprise GitHub.
- Experience maintaining and administering software build systems such as Jenkins.
- Experience maintaining and administering artifact repository systems such as Artifactory.
- Ability to automate workflows through scripting or other technologies such as Ansible or Puppet.
- Expertise in Agile and DevSecOps approaches
Senior Site Reliability Engineer - Washington, United States - Sparibis
Description
Location: 100% remote
Years' Experience: 10+ years of experience
Education: Bachelor's degree
Work Authorization: United States Citizenship is required as part of the eligibility criteria to be able to obtain a security clearance.
Clearance: Applicants must be able to obtain and maintain a Public Trust security clearance.
Key Skills:
Sparibis LLC is a professional solution firm that Clients rely on to access the best talent to drive their business success.
Sparibis is an equal opportunity employer that values diversity at all levels. All individuals, regardless of personal characteristics, are encouraged to apply.