We're looking for you to:
- Demonstrate expertise in SRE principles, with a strong understanding of maintaining enterprise service quality and stability in a fast-paced development environment.
- Have demonstrable experience automating deployment processes, configuration management, and continuous integration and continuous deployment (CI/CD) pipelines.
- Possess strong system administration skills, including monitoring performance, troubleshooting issues, and implementing necessary fixes.
- Work in a fast-paced environment and participate in technical operations and on-call rotations in response to performance and reliability issues.
- Help improve the whole lifecycle of infrastructure services, from inception and design through development, deployment, user support, and refinement.
The ideal candidate will:
- Have a minimum of 7 years of cloud and on-premises data center operations experience.
- Produce high-quality work independently.
- Excel in a fast-paced environment.
- Possess strong problem-solving skills.
- Demonstrate effective communication and collaboration, especially during crises.
- Provide clear explanations of technical concepts, both verbally and in documentation.
- Have experience working with the Centers for Medicare & Medicaid Services (CMS).
You will:
- Apply SRE principles to maintain the reliability, availability, and performance of our enterprise services.
- Implement observability practices to detect abnormal behaviors in the software and collect information for effective problem resolution.
- Respond quickly and effectively to critical incidents, conducting post-incident reviews to identify root causes and implementing preventive measures.
- Act as a versatile problem solver, filling gaps in team knowledge and expertise to ensure smooth and efficient software operations.
- Collaborate effectively with cross-functional teams and communicate technical concepts and recommendations clearly to both technical and non-technical stakeholders.
- Facilitate smooth team and project transitions, providing guidance, training, and support for development teams to manage their infrastructure independently.
- Identify performance bottlenecks and implement optimizations to enhance system response times and resource utilization.
- Establish Service-Level Objectives (SLOs) and measure Service-Level Indicators (SLIs) to assess the quality of service delivery and reliability.
- Plan, participate in, and manage on-call rotations to ensure prompt response to reported software issues.
- Develop a reliability rating system to assess team and project performance, collecting and analyzing metrics to evaluate adherence to best practices.
- Develop and maintain automation tools and scripts to improve operational efficiency.
- Automate deployment processes, configuration management, and CI/CD pipelines to streamline software development and delivery.
- Perform system administration tasks, monitor system performance, troubleshoot issues, and apply necessary fixes.
- Stay up to date with the latest industry trends, technologies, and best practices.
- Implement a reliability-based change management process, allowing teams with higher reliability scores to perform quick and frequent releases.
- Proactively identify potential issues and implement preventive measures to reduce incidents and outages.
- Set and monitor critical metrics to gain insights into system reliability, including latency, traffic, errors, and saturation levels.
- Utilize incident response tools to categorize the severity of reported cases and handle them promptly.
- Implement configuration management tools to automate software workflows and enhance team productivity.
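The SLO/SLI and reliability-based change management items above can be illustrated with a minimal sketch. It assumes an availability SLI defined as good requests divided by total requests over a fixed window. It also models a team's "reliability score" as the fraction of its error budget still unspent, which is an assumption, not VITG's stated method. The names `ServiceWindow` and `release_allowed` and the 25% threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ServiceWindow:
    """Request counts for one service over an SLO window (for example, 30 days)."""
    total_requests: int
    good_requests: int  # requests that met the SLI definition (assumed: non-error and fast enough)


def availability_sli(window: ServiceWindow) -> float:
    """SLI: the fraction of requests in the window that were good."""
    if window.total_requests == 0:
        return 1.0  # assumption: a window with no traffic counts as fully compliant
    return window.good_requests / window.total_requests


def error_budget_remaining(window: ServiceWindow, slo_target: float) -> float:
    """Fraction of the error budget left; negative once the SLO has been breached."""
    allowed_bad = (1.0 - slo_target) * window.total_requests
    actual_bad = window.total_requests - window.good_requests
    if allowed_bad == 0:
        # A 100% target (or no traffic) leaves no budget: any bad request exhausts it.
        return 1.0 if actual_bad == 0 else float("-inf")
    return 1.0 - actual_bad / allowed_bad


def release_allowed(window: ServiceWindow, slo_target: float, min_budget: float = 0.25) -> bool:
    """Hypothetical reliability-based change gate: fast-track releases only
    while at least `min_budget` of the error budget is unspent."""
    return error_budget_remaining(window, slo_target) >= min_budget
```

For example, with a 99.9% target over 1,000,000 requests, 400 failed requests use 40% of the 1,000-request budget. That leaves 60%, so `release_allowed` returns `True`. With 1,500 failures the budget is overspent, and fast-tracked releases would be blocked until reliability recovers.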
We will:
- Provide opportunities for professional growth and skill development in SRE, DevOps, and infrastructure management.
- Support your efforts in automating processes and enhancing operational efficiency.
- Foster a collaborative and communicative environment to work effectively with cross-functional teams.
- Encourage continuous learning to stay updated with industry trends and best practices.
Projects you could work on:
- Setting up and maintaining reliable, scalable cloud and on-prem infrastructure.
- Designing incident response procedures and post-incident review processes.
- Analyzing system performance metrics and optimizing resources for better efficiency.
- Establishing observability practices to detect and resolve software issues proactively.
- Defining SLOs and SLIs to assess service quality and reliability across projects.
- Developing automation tools to streamline repetitive tasks and improve team productivity.
- Planning and managing on-call rotations to ensure timely issue resolution.
- Implementing automated CI/CD pipelines for smooth software deployment.
- Configuring and maintaining software workflows using configuration management tools.
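On-call rotation planning, listed above, can be sketched as a minimal schedule generator. It assumes a weekly round-robin rotation with a primary and a secondary responder. The function name `weekly_rotation`, the one-week shift length, and the primary/secondary policy are illustrative assumptions. In practice, rotations are usually maintained in an incident tool such as Splunk On-Call.

```python
from datetime import date, timedelta


def weekly_rotation(engineers: list[str], start: date, weeks: int) -> list[tuple[date, str, str]]:
    """Return (week_start, primary, secondary) tuples for a round-robin schedule.

    Hypothetical policy: the primary rotates weekly, and the secondary is the
    next engineer in the list, so no one is primary and secondary at once.
    """
    if len(engineers) < 2:
        raise ValueError("need at least two engineers for primary/secondary coverage")
    schedule = []
    for week in range(weeks):
        primary = engineers[week % len(engineers)]
        secondary = engineers[(week + 1) % len(engineers)]
        schedule.append((start + timedelta(weeks=week), primary, secondary))
    return schedule
```

For a three-person team starting on 2024-01-01, the third week (starting 2024-01-15) has the last engineer as primary and the first as secondary, and the cycle then repeats.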
Our Tech Stack:
- Deployments
- Cloud Infrastructure
- Monitoring & Alerting
- OS
Senior Site Reliability Engineer - Baltimore, United States - VITG Inc
Description
VITG is seeking a skilled Site Reliability Engineer (SRE) to ensure the reliability, availability, and performance of our enterprise services, hosted across cloud service providers (CSPs) and on-premises data centers, by implementing best practices, automation, and monitoring.
The SRE role is the practical implementation of DevOps principles, where speed and stability are carefully balanced, and the team acts as versatile problem solvers, filling gaps in knowledge and expertise to ensure efficient software operations.
Collaboration with cross-functional teams, clear communication of technical concepts, and a proactive approach to incident prevention are key aspects of this role.
If you're a proactive problem solver with a passion for continuous learning and innovation, join us as we work to make our software operations more agile and effective.
The ideal candidate will also have:
- Experience with Terraform, CloudFormation, Ansible, etc.
- Experience writing shell, Python, or Node.js scripts.
- System administration skills with Linux and Windows systems.
- Extensive working knowledge of AWS services in an enterprise setup.
- Experience working with observability tools such as Splunk, Datadog, or New Relic.
- Proficiency in writing technical documentation, playbooks, and runbooks.
Our Tech Stack (tools by area):
- Deployments: CI/CD, Jenkins, Terraform, CloudFormation, containers
- Cloud Infrastructure: AWS, Azure
- Monitoring & Alerting: AWS CloudWatch, AWS CloudWatch Synthetics canaries, AWS X-Ray, Splunk Enterprise, Splunk ITSI, Splunk On-Call, New Relic
- OS: Amazon Linux, Red Hat, Citrix VDI
VITG is an Equal Opportunity/Affirmative Action employer.
All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or veteran status, or any other protected class.
Company Description
VITG is a GovTech firm specializing in cloud engineering and managed services, helping our federal clients make our citizens' lives easier, smarter, and future-ready.