Senior Site Reliability Engineer - Baltimore, United States - VITG Inc

VITG Inc Baltimore, United States

1 week ago

Description

Job Description

VITG is seeking a skilled Site Reliability Engineer (SRE) to ensure the reliability, availability, and performance of our enterprise services hosted across cloud service providers (CSPs) and on-premises by implementing best practices, automation, and monitoring.

As an SRE, you'll troubleshoot and resolve technical issues, optimize performance, and establish reliability-based release management processes.

The SRE role is the practical implementation of DevOps principles, where speed and stability are carefully balanced, and the team acts as versatile problem solvers, filling gaps in knowledge and expertise to ensure efficient software operations.

Collaboration with cross-functional teams, clear communication of technical concepts, and a proactive approach to incident prevention are key aspects of this role.

If you're a proactive problem solver with a passion for continuous learning and innovation, join us as we endeavor to increase the dynamism and efficacy of our software operations.

We're looking for you to:

Demonstrate expertise in SRE principles, with a strong understanding of how to maintain the quality and stability of enterprise services in a fast-paced development environment.
Have demonstrable experience in automating deployment processes, configuration management, and continuous integration/continuous deployment (CI/CD) pipelines.

Possess strong system administration skills, capable of monitoring performance, troubleshooting issues, and implementing necessary fixes.
Work in a fast-paced environment. Participate in technical operations and rotations in response to performance and reliability issues.
Help improve the whole lifecycle of infrastructure services, from inception and design through development to deployment, user support, and refinement.

The ideal candidate will:

have a minimum of 7 years of cloud and on-prem data center operations experience, which includes:

Developing IaC in cloud and on-prem workflows using Terraform/CloudFormation/Ansible, etc.
Experience in writing shell/Python/Node.js scripts
System administration skills with Linux and Windows systems
Extensive working knowledge of AWS services in an enterprise setup
Experience in working with observability tools like Splunk/Datadog/New Relic
Proficient in writing technical documentation/playbooks/runbooks.

produce high-quality work independently.
excel in a fast-paced environment.
possess strong problem-solving skills.
demonstrate effective communication and collaboration, especially during crises.
provide clear explanations of technical concepts verbally and in documentation.
have experience working with the Centers for Medicare & Medicaid Services.

You will:

Apply SRE principles to maintain the reliability, availability, and performance of enterprise services.

Implement observability practices to detect abnormal behaviors in the software and collect information for effective problem resolution.
Respond quickly and effectively to critical incidents, conducting post-incident reviews to identify root causes and implement preventive measures.
Act as a versatile problem solver, filling gaps in team knowledge and expertise to ensure smooth and efficient software operations.
Collaborate effectively with cross-functional teams and communicate technical concepts and recommendations clearly to both technical and non-technical stakeholders.
Facilitate smooth team and project transitions, providing guidance, training, and support for development teams to manage their infrastructure independently.
Identify performance bottlenecks and implement optimizations to enhance system response times and resource utilization.
Establish Service-Level Objectives (SLOs) and measure Service-Level Indicators (SLIs) to assess the quality and reliability of service delivery.
Plan, participate in, and manage on-call rotations to ensure prompt response to reported software issues.
Develop a reliability rating system to assess team and project performance, collecting and analyzing metrics to evaluate adherence to best practices.
Develop and maintain automation tools and scripts to improve operational efficiency.
Automate deployment processes, configuration management, and CI/CD pipelines to streamline software development and delivery.
Perform system administration tasks, monitor system performance, troubleshoot issues, and apply necessary fixes.
Stay up to date with the latest industry trends, technologies, and best practices related to SRE, DevOps, and infrastructure management.

Implement a reliability-based change management process, allowing teams with higher reliability scores to perform quick and frequent releases.
Proactively identify potential issues and implement preventive measures to reduce incidents and outages.
Set and monitor critical metrics to gain insights into system reliability, including latency, traffic, errors, and saturation levels.
Utilize incident response tools to categorize the severity of reported cases and handle them promptly.
Implement configuration management tools to automate software workflows and enhance team productivity.

We will:

Provide opportunities for professional growth and skill development in SRE, DevOps, and infrastructure management.
Support your efforts in automating processes and enhancing operational efficiency.
Foster a collaborative and communicative environment to work effectively with cross-functional teams.
Encourage continuous learning to stay updated with industry trends and best practices.

Projects you could work on:

Setting up and maintaining a reliable and scalable cloud and on-prem infrastructure.
Designing incident response procedures and post-incident review processes.
Analyzing system performance metrics and optimizing resources for better efficiency.
Establishing observability practices to detect and resolve software issues proactively.
Defining SLOs and SLIs to assess service quality and reliability across projects.
Developing automation tools to streamline repetitive tasks and improve team productivity.
Planning and managing on-call rotations to ensure timely issue resolution.
Implementing automated CI/CD pipelines for smooth software deployment.
Configuring and maintaining software workflows using configuration management tools.

Our Tech Stack:

Deployments

GitHub

CI/CD

Jenkins

Terraform

CloudFormation

Containers

Cloud Infrastructure

AWS
Azure

Monitoring & Alerting

Datadog

AWS CloudWatch

AWS CloudWatch canaries

AWS CloudWatch X-Ray

Splunk Enterprise

Splunk ITSI

Splunk on-call

New Relic

Windows Servers

Amazon Linux

Red Hat

Citrix VDI

VITG is an Equal Opportunity/Affirmative Action employer.

All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, veteran status, or any other protected class.

Company Description

VITG is a GovTech firm specializing in Cloud Engineering and Managed Services, helping our Federal Clients make our citizens' lives easier, smarter, and future-ready.


