    Senior Site Reliability Engineer - Baltimore, United States - VITG Inc

    VITG Inc Baltimore, United States

    1 week ago

    Description


    VITG is seeking a skilled Site Reliability Engineer (SRE) to ensure the reliability, availability, and performance of our enterprise services, hosted across cloud service providers (CSPs) and on-prem, by implementing best practices, automation, and monitoring.

    As an SRE, you'll troubleshoot and resolve technical issues, optimize performance, and establish reliability-based release management processes.


    The SRE role is the practical implementation of DevOps principles, where speed and stability are carefully balanced, and the team acts as versatile problem solvers, filling gaps in knowledge and expertise to ensure efficient software operations.


    Collaboration with cross-functional teams, clear communication of technical concepts, and a proactive approach to incident prevention are key aspects of this role.

    If you're a proactive problem solver with a passion for continuous learning and innovation, join us as we endeavor to increase the dynamism and efficacy of our software operations.


    We're looking for you to:

    • Demonstrate expertise in SRE principles, with a strong understanding of how to maintain the quality and stability of enterprise services in a fast-paced development environment.
    • Have demonstrable experience in automating deployment processes, configuration management, and continuous integration/continuous deployment (CI/CD) pipelines.
    • Possess strong system administration skills, capable of monitoring performance, troubleshooting issues, and implementing necessary fixes.
    • Work in a fast-paced environment. Participate in technical operations and rotations in response to performance and reliability issues.
    • Help improve the whole lifecycle of infrastructure services, from inception and design through development to deployment, user support, and refinement.

    The ideal candidate will:

    • have a minimum of 7 years of cloud and on-prem data center operations experience, which includes:
        - Developing IaC in cloud and on-prem workflows using Terraform, CloudFormation, Ansible, etc.
        - Experience in writing shell, Python, and Node.js scripts
        - System administration skills with Linux and Windows systems
        - Extensive working knowledge of AWS services in an enterprise setup
        - Experience in working with observability tools like Splunk, Datadog, and New Relic
        - Proficiency in writing technical documentation, playbooks, and runbooks
    • produce high quality work independently.
    • excel in a fast-paced environment.
    • possess strong problem-solving skills.
    • demonstrate effective communication and collaboration, especially during crises.
    • provide clear explanations of technical concepts verbally and in documentation.
    • have experience working with the Centers for Medicare & Medicaid Services (CMS).

    You will:

    • Apply SRE principles to maintain the reliability, availability, and performance of enterprise services.
    • Implement observability practices to detect abnormal behaviors in the software and collect information for effective problem resolution.
    • Respond quickly and effectively to critical incidents, conducting post-incident reviews to identify root causes and implement preventive measures.
    • Act as a versatile problem solver, filling gaps in team knowledge and expertise to ensure smooth and efficient software operations.
    • Collaborate effectively with cross-functional teams and communicate technical concepts and recommendations clearly to both technical and non-technical stakeholders.
    • Facilitate smooth team and project transitions, providing guidance, training, and support for development teams to manage their infrastructure independently.
    • Identify performance bottlenecks and implement optimizations to enhance system response times and resource utilization.
    • Establish Service-Level Objectives (SLOs) and measure Service-Level Indicators (SLIs) to assess the quality and reliability of service delivery.
    • Plan, participate in, and manage on-call rotations to ensure prompt response to reported software issues.
    • Develop a reliability rating system to assess team and project performance, collecting and analyzing metrics to evaluate adherence to best practices.
    • Develop and maintain automation tools and scripts to improve operational efficiency.
    • Automate deployment processes, configuration management, and CI/CD pipelines to streamline software development and delivery.
    • Perform system administration tasks, monitor system performance, troubleshoot issues, and apply necessary fixes.
    • Stay up to date with the latest industry trends, technologies, and best practices related to SRE, DevOps, and infrastructure management.
    • Implement a reliability-based change management process, allowing teams with higher reliability scores to perform quick and frequent releases.
    • Proactively identify potential issues and implement preventive measures to reduce incidents and outages.
    • Set and monitor critical metrics to gain insights into system reliability, including latency, traffic, errors, and saturation levels.
    • Utilize incident response tools to categorize the severity of reported cases and handle them promptly.
    • Implement configuration management tools to automate software workflows and enhance team productivity.

    We will:

    • Provide opportunities for professional growth and skill development in SRE, DevOps, and infrastructure management.
    • Support your efforts in automating processes and enhancing operational efficiency.
    • Foster a collaborative and communicative environment to work effectively with cross-functional teams.
    • Encourage continuous learning to stay updated with industry trends and best practices.

    Projects you could work on:

    • Setting up and maintaining a reliable and scalable cloud and on-prem infrastructure.
    • Designing incident response procedures and post-incident review processes.
    • Analyzing system performance metrics and optimizing resources for better efficiency.
    • Establishing observability practices to detect and resolve software issues proactively.
    • Defining SLOs and SLIs to assess service quality and reliability across projects.
    • Developing automation tools to streamline repetitive tasks and improve team productivity.
    • Planning and managing on-call rotations to ensure timely issue resolution.
    • Implementing automated CI/CD pipelines for smooth software deployment.
    • Configuring and maintaining software workflows using configuration management tools.

    Our Tech Stack:

    • Deployments: GitHub, CI/CD, Jenkins, Terraform, CloudFormation, Containers
    • Cloud Infrastructure: AWS, Azure
    • Monitoring & Alerting: Datadog, AWS CloudWatch, AWS CloudWatch canaries, AWS CloudWatch X-Ray, Splunk Enterprise, Splunk ITSI, Splunk On-Call, New Relic
    • OS: Windows Servers, Amazon Linux, Red Hat, Citrix VDI

    VITG is an Equal Opportunity/Affirmative Action employer.

    All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or veteran status, or any other protected class.

    Company Description

    VITG is a GovTech firm specializing in Cloud Engineering and Managed Services, helping our Federal Clients make our citizens' lives easier, smarter, and future-ready.



  • W. R. Grace Baltimore, United States

    Requisition ID: 22915 · Built on talent, technology, and trust, Grace is a leading global supplier of catalysts and engineered materials. The company's two industry-leading business segments-Catalysts Technologies and Materials Technologies-provide innovative products, technolog ...


  • The Sotland Group Baltimore, United States

    The role of the Corporate Senior Reliability Engineer (Sr. RE) is to identify and mitigate risks that could adversely affect plant or business operations. The Sr. RE provides leadership and technical expertise in support of asset management strategies and associated practices, to ...


  • The Sotland Group Inc Baltimore, United States

    The role of the Corporate Senior Reliability Engineer (Sr. RE) is to identify and mitigate risks that could adversely affect plant or business operations. The Sr. RE provides leadership and technical expertise in support of asset management strategies and associated practices, to ...


  • W. R. Grace Curtis Bay, United States

    Requisition ID: 22915 · Built on talent, technology, and trust, Grace is a leading global supplier of catalysts and engineered materials. The company's two industry-leading business segments-Catalysts Technologies and Materials Technologies-provide innovative products, technolog ...


  • SITEC Consulting LLC Baltimore, United States

    About SITEC · SITEC is an employee and customer focused Information Technology and Professional Services Firm specializing in design, development, and delivery of state-of-the-art technology solutions, as well as cybersecurity, software and systems engineering services. · Summa ...


  • Amentum Baltimore, United States

    Amentum is hiring a · Reliability & Maintainability Engineer · to support an Army program office at · Aberdeen Proving Ground, MD . · ** This position is eligible for a hybrid telework schedule and requires a minimum of 2 days of onsite work a week. This is not a remote positi ...


  • CTI Baltimore, United States

    About us: Mission Driven, Employee Focused · At CTI, you'll be at the center of an award-winning corporate culture, breaking technological barriers and solving real-world problems for our federal government customers. We are committed to hiring the best of the best, and in return ...


  • Akina Baltimore, United States

    TS/SCI w/Polygraph required · Approved for 60% telework · 06-11-SRE · Description: · DevOps refers to a software development concept that unites and brings together developers and IT staff. The DevOps approach involves consistent, small edits to software coding. This means fr ...


  • CACI Baltimore, United States

    Site Reliability Engineer · Job Category: Engineering · Time Type: Full time · Minimum Clearance Required to Start: TS/SCI with Polygraph · Employee Type: Regular · Percentage of Travel Required: None · Type of Travel: None · What You'll Get to Do: · The Site Reliability Engineer provides su ...


  • Amches Baltimore, United States

    Position: Operational Reliability Engineer 2 · Job Description: · Collaborate with product teams to understand the performance-monitoring needs and requirements for approximately 20 applications within the organization. · Work closely with the monitoring team and product teams t ...


  • Booz Allen Hamilton Baltimore, United States

    Engineering to make a system more resilient and efficient frees up time and money to build more capabilities. Whether you come from a background in network engineering, systems administration, or software development—if you have a passion for making systems better, we need yo ...


  • Archesys Inc Baltimore, United States

    Job Description · Archesys is a technology firm specializing in innovative cloud solutions and services for clients across various industries. We pride ourselves on our cutting-edge technologies, exceptional customer service, and collaborative work environment. We ...


  • Florida Crystals / ASR Group Baltimore, United States

    ASR Group is the world's largest refiner and marketer of cane sugar, with an annual production capacity of more than 6 million tons of sugar. The company produces a full line of grocery, industrial, food service and specialty sweetener products. Across North America, ASR Group ow ...


  • Salesforce Baltimore, United States

    Inc's Candidate Privacy Notice contains more details about the handling and use of the personal data of job applicants. · For more information about our website privacy practices, please see our Privacy Statement. · DevOps/Site Reliability Engineer (SRE) with TS/SCI (on site Nort ...


  • Fearless Baltimore, United States

    Fearless is looking for a Site Reliability Engineer II to add to our diverse team of 250+ employees (and counting). · What You'll Be Doing · We're looking to change the world by building software with a soul, and we want your help. · The Site Reliability Engineer II implements ...


  • Fearless Baltimore, United States

    About Fearless Digital · Fearless Digital builds software with a soul. As a division inside Fearless, we're part of its digital services integrator model to unlock the power of organizations, people and tech. Our division designs, engineers, and delivers digital solutions to sol ...

