    Site Reliability Engineer - Fairfax, VA, United States - steampunk

    steampunk Fairfax, VA, United States

    Found in: Dice One Red US C2 - 3 days ago

    Technology / Internet
    Description
    Overview
    Design. Disrupt. Repeat.
    Be an agent of change on a team committed to achieving client-focused, mission-driven excellence.
    Steampunk is looking for an experienced Site Reliability Engineer with an appetite for taking on new challenges.

    Who We Are
    Steampunk is the explosive collision of human-centered design and traditional government contracting.
    An employee-owned company with a startup mindset and time-tested approaches tailored for the federal government, we're passionate about creating solutions that are impactful, practical, scalable, and most importantly, that meet our clients' ever-changing needs.
    At Steampunk, we believe in disrupting the status quo and setting the pace in the ecosystem of government contractors, while repurposing tried-and-true methodologies.
    We believe in empowering our people to find creative solutions to intractable problems.
    We believe the best environment in which to grow and thrive is outside our comfort zone.
    While good design makes for a good product, we believe human-centered design makes for an excellent one.
    We also believe effective teams are powered by diverse perspectives, backgrounds, and experiences.
    To that end, Steampunk is an equal opportunity employer committed to promoting diversity of race, gender, sexual orientation, religion, ethnicity, national origin, disability status, and protected veteran status among our ranks.
    Additionally, we participate in the E-Verify program.

    Why Steampunk?
    Our people are the very core of what we do; their expertise and hunger for new and exciting challenges fuel our relentless pursuit of mission success.
    As part of our team of "Punks," you'll test the status quo, explore new boundaries, and set the bar high for how government clients expect to engage with contractors.
    Because we value our employees' work/life balance (and believe those who work hard deserve to play hard), we offer a very competitive benefits package, including telework/flex scheduling, health/dental with orthodontics/vision insurance upon hire, paid time off with a sell-back benefit and carryover option, 11 Federal Holidays, 100% paid military leave, 100% 401(k) plan match upon hire, professional development/education reimbursement, all flexible spending accounts, and more.

    Contributions
    As a Steampunk Site Reliability Engineer (SRE), you will be responsible for working with program development teams, infrastructure and platform services teams, and traditional operations and maintenance teams to embrace and embody a shared responsibility for the reliability of an organization's applications and infrastructure.

    As an SRE, your primary responsibility is to combine aspects of software engineering with traditional operations to maintain and improve the reliability, availability, and performance of cloud, infrastructure, and large-scale software systems and services while minimizing downtime and mitigating potential failures.

    This role carries a wide variety of responsibilities:

    Infrastructure Optimization:
    Conduct in-depth analyses of infrastructure, identifying areas for improvement in terms of performance, scalability, and resource utilization.
    Collaborate with development and operations teams to implement enhancements, utilizing software engineering and/or infrastructure-as-code principles to streamline deployment processes and ensure consistency across environments.

    Reliability Metrics and Reporting:
    Define and implement key reliability metrics, service-level objectives (SLOs), and service-level indicators (SLIs) to measure and report on the health of our systems.
    Establish monitoring and alerting mechanisms to proactively identify potential issues before they impact users.

    Automation and Tooling:
    Design and implement automation tools to reduce manual toil, streamline repetitive tasks, and enhance overall operational efficiency.
    Leverage software development techniques to create robust, scalable tooling that supports our reliability goals, and collaborate with development teams to integrate reliability features into the development lifecycle.

    Performance Optimization using Software Development Techniques:
    Collaborate with software development teams to optimize the performance and resilience of services through code improvements, architectural enhancements, and performance tuning.
    Integrate automated testing and profiling into the development pipeline to identify and address performance bottlenecks early in the development lifecycle.

    Capacity Planning and Scaling:
    Collaborate with infrastructure teams to forecast capacity requirements, ensuring our systems can seamlessly scale to meet growing user demands.
    Implement strategies for auto-scaling and load balancing to optimize resource utilization and enhance overall system stability.

    Collaboration and Training:
    Work closely with development teams to embed reliability best practices into the software development process.
    Provide mentorship and training to cross-functional teams on SRE principles, encouraging a shared responsibility for the reliability of our services.

    Incident Management:
    Lead the development and implementation of incident response procedures, ensuring timely and effective resolution of issues to minimize impact on users.
    Foster a culture of continuous improvement by conducting thorough post-incident reviews, identifying root causes, and implementing preventative measures.

    Infrastructure and Systems Monitoring:
    Observe and monitor systems to gain insight into system performance, health, and availability, and into what is happening internally in the system.
    Understand what to monitor based on the system(s) you are managing, where to store the monitoring data, who can access historical monitoring data, and how to look at the data to make determinations about future actions.

    Qualifications
    Required:
    Bachelor's degree and at least 5 years of IT experience and 2 years of SRE experience
    Eligible to obtain and maintain a government security clearance
    Knowledge and experience with Agile and DevSecOps methodologies
    Experience in System Engineering in one or more areas including telecommunications concepts, computer languages, operating systems, database/Data Base Management System (DBMS) and middleware
    Experience with the following software/tools:

    • Source code and binary repository products and techniques (GitHub, GitLab, Bitbucket, Artifactory, Nexus, etc.)
    • Infrastructure and Cloud Management tools such as AWS CloudWatch
    • Log Management and Analysis tools such as Splunk
    • Automation and Configuration Management tools such as Terraform or Puppet

    Preferred:
    Knowledge and experience with New Relic and/or other AIOps platforms
    Programming skills in JavaScript, Ruby, and/or Go
    Experience with Nginx, HAProxy, Docker, Kubernetes, or similar technologies
    Experience with messaging systems, collaboration software, application-based firewall and proxy server(s), and operating systems
    Experience with Linux and Windows operating systems, along with scripting tools and techniques such as Bash, CSH, KSH, ZSH, etc., and/or PowerShell
    Experience with Monitoring and Alerting tools such as Prometheus, Grafana, and Datadog

    About steampunk
    Steampunk is a Change Agent in the Federal contracting industry, bringing new thinking to clients in the Homeland, Federal Civilian, Health and DoD sectors.
    Through our Human-Centered delivery methodology, we are fundamentally changing the expectations our Federal clients have for true shared accountability in solving their toughest mission challenges.
    As an employee-owned company, we focus on investing in our employees to enable them to do the greatest work of their careers - and rewarding them for outstanding contributions to our growth.
    If you want to learn more about our story, visit
    We are an equal opportunity employer and all qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, disability status, protected veteran status, or any other characteristic protected by law.
    Steampunk participates in the E-Verify program.

  • steampunk

    Sr. Site Reliability Engineer

    Found in: Dice One Red US C2 - 3 days ago


    steampunk Fairfax, VA, United States

    Overview Design · Disrupt · Repeat · Be an agent of change on a team committed to achieving client-focused, mission-driven excellence · Steampunk is looking for an experienced Site Reliability Engineer with an appetite for taking on new challenges · Who We Are Steampunk is the ex ...

  • Hamdan Resources

    Site Reliability Engineer

    Found in: ZipRecruiter Test10P US C2 - 3 days ago


    Hamdan Resources Centreville, United States

    Job Description · Our client is seeking a Site Reliability Engineer to support multiple cross-functional project teams that are developing and sustaining large, mission critical, user-facing applications and services. This is an onsite role in Cha ...

  • Enterprize Software LLC

    Site Reliability Engineer

    Found in: Lensa US 4 C2 - 5 days ago


    Enterprize Software LLC Chantilly, United States

    We are seeking a dedicated Site Reliability Engineer for our Operations & Sustainment Team. Our ideal candidate would be someone with a comprehensive knowledge of IT environments, software applications, and technical frameworks. They should have a proven record of providing Tier ...

  • Hamdan Resources

    Site Reliability Engineer

    Found in: Lensa US 4 C2 - 6 days ago


    Hamdan Resources Chantilly, United States

    Job Description · Our client is seeking a Site Reliability Engineer to support multiple cross-functional project teams that are developing and sustaining large, mission critical, user-facing applications and services. This is an onsite role in Chantilly, VA. · ResponsibilitiesM ...

  • AES Corporation

    Reliability Engineer

    Found in: beBee S2 US - 1 day ago


    AES Corporation Arlington, United States Full time

    AES's mission is to improve lives by accelerating a safer and greener energy future. We are a global, agile, cohesive organization with an employee engagement level akin to a startup company. AES businesses throughout the world are often recognized as great places to work. Our pe ...

  • Saint-Gobain

    Reliability Engineer

    Found in: Jooble US O C2 - 5 days ago


    Saint-Gobain Washington DC, United States

    Consistent with CertainTeed Gypsum Vision, Mission, Values and Objectives, the Reliability Engineer identifies and quantifies Line 1 and Line 2 root cause failure(s), and drives permanent solutions to address systemic or chronic mechanical deficiencies to world class levels of sa ...

  • System One Holdings, LLC

    Site Reliability Engineering

    Found in: Dice One Red US C2 - 21 hours ago


    System One Holdings, LLC Reston, United States

    Site Reliability Engineering (SRE) Lead 100% Remote W2 ONLY US Citizenship required per government contract Must be able to obtain a DHS Public Trust clearance As a Site Reliability Engineering (SRE) Lead, you'll deliver mission-critical services that empower end users. As the id ...

  • ALTA IT Services

    Site Reliability Engineering

    Found in: J-Vers US C2 - 9 hours ago


    ALTA IT Services Reston, United States

    Site Reliability Engineering (SRE) Lead · 100% Remote · W2 ONLY · US Citizenship required per government contract · Must be able to obtain a DHS Public Trust clearance · As a Site Reliability Engineering (SRE) Lead, you'll deliver mission-critical services that empower end users ...

  • Oracle

    Site Reliability Engineer

    Found in: Appcast US C2 - 1 day ago


    Oracle Reston, United States

    Work with Site Reliability Engineering (SRE) team on the shared full stack ownership of a collection of services and/or technology areas. Understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of production services. Responsible fo ...

  • Virtru Corporation

    Site Reliability Engineer

    Found in: ZipRecruiter Test10P US C2 - 3 days ago


    Virtru Corporation Reston, United States

    Job Description · Position: Site Reliability Engineer with Security Clearance · Location: Reston, Virginia – hybrid role - minimum of 3 days required onsite · About Virtru: · Virtru, a prominent data protection provider, is committed to redefining data ...

  • TEEMA

    Site Reliability Engineer

    Found in: ZipRecruiter Test10P US C2 - 3 days ago


    TEEMA Fairfax, United States

    Job Description · Title: Site Reliability Engineer · Job ID: JK · The Site Reliability Engineer will provide support to the Securities and Exchange Commission (SEC) to regulate the U.S. securities markets, including the security-based swap (SBS) markets. Tasks will s ...

  • Intelligent Waves

    Site Reliability Engineer

    Found in: Lensa US 4 C2 - 16 hours ago


    Intelligent Waves Reston, United States

    Overview: · Intelligent Waves has grown by more than 65% in the past two years and has an outstanding new opportunity for a Site Reliability Engineer-DevOps Cloud professional on our team in Reston, VA. This contract was awarded in 2022 with three more option years ahead. Work w ...

  • Oracle

    Site Reliability Engineer

    Found in: Lensa US 4 C2 - 5 days ago


    Oracle Reston, United States

    Work with Site Reliability Engineering (SRE) team on the shared full stack ownership of a collection of services and/or technology areas. Understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of production services. Responsible fo ...

  • Inadev Corporation

    Site Reliability Engineer

    Found in: Careerbuilder One Red US C2 - 3 days ago


    Inadev Corporation McLean, VA, United States

    Formed in 2011, Inadev is focused on its founding principle to build innovative customer-centric solutions incredibly fast, secure, and at scale · We deliver world-class digital experiences to some of the largest federal agencies and commercial companies · Our technical expertise ...

  • The Squires Group

    Site Reliability Engineer

    Found in: Talent US C2 - 2 days ago


    The Squires Group Reston, United States

    Our client has an immediate need for a CLEARED Site Reliability Engineer in their Reston, VA location. In this role, you will be responsible for envisioning, designing, coding, validating, and deploying NEW features from the ground up · This direct hire role offers competitive p ...

  • BAE Systems

    Systems Reliability Engineer

    Found in: Dice One Red US C2 - 3 days ago


    BAE Systems Herndon, VA, United States

    Job Description · BAE Systems, a top-ten prime contractor to the U.S. Department of Defense, enables the U.S. government to transform data into intelligence and provides engineering, integration and sustainment support for critical military platforms and systems ...

  • iNovex Information Systems

    Systems Reliability Engineer

    Found in: One Red Cent US C2 - 3 days ago


    iNovex Information Systems Herndon, United States

    Job Brief · Job Description · HTS (iNovex) was built on the principle that people matter first and foremost. We believe in providing a strong work/life balance by investing in our employees and encouraging professional and personal growth. We do this by offering exceptional ...

  • BAE Systems

    Systems Reliability Engineer

    Found in: One Red Cent US C2 - 2 weeks ago


    BAE Systems Herndon, United States Full time

    Job Description · BAE Systems, a top-ten prime contractor to the U.S. Department of Defense, enables the U.S. government to transform data into intelligence and provides engineering, integration and sustainment support for critical military platforms and systems. ...


  • Amazon Herndon, United States

    As an Infrastructure Reliability Engineer you will be proactively driving the reliability risk identification, assessment and mitigation for datacenter infrastructure equipment (Example: Air Handling Units, LV Generator, MV Transformers, LV SWGR, Breakers, UPS, Chillers etc.). Yo ...

  • Disability Solutions

    Systems Reliability Engineer

    Found in: One Red Cent US C2 - 3 days ago


    Disability Solutions Herndon, United States

    Job Description · BAE Systems, a top-ten prime contractor to the U.S. Department of Defense, enables the U.S. government to transform data into intelligence and provides engineering, integration and sustainment support for critical military platforms and system ...