    SRE/Performance Engineering - Reston, VA, United States - Microsoft

    Full time
    Description
    Microsoft has an exciting opportunity for a Senior Site Reliability Engineer in the Cloud+AI Silver Team.

    This team will be responsible for deploying and operating a Secure Work Area, including the infrastructure for collaboration within an airgapped environment.

    In this role, you will have the opportunity to work with engineers who enable a broad set of Azure services to be consumed by internal customers in highly secured and regulated industries.

    The systems and software you build will be required to meet the security policy and assurance requirements of both public and private sector customers.

    Microsoft's mission is to empower every person and every organization on the planet to achieve more.

    As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals.

    Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

    Responsibilities
    The scale of our operations is enormous. Microsoft's products and services are overwhelmingly consumed online, and billions of people use them every day.

    We need people who enjoy analyzing complicated problems, coming up with creative solutions, and working in focused teams to build things no one has thought of before, all in the service of production reliability.

    Demonstrates expertise in distributed systems design, interactions between cloud technology layers and components, common dependencies at scale, and the code that defines infrastructures.

    Can identify and recommend optimal configurations of cloud technology solutions and modify the code base that defines systems or cloud technologies to improve the reliability and operability of supported products with minimal guidance from other engineers.

    Develops an understanding of the code, features, and operations of specific products at scale as required to contribute to incremental improvements in product availability, reliability, efficiency, observability, and/or performance; participates in on-boarding, code/design reviews, and regular meetings with the engineering teams that develop and/or manage those products.

    Researches and maintains an awareness of industry trends, advances in distributed systems and cloud technologies, new tools, and/or processes for maintaining and improving product availability, reliability, efficiency, observability, and/or performance.

    Contributes to the implementation of new solutions within their team by identifying ways they can be applied to solve persistent problems.

    Contributions to Development and Design

    Leverages technical expertise in large scale distributed systems and specific products, as well as objective insights drawn from analyses of production telemetry data to suggest changes or add-ons to product features or code to improve the availability, reliability, efficiency, observability, and performance of product components or features supported by their team.

    Develops and tests basic changes to optimize code and improve the observability, reliability and operability of a defined range of platform, system, or product components or features with direction from other engineers.

    Engages with product engineering teams by participating in code/design reviews, regular meetings, on-call rotations, and incident responses throughout product development and operations cycles; leverages technical expertise on underlying systems/platforms and insights drawn from engagements with product engineering teams and telemetry analyses to propose potential improvements in code base and designs across components and features of one or more products.

    Driving Operational Excellence

    Independently develops code or scripts that automate the performance of repetitive and easily scalable operations processes (e.g., monitoring, alerting, deploying products and updates) across components and features of products operating at scale.

    Leverages technical expertise and telemetry analysis across a range of components and/or features to identify patterns and opportunities to implement configuration and data changes for one or more platforms, systems, or products in production using code, tooling, and automation.

    Identifies opportunities to leverage existing tools and automation to enable product engineering teams to increase the velocity at which they can reliably and safely implement changes in production; monitors the effects of changes across multiple components or features within a single platform or system.

    Designs, develops, and maintains telemetry pipelines and monitoring tools that detail operations metrics (e.g., availability, reliability, performance, efficiency) of product components and features operating at scale.

    Independently performs analyses using existing tools and/or models to identify insights and shares them with product engineering teams to directly contribute to improvements in product development and/or operations; monitors the impact of changes on operations metrics (e.g., Time-to-X).

    Independently uses existing tools and/or models to troubleshoot problems or flaws affecting the availability, reliability, performance, and/or efficiency of components and features; proposes solutions that will resolve and prevent recurring issues and brings them to the attention of their Site Reliability Engineering (SRE) and/or product engineering teams.

    Responds to incidents during regular on-call rotations by identifying the level of impact, troubleshooting issues, and deploying appropriate fixes to resolve root cause(s); alerts product teams and owners to major customer impacting issues and escalates resolution of highly impactful issues affecting multiple components or features to other engineers or engineering teams as needed.

    Shares details related to incidents and their resolution through post-mortem reports and during regular review meetings.

    Develops alerts and instrumentation across components and features to monitor product capacity and resource demands and analyze telemetry data using existing capacity planning models; draws insights from analyses of capacity and resource data to optimize component and feature code to manage resources and capacity across a limited range of use conditions and system parameters.

    Utilizes insights from performance and resource monitoring tools to identify whether there is a need to optimize the efficiency of component and feature code, or if changes to compute resources are required; models the predicted effect of changes to code and/or compute resources across components or features to document the efficacy of proposed solutions.

    Shares insights and best practices that can be applied to improve development and operations of system, platform, or product components and features by participating in code/design reviews, incident drills and debriefs, and regular meetings, as well as interactions with more experienced SREs and members of product engineering teams.

    Qualifications

    Required/Minimum Qualifications:

    6+ years technical experience in software engineering, network engineering, or systems administration
    OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience in software engineering, network engineering, or systems administration
    OR Master's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration.


    Other Requirements:

    Security Clearance Requirements:
    Candidates must be able to meet Microsoft, customer, and/or government security screening requirements for this role.

    These requirements include, but are not limited to the following specialized security screenings:

    The successful candidate must have an active U.S. Government Top Secret Clearance with access to Sensitive Compartmented Information (SCI) based on a Single Scope Background Investigation (SSBI) with Polygraph.

    The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. Failure to maintain or obtain the appropriate U.S. Government clearance and/or customer screening requirements may result in employment action up to and including termination.

    Clearance Verification:
    This position requires successful verification of the stated security clearance to meet federal government customer requirements. You will be asked to provide clearance verification information prior to an offer of employment.

    Microsoft Cloud Background Check:

    This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.


    Citizenship & Citizenship Verification:
    This position requires verification of U.S. citizenship due to citizenship-based legal restrictions.

    Specifically, this position supports U.S. federal, state, and/or local government agency customers and is subject to certain citizenship-based restrictions where required or permitted by applicable law.

    To meet this legal requirement, citizenship will be verified via a valid passport, other approved documents, or a verified U.S. government clearance.

    Preferred/Additional Qualifications:

    3+ years of experience with PowerShell, C#, or C++.
    Experience working on large-scale distributed services with on-call responsibilities.
    Ability to build and influence broadly towards common goals and priorities.
    Ownership for end-to-end project lifecycle with solid project management and communication skills.

    Site Reliability Engineering IC - The typical base pay range for this role across the U.S. is USD $112,000 - $218,400 per year.

    A different range applies to specific work locations within the San Francisco Bay area and New York City metropolitan area; the base pay range for this role in those locations is USD $145,800 - $238,600 per year.

    Certain roles may be eligible for benefits and other compensation.

    Find additional benefits and pay information here:
    Microsoft is an equal opportunity employer.

    Consistent with applicable law, all qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances.

    If you need assistance and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.



  • Charter Global Reston, United States

    Job Title: Performance Engineer / Mainframe Performance Systems Engineer · Location: Reston, VA (Hybrid Once in a month Need Only From DC,MD,VA OR West VA) · Duration: 6 Months+ CTH · Job Type: W2 · Job ID: 41509 · Required Skills: · 8+ years of IBM z/OS performance engineering/c ...


  • Charter Global Reston, United States

    Job Title: Performance Engineer / Mainframe Performance Systems Engineer · Location: Reston, VA (Remote Job Need only From MD,VA,DC) · Duration: 6 + CTH · Experience: · 8+ years of IBM z/OS performance engineering/capacity planning experience · Knowledge of configuring and troub ...


  • Maxar Technologies Careers Herndon, VA, United States Full time

    Cybersecurity Engineer Role - Join Our Team · We're not your typical software development and systems administration team; we're the powerhouse behind cutting-edge software applications supported by a self-managed high-performance compute (HPC) infrastructure on a private cloud s ...


  • Omni Inclusive Reston, United States

    Job Details · Description · Client is seeking a Performance Test Engineer responsible for supporting the planning, design and execution of system testing on simple to complex implementations. Responsible for defining performance benchmarks, strategy and frameworks and scalabili ...


  • Zolon Tech Inc. Reston, United States

    Job Title: Mainframe Performance Engineer · Location: Reston, VA (primarily remote with onsite visit once or twice in a month) · Duration: 12 Months contract to Hire · Our client is looking for a Mainframe focused Performance Engineer to assist them with application performance a ...


  • RGI, LLC Herndon, United States

    Job Highlights: · As a DevSecOps Engineer with RGi, you will be directly supporting SOCOM customers with quick reaction support, focused on data, tools and technology. You will work as part of an Agile Operational team to translate real-world needs into technical specifications, ...

  • SRC

    Systems Engineer SRE

    2 weeks ago


    SRC Chantilly, United States

    Our client is a software and systems development firm, built by veterans of the IC/DoD with a focus on delivering impactful solutions rapidly. They have a proven track record of success and are renowned for their expertise, attracting top talent and fostering a dynamic culture wh ...


  • Omni Inclusive Reston, United States

    Job Details · Description · Client is seeking a Performance Test Engineer responsible for supporting the planning, design and execution of system testing on simple to complex implementations. Responsible for defining performance benchmarks, strategy and frameworks and scalability ...


  • Saxon Global Reston, United States

    **hybrid schedule in Reston, VA or Bentonville, AR (2 days/week onsite) · **Gatling is nice to have, but not required · **Will be quick 2 step interview process · Title: Performance Engineer · Client: Walmart Labs · Location: hybrid schedule (Reston, VA or Bentonville, AR) · ...


  • GEICO Chevy Chase, United States

    GEICO is seeking an experienced Principal Engineer with a passion for building high-performance, low maintenance, zero-downtime platforms. You will help drive our insurance business transformation as we transition from a traditional IT model to a tech organization with engineerin ...


  • ServiceNow Vienna, United States

    Job Description · Company Description · At ServiceNow, our technology makes the world work for everyone, and our people make it possible. We move fast because the world can't wait, and we innovate in ways no one else can for our customers and communities. By joinin ...


  • Amtex Systems Inc. Vienna, United States

    Position: Senior Performance Tester · Duration: 12+ Month · Location: 2 days a week Hybrid in either Vienna, VA, Pensacola, FL, or San Diego, CA. Candidates must live in one of these cities/states. · Top 3 Required Skills (must clearly be visible in resume): · Experienced in scri ...


  • Experis Washington, United States

    Our public sector client is hiring a · Performance Engineer. · This role will develop and execute the performance test scripts for applications. · Candidates must be US Citizens and able to obtain a Public Trust clearance. · Location: · Remote · Pay rate: · $64-$70/hr W2 · Durati ...


  • Swift Manassas, United States

    About The Role · About Us · We're the world's leading provider of secure financial messaging services. We are the way the world moves value – across borders, through cities and overseas. No other organization can address the scale, precision, pace and trust that this demands, a ...

  • GEICO

    Engineer - IaaS SRE

    4 weeks ago


    GEICO Chevy Chase, United States

    Distinguished Engineer - IaaS SRE · Position Summary · GEICO is seeking an experienced Engineer with a passion for building high-performance, low maintenance, zero-downtime platforms, and applications. You will help drive our insurance business transformation as we transiti ...


  • ServiceNow Vienna, United States Permanent

    Job Description · Please Note: · This position will include supporting our US Federal customers. · This position requires passing a ServiceNow background screening, USFedPASS (US Federal Personnel Authorization Screening Standards). This includes a credit check, criminal/misdeme ...


  • Aquent Talent Vienna, United States

    Our client is seeking a Performance Test Engineer to join their innovative Digital Delivery Team. This team is at the forefront of leveraging the latest technologies in open source and the Azure cloud to develop cutting-edge member and team member experiences. Their work includes ...


  • Aquent Vienna, United States

    Our client is seeking a Performance Test Engineer to join their innovative Digital Delivery Team. This team is at the forefront of leveraging the latest technologies in open source and the Azure cloud to develop cutting-edge member and team member experiences. Their work includes ...


  • Amtex Systems Vienna, United States

    Position: Senior Performance Tester · Duration: 12+ Month · Location: 2 days a week Hybrid in either Vienna, VA, Pensacola, FL, or San Diego, CA. Candidates must live in one of these cities/states. · Top 3 Required Skills (must clearly be visible in resume): · Experienced in scr ...