
    SiteOps Global Product Platform Engineering Manager - Reston, United States - Meta Inc

    Description

    Summary:


    Meta is seeking a forward-thinking, experienced Accelerator (including GPU) Product Platform Engineering Manager to join the Data Center Site Operations team.

    The Product Platform Engineering (PPE) team is responsible for the overall performance of Meta's production compute, storage, and accelerator platforms through their life-cycles in our data centers.

    This role will lead the subset of the PPE team that focuses on accelerator platform hardware. Accelerators are an important priority for Meta that involves complex systems operating in shared computing clusters.

    The role scope is focused on maintaining and improving the health of the accelerator platforms from operational testing into mass production through end-of-life.

    Key responsibilities include identifying systemic hardware, firmware, and tooling issues; engaging in hands-on problem solving; and collaborating effectively with cross-functional engineering and tooling teams to improve performance of the fleet.

    Our data centers, and the tens of thousands of servers installed in them, are the foundation upon which our rapidly scaling infrastructure efficiently operates and upon which our innovative services are delivered.

    Meta is at the leading edge of the global data center industry in terms of both how data centers are designed and how they are operated.

    This person should enjoy working in a fast-paced environment where adaptability and flexibility will be key to their success.

    We seek an individual who can quickly absorb and understand the technical challenges of subject matter experts and local site operations teams, create alignment between these globally distributed teams and partner organizations, and set informed priorities and direction while getting buy-in and commitment from relevant stakeholders.


    Required Skills:

    SiteOps Global Product Platform Engineering Manager Responsibilities:
    Manage other PPE team members through efforts that provide end-to-end lifecycle ownership (operational test through end of life decommissioning) of accelerator (including GPU) hardware platforms and associated new technologies in the data centers

    Serve as the central point of contact representing the accelerator hardware platforms and associated new technologies across SiteOps, and act as the subject matter expert on hardware platform issues for data center operations teams

    Drive complex accelerator technical investigations globally, spanning multiple disciplines such as Hardware, Software/Firmware, Networking, and Power & Cooling


    Work closely with other PPE team members to share best practices and ensure appropriate feedback is given to cross-functional teams.

    Issue timely alerts and support fixes to operations teams, and ensure a robust feedback pipeline to engineering teams

    Provide serviceability feedback on accelerator production hardware to engineering design teams

    Provide technical mentorship on large scale data center projects and initiatives to global, cross-functional teams

    Build strong, collaborative relationships with engineering and cross-functional teams across the company. Actively solicit feedback from teams, and use that feedback to improve operational effectiveness as infrastructure scales

    Own the cross-functional communication with other technical operations groups to help resolve incidents

    Collaborate with stakeholders, functional owners and subject matter experts to interpret and articulate business and operations needs

    Ability to travel up to 30% required


    Minimum Qualifications:

    BS or BA in technical field (electrical, computer science, or mechanical engineering) or commensurate experience


    10+ years of experience in NPI (New Product Introduction) hardware development and/or validation, working with cross-functional teams to deliver products to production

    Experience working across a diverse global organization and building partnerships with cross functional teams inside and outside of the organization

    Experience triaging and debugging hardware platforms

    Experience in processing and analyzing large sets of data

    Proven knowledge of server and storage platforms, principles, technologies, protocols, and standards

    Experience with GPU and accelerator based platform hardware that operates in computing clusters.

    Experience managing multiple concurrent projects and managing tight timelines

    Experience working independently within a multi-disciplinary team of hardware and operations engineers

    Experience working with Linux or Unix Operating systems

    Proven technical drafting skills and experience creating documentation for users of all levels

    Experience mentoring others and leading technical teams


    Preferred Qualifications:

    BS or BA in technical field (electrical, computer science, or mechanical engineering)

    Direct experience managing others

    Large-scale data center environment experience, including hardware deployments, deep system knowledge of Linux, Server Hardware, networking, network protocols, supply chain and Data Center automation

    Bash, PHP, Python, or Perl scripting experience

    Experience in data center system and process automation

    Leadership presence and presentation skills


    Public Compensation:
    $163,000/year to $225,000/year + bonus + equity + benefits

    Industry:
    Internet

    Equal Opportunity:
    Meta is proud to be an Equal Employment Opportunity and Affirmative Action employer.

    We do not discriminate based upon race, religion, color, national origin, sex (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender, gender identity, gender expression, transgender status, sexual stereotypes, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics.

    We also consider qualified applicants with criminal histories, consistent with applicable federal, state and local law. Meta participates in the E-Verify program in certain locations, as required by law. Please note that Meta may leverage artificial intelligence and machine learning technologies in connection with applications for employment.
    Meta is committed to providing reasonable accommodations for candidates with disabilities in our recruiting process. If you need any assistance or accommodations due to a disability, please let us know at accommodations-


