Senior Hardware Systems Quality and Reliability Engineer, Platforms - Sunnyvale, CA, USA
12 hours ago

Job description
This is a specialized role which requires physical interaction with hardware equipment in a simulated data center environment, utilizing Google labs, power, and safety equipment. Regular development and processing of engineering hardware must be performed on site. Candidates should be willing to undertake professional travel for up to 20% of the time.
Minimum qualifications:
- Bachelor's degree in Electrical Engineering, Computer Engineering, Hardware Engineering or equivalent practical experience.
- 6 years of experience across Data Center system hardware domains, such as Semiconductors, Peripheral Component Interconnect Express (PCIe), Power Electronics, Central Processing Unit/xProcessing Unit (CPU/xPU) architectures, Networking, Embedded Systems, and Servers.
- Experience in data curation, mining/analysis, visualization, and scripting utilizing tools such as SQL, JMP, Python, R, Tableau, or similar.
Preferred qualifications:
- PhD or Master's degree in Electrical Engineering, Computer Engineering or Hardware Engineering.
- Experience in Quality and Reliability engineering roles.
- Experience in technical project management and effective communication with executive stakeholders.
- Familiarity with Fault Isolation (FI) and other failure analysis (FA) methodologies.
- Demonstrated success in leading multidisciplinary engineering teams through technical issues with a focus on pragmatic execution.
- Proficiency in statistical methodologies, predictive modeling, and data visualization techniques.
About the job
The team is responsible for identifying and resolving fleet-wide technical issues, implementing product and methodological enhancements to maximize hardware system reliability, and ensuring efficient deployment and maintenance within data center environments. We conduct analysis of fleet data to address systemic issues and implement preventative measures to ensure long-term stability.
The AI and Infrastructure team is redefining what's possible. We empower Google customers with breakthrough capabilities and insights by delivering AI and Infrastructure at unparalleled scale, efficiency, reliability and velocity. Our customers include Googlers, Google Cloud customers, and billions of Google users worldwide.
We're the driving force behind Google's groundbreaking innovations, empowering the development of our cutting-edge AI models, delivering unparalleled computing power to global services, and providing the essential platforms that enable developers to build the future. From software to hardware, our teams are shaping the future of world-leading hyperscale computing, with key teams working on the development of our TPUs, Vertex AI for Google Cloud, Google Global Networking, Data Center operations, systems research, and much more.
The US base salary range for this full-time position is $147,000-$216,000 + bonus + equity + benefits. Our salary ranges are determined by role, level, and location. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training. Your recruiter can share more about the specific salary range for your preferred location during the hiring process.
Please note that the compensation details listed in US role postings reflect the base salary only, and do not include bonus, equity, or benefits. Learn more about benefits at Google.
Responsibilities
- Collaborate on Data Center Hardware Platforms across a wide range of domains, including Semiconductors, Test, PCIe, Power, CPU, xPU, Power Electronics, and Networking.
- Provide technical leadership by establishing priorities, conducting comprehensive root cause analyses, and resolving technical issues to ensure fleet quality and a stable customer experience.
- Optimize system health and repairability by improving Mean Time Between Failures (MTBF), managing swap rates, and developing advanced repair strategies.
- Partner with System Software and Diagnostics/Test teams to enhance the detection, characterization, and resolution of fleet-scale hardware failures.
- Lead the initiation and implementation of innovative product, process, and tool enhancement projects within cross-functional environments and integrate lessons learned from field performance data into New Product Introduction (NPI).
Similar jobs
Platform Engineer for AWS · Provision and secure AWS resources for DAM · Manage streaming, messaging, caching, search, and database platforms · Ensure scalable, reliable performance and observability · ...
3 weeks ago
We are seeking an experienced Staff Platform Engineer with 5-8+ years of proven experience in regulated environments who supports small to mid-sized development teams, focusing on infrastructure automation, cloud integration, and Infrastructure as Code methodologies. · ...
1 month ago
eGain is the leader in AI knowledge management solutions for enterprises. Join us in reimagining knowledge management as mission-critical infrastructure for the AI-powered enterprise. · ...
1 month ago
The mission of Speechify is to make sure that reading is never a barrier to learning. · ...
1 month ago
This role has been designed as 'Hybrid' with an expectation that you will work on average 2 days per week from an HPE office. · The network security organization develops next-generation security products that enable efficient global communication. · ...
4 weeks ago
We are seeking a highly skilled DevOps Engineer with deep AWS expertise to design, provision, secure and operate cloud infrastructure supporting large-scale Digital Asset Management (DAM) platforms. ...
3 weeks ago
This is a specialized role which requires physical interaction with hardware equipment in a simulated data center environment, utilizing Google labs, power, and safety equipment. Regular development and processing of engineering hardware must be performed on si ...
12 hours ago
This is a specialized role which requires physical interaction with hardware equipment in a simulated data center environment, using Google labs, power, and safety equipment. Regular development and processing of engineering hardware must be performed on site. · Innovate and design o ...
1 month ago
Embark on an exciting journey as an Infrastructure Automation Engineer in our dynamic team. This role offers a unique opportunity to shape the future of our infrastructure by leveraging cutting-edge automation technologies. · ...
1 month ago
Illumio is the leader in ransomware and breach containment, redefining how organizations contain cyberattacks and enable operational resilience. Powered by the Illumio AI Security Graph, our breach containment platform identifies and contains threats across hybrid multi-cloud env ...
1 month ago
About Applied Intuition · Applied Intuition, Inc. is powering the future of physical AI. Founded in 2017 and now valued at $15 billion, the Silicon Valley company is creating the digital infrastructure needed to bring intelligence to every moving machine on the planet. Applied ...
11 hours ago
CrowdStrike protects people, processes and technologies that drive modern organizations by stopping breaches with its AI-native platform. · ...
5 days ago
We are seeking a Senior Search Platform Engineer with strong hands-on experience designing, operating, and optimizing large-scale distributed search platforms. · 10+ years of experience as a backend, platform, or infrastructure engineer. · ...
2 weeks ago
We are seeking an experienced Linux Platform Engineer to join our team at KRG Technologies. · This role involves various responsibilities including C programming, Linux kernel development, HW board bring-up, and Linux build tools. · ...
1 week ago
We are seeking a highly skilled DevOps Engineer with deep AWS expertise to design, provision, secure, and operate cloud infrastructure supporting large-scale Digital Asset Management (DAM) platforms. · The ideal candidate will have hands-on experience building resilient, observab ...
1 day ago
We're a team of engineers, clinicians, and innovators united by one purpose: to make surgery smarter, safer, and more human. · University Enrollment: Must be currently enrolled in and returning to an accredited degree-seeking academic program after the internship. · Internship Work Per ...
2 weeks ago
Company Description · It started with a simple idea: what if surgery could be less invasive and recovery less painful? Nearly 30 years later, that question still fuels everything we do at Intuitive. As a global leader in robotic-assisted surgery and minimally invasive care, ou ...
11 hours ago
This is a specialized role which requires physical interaction with hardware equipment in a simulated data center environment, utilizing Google labs, power, and safety equipment. We design and build the hardware, software, and networking technologies that power all of Google's ser ...
2 weeks ago
We are looking for a Staff Data Platform Engineer to shape the strategy, architecture, and execution of our next-generation data ecosystem. At Applied Intuition you will be at the heart of enabling autonomous driving innovation, building the systems that allow teams to harness m ...
1 month ago
About The Team · The Platform Innovation team is a collection of entrepreneurial engineers that partners directly with both our VP Product & Design and our Co-founder to identify strategic new business opportunities and build innovative solutions for them. · ...
1 month ago
We are looking for an experienced Senior Data Platform Engineer to lead the development and deployment of our data platform using Apache Druid. The ideal candidate will have strong proficiency in Python, experience with containerization and orchestration tools like Docker and Kub ...
1 week ago