Staff Software Engineer, Machine Learning Infrastructure - Sunnyvale, CA, USA
3 hours ago

Job description
Minimum qualifications:
- Bachelor's degree or equivalent practical experience.
- 8 years of experience programming in C++.
- 5 years of experience testing and launching software products.
- 5 years of experience building and developing large-scale infrastructure, distributed systems or networks, or experience with compute technologies, storage, or hardware architecture.
- 3 years of experience with software design and architecture.
Preferred qualifications:
- Master's degree or PhD in Engineering, Computer Science, or a related technical field.
- 8 years of experience with data structures and algorithms.
- 3 years of experience in a technical leadership role leading project teams and setting technical direction.
- 3 years of experience working in a complex, matrixed organization involving cross-functional or cross-business projects.
- Experience with machine learning infrastructure.
About the job
Google's software engineers develop the next-generation technologies that change how billions of users connect, explore, and interact with information and one another. Our products need to handle information at massive scale, and extend well beyond web search. We're looking for engineers who bring fresh ideas from all areas, including information retrieval, distributed computing, large-scale system design, networking and data storage, security, artificial intelligence, natural language processing, UI design and mobile; the list goes on and is growing every day. As a software engineer, you will work on a specific project critical to Google's needs with opportunities to switch teams and projects as you and our fast-paced business grow and evolve. We need our engineers to be versatile, display leadership qualities and be enthusiastic to take on new problems across the full stack as we continue to push technology forward.
The Control Plane team's mission is to provide an efficient and reliable above-the-cell scheduling system for ML (and non-ML) workloads at Alphabet. We are part of the larger Compute infrastructure team, which is responsible for scheduling work on almost all production machines. Our current areas of focus include dynamic/static pool scheduling, TPU/GPU platform fungibility, spatial flexibility, and better handling of emerging ML workloads. We are a welcoming team with a strong focus on the well-being of our team members.
Google Cloud accelerates every organization's ability to digitally transform its business and industry. We deliver enterprise-grade solutions that leverage Google's cutting-edge technology, and tools that help developers build more sustainably. Customers in more than 200 countries and territories turn to Google Cloud as their trusted partner to enable growth and solve their most critical business problems.
The US base salary range for this full-time position is $197,000-$291,000 + bonus + equity + benefits. Our salary ranges are determined by role, level, and location. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training. Your recruiter can share more about the specific salary range for your preferred location during the hiring process.
Please note that the compensation details listed in US role postings reflect the base salary only, and do not include bonus, equity, or benefits. Learn more about benefits at Google.
Responsibilities
- Collaborate and engage with ML practitioners and partner teams in Google's ML infrastructure stack to gather requirements and identify opportunities for efficiency initiatives.
- Lead the design and execution of new initiatives across the sub-teams in Control Plane.
- Ensure production excellence, helping own and evolve the Control Plane stack.
- Grow and mentor engineers.