- Building and improving our ecosystem around GPU-accelerated computing, including developing large-scale automation solutions
- Maintaining and building deep learning clusters at scale
- Supporting our researchers in running their flows on our clusters, including performance analysis and optimization of deep learning workflows
- Performing root cause analysis and suggesting corrective action for problems at large and small scales
- Finding and fixing problems before they occur.
- Bachelor's degree in Computer Science, Electrical Engineering, or a related field, or equivalent experience.
- Minimum 5 years of experience designing and operating large-scale compute infrastructure.
- Experience analyzing and tuning performance for a variety of HPC workloads.
- Working knowledge of cluster configuration management tools such as Ansible, Puppet, or Salt.
- Experience with HPC cluster job schedulers such as SLURM or LSF.
- In-depth understanding of container technologies like Docker, Singularity, Shifter, and Charliecloud.
- Proficiency in CentOS/RHEL and/or Ubuntu Linux distributions, as well as Python programming and Bash scripting.
- Experience with HPC workflows that use MPI.
- Understanding of MLPerf benchmarking
- Familiarity with InfiniBand with IBOP and RDMA
- Understanding of fast, distributed storage systems like Lustre and GPFS for HPC workloads.
- Background with Software Defined Networking and HPC cluster networking
- Familiarity with deep learning frameworks like PyTorch and TensorFlow.
Senior Infrastructure Engineer - California, United States - Sustainable Talent
Description
Sustainable Talent is partnering with NVIDIA, a global leader that has been transforming computer graphics, PC gaming, and accelerated computing for over 25 years.
We are looking for an HPC Cluster Engineer to support our client's GPU/HPC Infrastructure Team.
This is a W-2 full-time contract based in Santa Clara, CA, with a hybrid work option. The pay is $90-$120/hr, based on factors such as experience, education, and location, and the role provides full benefits, PTO, and an amazing company culture.
As a member of the GPU/HPC Infrastructure team, you will provide leadership in the design and implementation of groundbreaking GPU compute clusters that run demanding deep learning, high performance computing, and computationally intensive workloads. We seek an expert to identify architectural changes and/or completely new approaches for our GPU compute clusters. As an expert, you will help us with the strategic challenges we encounter, including compute, networking, and storage design for large-scale, high-performance workloads; effective resource utilization in a heterogeneous compute environment; evolving our private/public cloud strategy; capacity modeling; and growth planning across our global computing environment.
What you'll be doing:
What we need to see:
Ways to stand out from the crowd:
Sustainable Talent is an M/F+, disabled, and veteran equal employment opportunity and affirmative action employer.