AI Infrastructure Engineer (788235) - Morrisville
1 day ago

Job description
AI Infrastructure Engineer
Location: On‑Site in Morrisville, NC
Pay Rate: Up to $55/hour
Job Type: Full‑Time, Contract / Potential for direct-hire
Are you an experienced GPU systems engineer passionate about building and maintaining high‑performance AI infrastructure? Join our client, a Global Fortune 500 technology leader, at their Morrisville, NC campus and play a critical role in supporting advanced machine learning and large‑scale AI workloads.
In this role, you'll ensure GPU servers and AI clusters run at peak performance, supporting cutting‑edge AI research and enterprise‑scale LLM training environments.
What You'll Do
Hardware Management & Troubleshooting
Monitor, maintain, and optimize high‑performance GPU servers/workstations.
Diagnose and resolve hardware issues (GPU faults, cooling/power issues, component failures).
Coordinate hardware repairs, upgrades, and replacements to maintain system uptime.
Software & Driver Administration
Install, configure, and update Linux OS (Ubuntu/CentOS), NVIDIA CUDA drivers, and supporting software.
Ensure compatibility between hardware, drivers, and ML frameworks.
Performance Benchmarking & Optimization
Execute and analyze MLPerf or similar benchmarking suites.
Identify bottlenecks and tune configurations for ML training performance.
System Diagnostics & Root‑Cause Analysis
Continuously monitor system performance and stability.
Investigate kernel errors, networking issues, and resource contention within training clusters.
Implement corrective actions to prevent downtime and performance degradation.
Infrastructure Operations
Manage logging, backups, firmware updates, system patching, and cluster health.
Support containerized AI infrastructure using Docker/Kubernetes.
Required Qualifications
3+ years managing GPU‑accelerated servers or HPC environments.
Strong Linux system administration skills (shell scripting, package management, networking).
Hands‑on experience with NVIDIA GPU hardware (A100, H100, etc.) and CUDA toolkit/drivers.
Familiarity with ML frameworks: TensorFlow, PyTorch, or Hugging Face.
Skilled in diagnostic tools: nvidia-smi, dmesg, top/htop, Prometheus/Grafana.
Solid understanding of AI infrastructure, containerization, and distributed training concepts.
Excellent problem‑solving abilities and proactive ownership of system reliability.
Preferred Skills
Experience with cluster orchestration: Slurm, Kubernetes, Ray.
Knowledge of server hardware diagnostics (IPMI, BIOS configs, RAID arrays).
Background in MLOps or DevOps for AI environments.
Certifications such as RHCE, NVIDIA credentials, or similar.
Ability to work independently in a fast‑paced, highly technical environment.
Why This Opportunity Stands Out
Direct contribution to large‑scale AI and LLM infrastructure.
Work on a state‑of‑the‑art enterprise campus with cutting‑edge GPU hardware.
Join an engineering team driving next‑generation machine learning innovation.
We're very excited about this opportunity to work in the AI space with a global leader in computing.