Sr Kubernetes Support Engineer - Dallas-Fort Worth Metroplex
1 day ago

Job description
About Applied Digital:
At Applied Digital, we are the epicenter of AI innovation, crafting cutting-edge data center solutions tailored for the demands of high-performance computing. Designed from the ground up to support AI and machine learning workloads, our infrastructure is the backbone of tomorrow's technological advancements, including AI-driven video and generative platforms.
We are:
- Forward-Thinkers: With a keen eye on current market trends and future innovations, we adapt swiftly and lead technological evolution.
- Resilient: We navigate complex challenges and emerge stronger, delivering robust and reliable solutions for industry pioneers.
- Innovative Designers: Leveraging the latest technologies, we create visionary solutions that redefine industry standards.
At Applied Digital, we are committed to solving intricate problems, advancing business initiatives, maximizing operational efficiency, and reducing our carbon footprint. We are a team of resilient, forward-thinking innovators driving the AI revolution.
Position Summary:
Applied Digital is seeking an experienced Sr Kubernetes Support Engineer to help manage our deployed Kubernetes (K8s) systems, both internal and external. This role will help us support, design, and maintain the complex systems that live on our cloud platforms. It sits at the center of our product, helping develop our entire resource provisioning lifecycle, from a single API request to the scheduling and spin-up of multiple resources.
You will be the primary point of contact for our customers using Kubernetes and will also take an architect role in our core provisioning logic, creating a robust system that intelligently orchestrates Kubernetes clusters, MicroVMs, and Slurm-managed HPC resources. You will work closely with our front-end team to build the resources that expose this power to our users, and with the infrastructure team to ensure the backend is scalable, resilient, and efficient.
The ideal candidate is a strong systems-level thinker who is passionate about automation, distributed systems, and building powerful HPC clusters that are easily adaptable to a customer's design requirements.
Key Responsibilities:
- Design & Develop Provisioning Services: Architect and write high-quality, scalable backend services (e.g., in Go, Python, or Rust) that handle the logic for provisioning and managing compute and storage resources.
- Kubernetes Design and Integration: Develop controllers and operators to automate the deployment and lifecycle management of containerized workloads and services on multiple Kubernetes clusters.
- Slurm Orchestration: Build the "bridge" between our cloud-native API and our HPC backend, writing the logic to dynamically generate Slurm batch scripts, submit jobs, and monitor their state.
- MicroVM Management: Implement provisioning workflows for lightweight MicroVMs (using technologies like Firecracker, KubeVirt, or Kata Containers) to ensure fast boot times and secure workload isolation.
- Storage Provisioning: Write the automation to dynamically provision, attach, and manage various storage solutions (e.g., block storage, shared file systems) for provisioned workloads.
- Observability & Monitoring: Implement comprehensive monitoring, logging, and tracing (using tools like Prometheus, Grafana, Loki) to ensure the health and performance of all systems.
- Infrastructure as Code (IaC): Use tools like Terraform, Ansible, and Git to version and manage the code for the Kubernetes clusters and related infrastructure.
Basic Qualifications:
- 10+ years of professional Kubernetes development experience, with a strong focus on building scalable distributed systems. Deep, hands-on experience with Kubernetes in a production environment (cluster management, writing operators, controllers, and custom resource definitions (CRDs)).
- Proficiency in a modern programming or scripting language (e.g., Go, Python, Bash) and with data formats such as JSON.
- Solid understanding of container technologies (Docker, containerd) and the container ecosystem.
- Experience with Infrastructure as Code (IaC) tools like Terraform or Ansible.
- Experience collaborating with front-end teams and defining API contracts.
Preferred Qualifications:
- Direct experience with Slurm or other HPC schedulers (e.g., LSF, PBS).
- Experience with MicroVM or sandboxed container technologies (e.g., Firecracker, Kata Containers, gVisor, KubeVirt).
- Knowledge of scalable storage solutions (e.g., Weka, Ceph, MinIO, or cloud-provider storage like S3, EBS).
- Experience building CI/CD pipelines (e.g., Jenkins, GitLab CI, ArgoCD).
- Familiarity with monitoring and observability stacks (Prometheus, Grafana, ELK/Loki).
- Contributions to open-source projects.
Please note that Applied Digital is currently unable to sponsor new applicants for employment authorization or provide immigration-related support for this position. This includes, but is not limited to, visa categories such as H-1B, F-1 OPT, F-1 STEM OPT, F-1 CPT, J-1, TN, E-2, E-3, L-1, O-1, and any Employment Authorization Documents (EADs) or other work authorizations that require employer sponsorship.
Physical Requirements:
- Able to remain in a seated position for an extended period and to lift and carry up to 15 lbs. (office manuals, case notebooks, case files, case materials, standard boxes, report binders, etc.) as needed.
The company has reviewed this job description to ensure that essential functions and basic duties have been included. It is intended to provide guidelines for job expectations and the employee's ability to perform the position described. It is not intended to be construed as an exhaustive list of all functions, responsibilities, skills and abilities. Additional functions and requirements may be assigned by supervisors as deemed appropriate. This document does not represent a contract of employment, and the company reserves the right to change this job description and/or assign tasks for the employee to perform, as the company may deem appropriate.
This job description in no way states or implies that these are the only duties to be performed by the employee(s) incumbent in this position. Employees will be required to follow any other job-related instructions and to perform any other job-related duties requested by any person authorized to give instructions or assignments. All duties and responsibilities are essential functions and requirements and are subject to possible modification to reasonably accommodate individuals with disabilities. To perform this job successfully, the incumbents acknowledge that they possess the skills, aptitudes, and abilities to perform each duty proficiently. Some requirements may exclude individuals who pose a direct threat or significant risk to the health or safety of themselves or others. This document does not create an employment contract, implied or otherwise, other than an "at will" relationship.
The company is an Equal Opportunity Employer, drug free workplace, and complies with ADA regulations as applicable.