Staff Software Engineer, Inference - United States

Only for registered members United States

3 days ago

Job summary

Build low-latency inference pipelines for on-device deployment, enabling real-time next-token and diffusion-based control loops in robotics. Design and optimize distributed inference systems on GPU clusters, pushing throughput with large-batch serving and efficient resource utilization. Implement efficient low-level code (CUDA, Triton, custom kernels) and integrate it seamlessly into high-level frameworks. Optimize workloads for both throughput (batching, scheduling, quantization) and latency (caching, memory management, graph compilation). Develop monitoring and debugging tools that guarantee reliability, determinism, and rapid diagnosis of regressions across both stacks.





Similar jobs

  • Only for registered members North America

    BentoML is a leading inference platform provider that helps AI teams run large language models and other generative AI workloads at scale. · You will improve the speed and efficiency of large language models at the GPU kernel level through the inference engine and across distribu ...

  • Only for registered members United States

    Job summary · Generative AI Inference Engineer · We are seeking passionate Machine Learning Engineers to join our Inference team. · ...

  • Senior Engineer

    3 weeks ago

    Only for registered members United States Full time

    We are looking for an experienced Field-Applications Engineer to help deploy a new generation of code translation tools enabled by AI and modern verification techniques. · Deploy and manage containerized services using Docker. · Deploy and run Python based GenAI pipelines interac ...

  • ML Engineer

    3 days ago

    Only for registered members United States

    Elevating the quality of human life through every conversation. · ...

  • Only for registered members United States Remote job

    Inferact's mission is to grow vLLM as the world's AI inference engine and accelerate AI progress by making inference cheaper and faster. · This is a globally remote opportunity for exceptional generalist engineers who can work across the entire vLLM stack: from low-level GPU ke ...

  • Only for registered members United States

    Join the Akamai Inference Cloud Team to drive innovation at the intersection of AI and edge computing. · 8 years of relevant experience and a Bachelor's degree or its equivalent · ...

  • Only for registered members United States

    Nebius is leading a new era in cloud computing to serve the global AI economy. · We create the tools and resources our customers need to solve real-world challenges and transform industries, without massive infrastructure costs or the need to build large in-house AI/ML teams. · ...

  • Only for registered members United States Remote job

    Nebius is leading a new era in cloud computing to serve the global AI economy. · Deep Technical Sourcing (LLM, Inference, Systems, GPU): Proactively identify and engage senior-level engineers and researchers across a wide range of AI/ML and systems domains. Use advanced sourcing ...

  • Only for registered members North America

    Forward Deployed Engineers sit at the intersection of core engineering, product strategy, and customer success. You'll partner directly with customers to understand real-world problems, then design, build, and launch production-ready AI solutions on the Bento platform. · ...

  • Only for registered members United States

    We're seeking an experienced Senior Engineering Manager to lead our AI Inference Cloud team in designing and developing our globally-distributed AI inference platform. · Responsibilities · ...

  • Only for registered members United States

    We are seeking experienced AI Developers to help us shape the future of OCI Networking with AI. · This position offers an opportunity to work on cutting-edge AI applications and includes a collaborative work environment, competitive benefits, and the chance to contribute to tran ...

  • Only for registered members United States of America

    We're looking for statistics specialists who live and breathe probability theory, statistical modeling, data analysis, hypothesis testing, regression analysis, multivariate statistics, Bayesian inference, time series analysis, experimental design, and machine learning algor ...

  • Only for registered members United States

    Job summary · We're looking for a scrappy, resourceful Developer Advocate to help grow Token Factory. · Qualifications: Strong understanding of AI developers and building patterns. · ...

  • AI Engineer

    3 days ago

    Only for registered members United States

    We're looking for an AI Engineer to design, implement and optimize advanced AI systems that balance quality, performance and cost. · ...

  • Only for registered members United States

    We're looking for a scrappy, resourceful Developer Advocate to help grow Token Factory, Nebius' high-performance inference platform built for teams running real production AI workloads at scale. · Help developers know, adopt and use Token Factory for inference use cases. · Build ...

  • Only for registered members United States

    We are seeking a Principal Software Engineer to join our Akamai Inference Cloud Team. As a technical leader and architect, you will drive critical technical decisions and guide the team in building a platform that delivers OpenAI-compatible endpoints while orchestrating inference ...

  • Only for registered members United States

    We are looking for an experienced Field-Applications Engineer to help deploy a new generation of code translation tools enabled by AI and modern verification techniques. · ...

  • Only for registered members United States

    Nebius is seeking an experienced Senior ML Solutions Architect to support customers leveraging Nebius Token Factory's serverless inference platform for open-source LLMs across multiple modalities. · Responsibilities: Design and implement LLM-based solutions using Nebius Token Fac ...

  • GenAI Intern

    3 days ago

    Only for registered members United States

    The selected candidate will have the opportunity to contribute to a Machine Learning Operations (MLOps) platform. · ...

  • Only for registered members United States

    The selected candidate will contribute to a Machine Learning Operations (MLOps) platform supporting state-of-the-art training and inference features. The candidate will also contribute to GenAI and AI/ML systems. · Advanced AI/ML systems with a focus on next-generation model trai ...