Member of Technical Staff, LLM Evaluation - Mountain View, CA
1 week ago

Job summary
We're looking for outstanding individuals with experience in the social sciences, machine learning, and analysis of natural language to develop and implement cutting-edge methodologies to help us evaluate how well Copilot performs in real-world usage scenarios.
Responsibilities
- Leverage expertise to measure the performance of Copilot, identify failure modes and novel mitigation strategies, including data mining, prompt engineering, LLM-as-a-judge, and classifier training.
- Creative problem solving: navigating complexity with clarity, independently shaping direction, and delivering results even when the path isn't obvious.
Similar jobs
We're looking for outstanding individuals with experience in the social sciences, machine learning, and analysis of natural language to develop new methods to evaluate LLMs, train classifiers, experiment with data collection techniques, and implement methodologies to provide real-tim ...
1 month ago
We are looking for a Member of Technical Staff to develop and implement cutting-edge methodologies to evaluate how well Copilot performs in real-world usage scenarios. · Leverage expertise to measure the performance of Copilot... · ...
1 week ago
We are looking for a Member of Technical Staff to develop and implement cutting-edge methodologies to evaluate how well Copilot performs in real-world usage scenarios. · ...
1 month ago
We're partnering with a deep-tech AI company building autonomous agentic systems for complex physical and real-world environments. The team operates at the edge of what's possible today, designing AI systems that plan, act, recover, and improve over long horizons in high-stakes setti ...
1 month ago
We're partnering with a deep-tech AI company building autonomous agentic systems for complex physical environments. · Build eval harnesses for agentic LLM systems offline + in-workflow ...
1 month ago
You'd be building the UI that turns messy LLM evaluation outputs into clear, explorable artifacts that researchers can trust. · ...
1 month ago
+ Work with a creative team of people who help to build the state-of-the-art Foundation Models that are used throughout Waymo's systems. · + Lead the development of end-to-end evaluation systems and benchmarks for Waymo Foundation models. · + Implement and extend large scale data ...
1 week ago
Develop machine learning solutions addressing open problems in autonomous driving to safely operate Waymo vehicles in dozens of cities and under all driving conditions. · ...
1 week ago
The project is focused on building production-grade GenAI solutions with emphasis on: RAG pipelines leveraging LLMs · Prompt engineering (system/tool prompts, function calling, versioning with evals) · ...
4 days ago
We're hiring a Senior Applied AI Engineer to join a fast-moving, high-ownership team building next-generation AI assistant and productivity capabilities. This role blends LLM product engineering, evaluation science, hillclimbing, and internal tool building with the pace and crea ...
3 weeks ago
We are looking for a highly skilled Data Scientist + ML Engineer (Generative AI) to join our client's team. · ...
3 weeks ago
The client Fynbosys Inc is seeking a GenAI Engineer to build production-grade GenAI solutions. · ...
3 days ago
We are looking for a Machine Learning Engineer to help build cutting-edge ML infrastructure for building and serving LLMs at Moveworks. Design, build, and optimize scalable machine learning infrastructure to support training, evaluation, and deployment of large language models. · ...
1 week ago
We're hiring a Senior Applied AI Engineer to join a fast-moving team building next-generation AI assistant and productivity capabilities. This role blends LLM product engineering, evaluation science, hillclimbing, and internal tool building with the pace and creativity of a start ...
3 weeks ago
We're hiring a Senior Applied AI Engineer to join a fast-moving team building next-generation AI assistant capabilities. · Design and ship LLM-powered features. · Evaluate models using hillclimbing loops. · ...
1 week ago
We're hiring a Senior Applied AI Engineer to join a fast-moving, high-ownership team building next-generation AI assistant and productivity capabilities. This role blends LLM product engineering, evaluation science, hillclimbing, and internal tool building with the pace and creati ...
6 days ago
At ASAPP, our mission is simple: deliver the best AI-powered customer experience—faster than anyone else. · ...
1 week ago
At ASAPP, our mission is simple: deliver the best AI-powered customer experience—faster than anyone else. · We value customer obsession, purposeful speed, ownership, and a relentless focus on outcomes. · ASAPP's AI Engineering team is seeking an enterprising, talented machine lear ...
1 week ago
Job summary · Google's software engineers develop the next-generation technologies that change how billions of users connect, explore, and interact with information and one another. · Design methods to compile, synthesize, and organize natural-language datasets that form the foundat ...
1 month ago
+ We are an interdisciplinary team with an aim to empower providers, consumers, and clinical researchers. · + Bridge the gap between cutting-edge machine learning research and the delivery of high-impact consumer applications. · This role requires a unique blend of deep theoretical ...
2 weeks ago