Machine Learning Engineer: Evaluation - San Francisco, CA
6 days ago

Similar jobs
At Retool, we're on a mission to bring good software to everyone. · We believe that the future of software development lies in abstracting away the tedious and repetitive tasks developers waste time on, · while creating reusable components that act as a force multiplier for futur ...
1 month ago
About Distyl AI · Distyl AI develops production-grade AI systems to power core operational workflows for Fortune 500 companies. Powered by a strategic partnership with OpenAI, in-house software accelerators, and deep enterprise AI expertise, we deliver working AI systems with ...
2 days ago
About Distyl AI · Distyl AI develops production-grade AI systems to power core operational workflows for Fortune 500 companies. Powered by a strategic partnership with OpenAI, in-house software accelerators, and deep enterprise AI expertise, we deliver working AI systems with ra ...
3 days ago
We are looking for engineers to join our Engineering Team on a 6-month contract (with the possibility of extension). The primary work is split between engineering work to port external benchmarks to run on internal infrastructure and developing novel model evaluations. · ...
1 month ago
We're on a mission to bring good software to everyone. · We believe that the future of software development lies in abstracting away the tedious and repetitive tasks developers waste time on, · while creating reusable components that act as a force multiplier for future developer ...
3 weeks ago
We are looking for engineers to join our Engineering Team on a 6-month contract (with the possibility of extension). · The primary work is split between engineering work to port external benchmarks to run on internal infrastructure and developing novel model evaluations. · You ...
1 month ago
We are looking for engineers to join our Engineering Team on a 6-month contract (with the possibility of extension). The primary work is split between engineering work to port external benchmarks to run on internal infrastructure and developing novel model evaluations. · Porti ...
1 month ago
We are looking for engineers to join our Engineering Team on a 6-month contract (with the possibility of extension). · The primary work is split between engineering work to port external benchmarks to run on internal infrastructure and developing novel model evaluations. · ...
1 month ago
Build and maintain infrastructure and tooling for the AI evaluations platform used by internal teams. · Develop and productionalize evaluation frameworks for individual system components such as ASR, LLMs, TTS, knowledge bases. · 5+ years of professional software engineering experie ...
2 weeks ago
Waymo is looking for a Software Engineer to build the metrics and pipelines that grade its hybrid environment simulator. The ideal candidate will have experience in systems engineering and AI, with proficiency in C++ or Python. · ...
1 week ago
This is where algorithms meet steel-toed boots. You'll collaborate with construction veterans and world-class engineers to solve physical-world problems that simulations can't touch. · Responsibilities · Design and maintain eval systems: Build pipelines for measuring system perfo ...
2 weeks ago
We're training and deploying frontier models for developers and enterprises who are building AI systems to power magical experiences like content generation, semantic search, RAG, and agents. · ...
1 month ago
Job summary · Join our team bringing advanced autonomy to the built world. · We're deploying autonomous systems on heavy construction machinery across the country, · accelerating project schedules of billion-dollar infrastructure projects and improving safety on job sites. · W ...
2 weeks ago
Build and maintain infrastructure and tooling for the AI evaluations platform used by internal teams, including automated testing platform for AI voice agents, debugging and observability tools. · Develop and productionalize evaluation frameworks for individual system component ...
1 week ago
Join the team bringing advanced autonomy to the built world. · We're moving AI out of the lab and into the real world. · Our team is composed of industry veterans who helped launch Waymo, scaled Segment to a $3.2B acquisition, and grew Uber Freight to $5B in revenue. · ...
6 days ago
About Anthropic · Anthropic's mission is to create reliable, interpretable and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. · ...
1 week ago
We seek to learn from deployment and broadly distribute the benefits of AI, while ensuring that this powerful tool is used responsibly and safely. · About The Role · In this role, you'll lead development of the systems we use to evaluate the quality of our AI models and products. · ...
1 month ago
We are looking for an experienced metrics engineer to join our Data Insights team. · The successful candidate will have a strong background in quantitative fields such as computer science or mathematics. · Responsibilities include architecting new analytics data collection, workin ...
1 month ago
Elicit radically increases the amount of good reasoning in the world. · ...
1 month ago
We need someone to own the technical foundation of our auto-evaluation systems. Our evals are currently much slower than they need to be, and our interfaces aren't optimized for the diverse set of people who need to use them—ML engineers iterating on models, product managers moni ...
1 month ago