Bilingual LLM Evaluator
2 weeks ago

Job description
Similar jobs
Evaluate LLM outputs for accuracy, relevance, bias, and safety. Design test cases and evaluation benchmarks for AI models. Analyze model behavior and document findings. Collaborate with ML engineers and data scientists to improve models. Provide structured feedback to enhance model perf ...
3 weeks ago
We're looking for an LLM Evaluation, Benchmarking & Experimentation Engineer to rigorously test our proprietary LLM API and build the infrastructure for systematic model improvement. · ...
2 weeks ago
· At LeoTech, we are passionate about building software that solves real-world problems in the Public Safety sector. Our software has been used to help fight continuing criminal enterprises and drug trafficking organizations, identify financial fraud, and disrupt sex ...
2 days ago
Mercor connects elite creative and technical talent with leading AI research labs. Headquartered in San Francisco, our investors include Benchmark, General Catalyst, Peter Thiel, Adam D'Angelo, Larry Summers, and Jack Dorsey. · ...
1 month ago
Mercor connects elite creative and technical talent with leading AI research labs. · Evaluate LLM-generated responses on their ability to effectively answer user queries. · ...
2 weeks ago
Job summary · Evaluate LLM-generated responses on their ability to effectively answer user queries. · Qualifications: Bachelor's degree · Native speaker or ILR 5 / primary fluency (C2 on the CEFR scale) in Hindi · ...
1 month ago
We are passionate about building software that solves real-world problems in the Public Safety sector. · ...
1 month ago
Build the evaluation layer in the ThirdLaw platform—the part of the system that decides whether an LLM prompt, response, tool call, or agent behavior is acceptable. · ...
1 week ago
We're partnering with a deep-tech AI company building autonomous agentic systems for complex physical and real-world environments. The team operates at the edge of what's possible today, designing AI systems that plan, act, recover, and improve over long horizons in high-stakes setti ...
1 month ago
Tetrad Digital Integrity (TDI) is a leading-edge cybersecurity firm with a mission to safeguard and protect our customers from increasing threats and vulnerabilities in this digital age. · We are seeking a highly skilled LLM Security Evaluation Expert to join our team. In this ro ...
2 days ago
Our Client is a well-funded nonprofit research organization focused on measuring frontier AI capabilities, especially agentic / autonomous capabilities and the ability of models to conduct AI R&D, because those capabilities can create outsized societal and security risk if they sca ...
1 month ago
We are looking to fill an LLM-GenAI Model Evaluator position. · ...
2 weeks ago
We are seeking an LLM-GenAI Model Evaluator for our team in Austin, TX and Sunnyvale. The ideal candidate will have a strong understanding of LLMs, generative AI, and transformer-based architectures. · Strong understanding of LLMs, generative AI, and transformer-based archite ...
2 weeks ago
You'd be building the UI that turns messy LLM evaluation outputs into clear, explorable artifacts that researchers can trust. · ...
1 month ago
Immediate need for a talented LLM-GenAI Model Evaluator. · Evaluate LLM-GenAI models. · ...
2 weeks ago
Mercor connects elite creative and technical talent with leading AI research labs. The company is headquartered in San Francisco and has investors including Benchmark, General Catalyst, Peter Thiel, Adam D'Angelo, Larry Summers, and Jack Dorsey. As a Bilingual LLM Evaluation Anal ...
1 month ago
SilverEdge Government Solutions is seeking a highly skilled LLM Security Evaluation Expert to join our team. · ...
1 month ago
We're partnering with a deep-tech AI company building autonomous agentic systems for complex physical environments. · Build eval harnesses for agentic LLM systems offline + in-workflow ...
1 month ago
SilverEdge Government Solutions is seeking a highly skilled LLM Security Evaluation Expert. In this role, you will be responsible for rigorously testing the security and integrity of Large Language Models (LLMs). · TS/SCI with Polygraph level Clearance · Familiarity with prominen ...
4 weeks ago
I'm seeking a technical mentor to help deepen my understanding of LLM evaluation and benchmarking, with particular attention to high-stakes applications (e.g., mental health), while developing a generalizable framework for reasoning about model performance across domains. · ...
1 month ago