Founding Engineer - San Francisco
3 days ago

Job description
Our Company
Pax Historia is defining a new category of gameplay using the latest advancements in generative AI. Our platform brings together the depth of grand strategy with the creative freedom of a sandbox experience, all fueled by a passionate community that creates and remixes scenarios on our platform.
Our community publishes hundreds of scenarios per day, plays millions of rounds a week, and is growing quickly. In addition, we're backed by Y Combinator, Pace Capital, and Z Fellows. Your work will immediately ship to a product used by hundreds of thousands of players.
The Role
We're hiring a founding-level ML systems engineer to work in person, full-time, in San Francisco (Dogpatch). You will report directly to the cofounders.
Our current position:
The latest closed source models play our game with reasonably good quality, but they're incredibly expensive.
Open source models are much more affordable but almost never selected by users as their performance on our platform is poor.
Prompts and harnesses are largely identical between models.
We have a working internal eval system (with vast room for improvement).
What you'll do:
Build and run the infrastructure needed to rigorously tailor harnesses and prompts to each AI model individually to squeeze out maximum performance.
Train domain-specific models to close or even eliminate the gap between open and closed models in their weight class at playing Pax.
Reduce costs associated with closed source models by optimizing caching strategies.
Further improve performance of closed source models by training tuned endpoints.
Evaluate and improve embedding and reranker performance in the places we use them.
Enable entirely new user experiences based on upcoming world models.
TL;DR: Your work will directly make the game more affordable and more fun.
Resources you'll have:
Trillions of tokens of prompt and response logs from millions of gameplay trajectories.
Tens of thousands of user preference votes per day (coming soon, pairing algo ideas described here).
Generous access to compute (6-figure budget now, with a pathway to 7 figures if results are promising).
Points of contact with many of the teams pushing the envelope of inference at scale (Chutes, OpenRouter, CanopyWave, and more).
How performance will be measured:
While we understand results may take months to appear, your north star metric will be to improve user-preference win-rates over off-the-shelf options with the same inference budget.
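For readers unfamiliar with the metric, here is a minimal sketch of how a user-preference win-rate could be computed from pairwise votes. The vote format, the model names, and the `win_rate` helper are all illustrative assumptions, not a description of our internal eval system, and ties are not modeled.

```python
from collections import Counter

def win_rate(votes, candidate, baseline):
    """Share of head-to-head votes between `candidate` and `baseline`
    that `candidate` won. `votes` is an iterable of (winner, loser)
    pairs; this format is an assumption made for illustration."""
    counts = Counter()
    for winner, loser in votes:
        # Only count votes where exactly these two models were compared.
        if {winner, loser} == {candidate, baseline}:
            counts[winner] += 1
    total = counts[candidate] + counts[baseline]
    if total == 0:
        return None  # no head-to-head votes, so the rate is undefined
    return counts[candidate] / total

# Hypothetical votes: each tuple is (preferred model, other model).
votes = [
    ("tuned-open", "stock-open"),
    ("tuned-open", "stock-open"),
    ("stock-open", "tuned-open"),
    ("tuned-open", "stock-open"),
]
rate = win_rate(votes, "tuned-open", "stock-open")
print(rate)
```

In this toy data the tuned model wins 3 of 4 comparisons. A real comparison would also need to hold the inference budget equal across both options, as the metric above requires.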
This is an intensive role and you should expect to work around 50-60 hours per week for the first few months; after that, hours may begin to decrease. There is potential for slight flexibility (i.e., 1 day/week hybrid), but we have a strong preference for candidates who can commit to in-person work.
Culture
Most of our players have discovered Pax Historia organically (friends, YouTube reviews) and have stuck around because we truly care about the game we're building. That's why we want every one of our employees to care deeply about our product too. History, fantasy, or sci-fi nerds are especially welcome, but if you can articulate why you'd be excited to work on our game, we'd love to hear from you.
Pax Historia is still an extremely small company, so you'll be working directly with the cofounders and a few other employees. You should be self-driven, a team player, and willing to advocate for your ideas. The cofounders will not be hand-holding: their leadership strategy is to 'get out of the way' of employees to let them do their best work.
Finally, flexibility is also very important. Since we are scaling very rapidly and still working with a small team, you should come to work willing to help solve a variety of problems on the fly.
Your Qualifications
Core Competencies:
You have shipped ML systems to real users and operated them in production.
You have made explicit cost/quality tradeoffs in deployed systems.
You have debugged and fixed unexpected model failures in production (e.g., expert hot-spots or structured output errors).
You have designed, critiqued, or iterated on evaluation frameworks and understand their failure modes.
Product & Ownership Mindset
You bias toward leverage and compounding improvements (better evals, better feedback loops, better infrastructure).
You are willing to work on the "boring" but important problems like instrumentation, data hygiene, debugging, and reliability.
You take ownership of problems and are comfortable advocating for your ideas (while remaining open to evidence).
You know when to say "no" to yourself and us when something isn't worth the complexity or risk.
Nice to Have
Experience with preference modeling, pairwise ranking, or human-in-the-loop evaluation systems.
Background in games, simulations, storytelling systems, or other domains where qualitative judgment matters.
Experience operating systems at high request volume.
Prior work at an early-stage startup or as a founding engineer.
What We Don't Require
A specific degree, academic pedigree, or publication record.
Prior game industry experience.
Perfect knowledge of every technique listed above.
Compensation
Salary range is from $150,000 to $240,000 depending on your relevant qualifications and experience. For truly exceptional fits (senior/staff-level), we may be willing to go above the posted range.
Selected candidates can expect to receive 0.25% to 1+% equity, vesting monthly over 4 years with a 12-month cliff. We will also be offering a non-matching 401(k) plan.
This job listing is for a W-2 employee opportunity. We are unfortunately unable to sponsor visas (other than O-1) at this time. Pax Historia is an equal opportunity employer and does not discriminate on the basis of race, color, religion, sex, gender identity or expression, sexual orientation, national origin, age, disability, or veteran status.