Database Reliability Engineer - San Francisco, CA

San Francisco, CA, United States

1 day ago

Job description

About WorkOS

WorkOS builds tools and services for developers to help them implement authentication, identity, authorization, and overall enterprise readiness. We're a fully distributed team with employees across North American time zones. We're well-funded, having raised $100M from top investors including Greenoaks Capital, Lachy Groom, and Lightspeed Ventures. Our fast-growing customer base includes SaaS companies such as OpenAI, Cursor, Perplexity, Vercel, Plaid, and hundreds of others.

About the Infrastructure Team

The Infrastructure team ensures the WorkOS platform remains fast, reliable, and resilient at scale. We build the systems and practices that keep everything running smoothly—handling hundreds of millions of requests, minimizing downtime, and continuously improving service performance. Our team works across the stack and collaborates closely with product engineering teams.

As a Database Reliability Engineer on this team, you'll bring specialized database expertise to the Infrastructure organization. You'll own the full lifecycle of database management, from design and capacity planning through performance optimization and disaster recovery, ensuring data durability and scalability as WorkOS grows.

The Role

As a Database Reliability Engineer, you'll be the expert our engineering teams turn to for everything database-related. You'll work across the stack to ensure our PostgreSQL infrastructure (and related data stores) can support WorkOS's growth, from query optimization to capacity planning to incident response. You'll combine the mindset of a software engineer with deep database administration expertise to build automation, improve observability, and make our data layer self-healing wherever possible.

What You'll Do

  • Own the reliability, performance, and scalability of WorkOS's PostgreSQL infrastructure.
  • Analyze and implement best practices for our database clusters, including replication, connection pooling, high availability, and disaster recovery.
  • Build and maintain observability for database metrics (query performance, replication lag, connection saturation, storage growth) and ensure we meet our database SLOs.
  • Provide database expertise to product engineering teams through migration reviews, query optimization guidance, and schema design consultation.
  • Develop automation and self-service tooling that enables engineers to safely interact with databases without bottlenecking on the DBRE team.
  • Participate in on-call rotations and lead incident response for database-related production issues, performing root cause analysis and implementing permanent fixes.
  • Plan and manage database capacity, forecasting growth and ensuring our infrastructure can handle increased workloads.
  • Collaborate with SREs to roll out infrastructure changes to production environments, with a focus on minimizing risk to the data layer.
  • Document operational procedures, runbooks, and architectural decisions so learnings become repeatable actions and eventually automation.
  • Drive improvements to backup and recovery strategies, regularly testing and validating disaster recovery procedures.

About You

  • 5+ years of experience running PostgreSQL in production at scale, with strong knowledge of internals (WAL, MVCC, vacuum tuning, query planner, indexing, replication).
  • Solid software engineering skills. You write production-quality code, not just scripts. Experience with Python, Go, Ruby, or similar languages.
  • Experience with infrastructure-as-code and configuration management (Terraform, Ansible, Chef, or similar).
  • Strong SQL skills and the ability to review and optimize complex queries for high-throughput, low-latency environments.
  • Experience with database high-availability patterns: streaming replication, connection pooling (PgBouncer), failover automation (Patroni or similar).
  • Familiarity with cloud database services on AWS (RDS, Aurora, DynamoDB, ElastiCache) or equivalent platforms.
  • Experience with monitoring and observability tools (Datadog, Prometheus, Grafana, or similar) applied to database workloads.
  • Comfort with on-call responsibilities and a track record of effective incident response.
  • Strong written and verbal communication skills. You document your work and share context proactively.
  • A proactive, ownership-driven mindset. When you see something broken, you fix it. When you see a pattern of toil, you automate it.

Nice to Have

  • Experience with other data stores beyond PostgreSQL (Redis, DynamoDB, ClickHouse, Elasticsearch).
  • Familiarity with Ruby on Rails or Django and how ORMs interact with the database layer.
  • Experience with database migration tooling and blue-green or zero-downtime migration strategies.
  • Contributions to open-source database tooling or the PostgreSQL ecosystem.
  • Background in security-sensitive environments, particularly around data encryption, access controls, and compliance requirements.

Projects You Could Work On

  • Designing and implementing automated failover and self-healing for our PostgreSQL clusters.
  • Building a query performance analysis pipeline that surfaces slow queries and recommends index improvements before they become production issues.
  • Developing a database change management system that lets engineers safely run migrations with automated rollback capabilities.
  • Improving our disaster recovery posture by testing backup restoration, reducing recovery time objectives, and automating DR drills.
  • Creating capacity planning models that forecast database growth and trigger scaling actions proactively.
  • Building internal tooling and dashboards that give engineering teams visibility into their database usage patterns.
  • Optimizing our connection pooling and load balancing strategy across read replicas to improve throughput and reduce latency.

Benefits (US Only)

At WorkOS, we offer resources that emphasize personal and familial well-being. We offer healthcare coverage for you and your family, including medical, dental, and vision. We also offer parental leave, paid time off, and fully remote working arrangements.

Benefits include:

  • Competitive pay
  • Substantial equity grants
  • Health insurance (medical, dental, and vision) for you and your family
  • 401k matching
  • Wellness and fitness monthly allowances
  • PTO + paid holidays + unlimited sick leave
  • Autonomy and flexibility with remote work

Please inquire directly with our recruiting team for benefits available to those working outside the US.

Equal Opportunity Employer

WorkOS is an equal opportunity employer, committed to diversity and inclusiveness. We will consider all qualified applicants without regard to race, color, nationality, gender, gender identity or expression, sexual orientation, religion, disability or age.

We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.

Compensation Range: $175K - $275K


