Site Reliability Engineer - San Francisco - Baseten

    Baseten
    Baseten San Francisco

    5 days ago

    Description

    About Baseten


    Baseten powers inference for the world's most dynamic AI companies, like OpenEvidence, Clay, Mirage, Gamma, Sourcegraph, Writer, Abridge, Bland, and Zed. By uniting applied AI research, flexible infrastructure, and seamless developer tooling, we enable companies operating at the frontier of AI to bring cutting‑edge models into production. With our recent $150M Series D funding, backed by investors including BOND, IVP, Spark Capital, Greylock, and Conviction, we're scaling our team to meet accelerating customer demand.

    The Role


    As a Site Reliability Engineer, you'll envision and build robust systems and processes that ensure our infrastructure is scalable, reliable, and efficient. This can range from automating deployments and monitoring systems to optimizing performance and managing incidents.

    We all work closely with our users, learning from their past struggles in operationalizing ML, onboarding them onto our platform, and turning our learnings into ideas for improving Baseten.

    Example Initiatives

    • Multi‑cloud capacity management
    • Inference on B200 GPUs
    • Multi‑node inference
    • Fractional H100 GPUs for efficient model serving

    Responsibilities

    • Build and maintain scalable infrastructure to support the deployment and operation of machine learning models.
    • Establish standards and best practices for reliability and performance across the infrastructure.
    • Automate processes when relevant, particularly for managing CI/CD pipelines.
    • Own products and projects end‑to‑end, functioning as both an engineer and a project manager, with a focus on user empathy, project specification, and end‑to‑end execution.
    • Collaborate with cross‑functional teams to understand project requirements and translate them into technical solutions.
    • Mentor junior team members and contribute to knowledge sharing within the organization.
    • Navigate ambiguity and exercise good judgment on tradeoffs and tools needed to solve problems, avoiding unnecessary complexity.
    • Demonstrate pride, ownership, and accountability for your work, expecting the same from your teammates.

    Requirements

    • Bachelor's, Master's, or Ph.D. degree in Computer Science, Engineering, Mathematics, or a related field.
    • 5+ years of professional work experience in a fast‑paced, high‑growth environment.
    • Extensive experience with Kubernetes.
    • Experience in building and maintaining scalable infrastructure.
    • Experience with infrastructure‑as‑code tools (e.g., Terraform, CloudFormation, Pulumi) and CI/CD tooling (e.g., GitHub Actions, GitLab CI, CircleCI, Jenkins).
    • Relevant OSS observability experience (Prometheus, ELK stack, Grafana stack, OpenTelemetry) is a plus.
    • Ability to own projects end‑to‑end, from project specification to execution.
    • No prior machine learning experience is required, but you should be open to learning about it.

    Benefits

    • Competitive compensation, including meaningful equity.
    • 100% coverage of medical, dental, and vision insurance for employee and dependents.
    • Generous PTO policy including company‑wide Winter Break (our offices are closed from Christmas Eve to New Year's Day).
    • Paid parental leave.
    • Company‑facilitated 401(k).
    • Exposure to a variety of ML startups, offering unparalleled learning and networking opportunities.

    Apply now to embark on a rewarding journey in shaping the future of AI. If you are a motivated individual with a passion for machine learning and a desire to be part of a collaborative and forward‑thinking team, we would love to hear from you.

    At Baseten, we are committed to fostering a diverse and inclusive workplace. We provide equal employment opportunities to all employees and applicants without regard to race, color, religion, gender, sexual orientation, gender identity or expression, national origin, age, genetic information, disability, or veteran status.



