    Site Reliability Engineer - San Francisco, United States - DigitalOcean

    Description

    Do you ever wonder what happens inside the cloud?


    DigitalOcean (NYSE: DOCN) simplifies cloud computing so builders can spend more time creating software that changes the world. With our mission-critical infrastructure and fully managed offerings, DigitalOcean enables startups and small and medium-sized businesses (SMBs) to rapidly deploy and scale modern applications. As a remote-first organization, our employees, like our customers, are based around the world.

    We want people who are passionate about building the systems, culture, and processes that will improve the resiliency, reliability, scalability, and performance of cloud services.


    We are looking for an experienced Site Reliability Engineer to work closely with our product engineering and infrastructure teams. The Site Reliability Engineer will perform a mix of hands-on development and consulting, collaborating with other teams and stakeholders to help bring DigitalOcean's engineering systems and culture to the next level.

    In this role, you'll have the opportunity to drive reliability and availability across the organization at a global scale. As a champion of reliability and availability for the customer, you'll drive maturity across microservices and across the company, helping engineering teams set effective SLIs/SLOs, cultivate a blameless and diligent postmortem culture, and strategize for disaster recovery. Risk can't be avoided, but it can be managed and mitigated, and that's what the embedded SREs at DigitalOcean are here to do.

    What You'll Be Doing:

    • Building automation, metrics collection (Prometheus experience a plus), tooling, and occasionally features to directly improve the reliability, resiliency, and scaling of our IaaS, SaaS, and PaaS product offerings and architecture.
    • Collaborating with teams on researching and fine-tuning their SLIs/SLOs to drive the best outcomes for customers.
    • Facilitating DigitalOcean's culture of learning by providing insight and recommendations for improvement.
    • Coaching teams and individuals on reliability best practices and solutions.
    • Working with peer SREs and engineering leaders to define the architectures and practices that should be adopted in order to deliver on our engineering and operational goals.
    • Establishing best practices for development, architecture, deployment, and operations.
    • Working with peer SREs to improve services and processes (including architecture reviews, incident response, and monitoring) in a cross-functional manner throughout the engineering organization.
    • Working with the Paperspace and DigitalOcean teams that are extending our platform for AI applications and ML developers.

    What We'll Expect From You:

    • Software engineering experience. At DO, SRE is not an operations role. Ideal candidates will have previous experience developing software at scale. Golang experience is a plus.
    • We're hiring at a variety of levels for this role. For senior-level roles, we'll expect a distinguished track record as an SRE (or in a similar role) with hands-on experience implementing reliability, process, and scaling solutions.
    • Flexibility to get up to speed with a variety of product-focused teams.
    • History of fostering positive relationships with stakeholders and a track record of successful collaboration and coaching.
    • Clear communication skills (both written and verbal) to document processes and architectures.
    • Experience implementing disaster recovery best practices.
    • Experience developing robust solutions that streamline the resolution of customer inquiries using technologies for automation, deflection, and issue management.
    • Proficiency in Golang or Python, with a broad understanding of the full technology stack for modern infrastructure. You will be working heavily in Golang and/or Python in this role.
    • An advocate of effective development environments that use CI/CD tooling and configuration management technologies such as Chef or Ansible.
    • Experience deploying and maintaining your code in a containerized environment.
    • Experience using and integrating with observability frameworks.

    Why You'll Like Working for DigitalOcean:

    • We reward our employees. The salary range for this position is between $175,000 and $200,000, based on relevant years of experience and skills. Employees may qualify for a bonus in addition to base salary; bonus amounts are determined based on company and individual performance. We also provide equity compensation to eligible employees, including grants of equity upon hire and the option to participate in our Employee Stock Purchase Program.
    • We value development. You will work with some of the smartest and most interesting people in the industry. We are a high-performance organization that is always challenging our teams and employees to continuously grow. We maintain a growth mindset in everything we do and invest deeply in employee development through formalized mentorship and other internal programs. We provide all employees with reimbursement for relevant conferences, training, and education.
    • We care about your well-being. In addition to cash and equity compensation, we also offer employees a competitive array of benefits. While the philosophy around our benefits is the same worldwide, specific benefits may vary in other countries due to local regulations and preferences.
    • We value diversity and inclusivity. We are an equal-opportunity employer and we do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.




  • BHO Tech San Francisco, United States Full time

    Job Description · We're the driverless car company. We're building the world's best autonomous vehicles to safely connect people to the places, things, and experiences they care about. · Our vehicles are on the road in California, Arizona, and Michigan navigating some of the mos ...


  • Best Secret San Francisco, United States

    About BestSecretGroup · We are a leading European members-only online destination for premium and luxury off-price fashion. Partnering with over 3,000 international brands, our tech-focused mindset and strong commitment to sustainability drives a truly unique experience for our m ...


  • Instabase San Francisco, United States

    At Instabase, we're passionate about democratizing access to cutting-edge AI innovation to enable any organization to solve previously unsolvable unstructured data problems in their industry. · With customers representing some of the largest and most complex organizations in the ...


  • PicnicHealth San Francisco, United States

    Site Reliability Engineer · PicnicHealth, United States · Date Posted: 10 Aug, 2023 · Work Location: San Francisco, United States · Salary Offered: $160 to $190 yearly · Job Type: Full Ti ...


  • Syndio San Francisco, United States

    Do you want to empower organizations to fairly and equitably hire, promote, retain and compensate their employees? Syndio is a Series-C technology company committed to fairness in the workplace. Fueled by investments of $83M from Bessemer Ventures, Voyager Capital and social chan ...


  • Wasmer San Francisco, United States

    Site Reliability Engineer · Wasmer, United States · Date Posted: 25 Mar, 2023 · Work Location: San Francisco, United States · Salary Offered: Not Specified · Job Type: Full Time · Experience R ...


  • OpenAI San Francisco, United States

    Join the engineering teams that bring OpenAI's ideas safely to the world · The Applied Engineering team works across research, engineering, product, and design to bring OpenAI's technology to consumers and businesses. We seek to learn from deployment and distribute the benefits ...


  • Together AI San Francisco, United States

    As a Site Reliability Engineer (SRE) at Together, you are responsible for keeping all user-facing services and production systems running smoothly. You are a blend of a pragmatic operator and a software engineer that applies sound engineering principles, operational discipline, a ...


  • Replit San Francisco, United States

    Site Reliability Engineer · Replit, United States · Date Posted: 23 Feb, 2023 · Work Location: San Francisco, United States · Salary Offered: $70,000 to $175,000 yearly · Job Type: Full Time · Ex ...


  • Withorb San Francisco, United States

    Mission · Orb is on an ambitious mission to provide every business with the infrastructure to unlock their revenue. Best-in-class businesses find ways to effectively align their monetization to product usage, whether that's through seats, consumption, feature limits, or usage-bas ...


  • Radar San Francisco, United States

    Site Reliability Engineer · RADAR, United States · Date Posted: 14 Mar, 2023 · Work Location: San Francisco, United States · Salary Offered: $100,000 to $230,000 yearly · Job Type: Full Time · Exp ...


  • Mission Box Solutions San Francisco, United States Permanent

    As a Site Reliability Engineer (SRE), you will play a vital role in continuously driving improvements in observability, performance, and reliability, aiming to make a substantial impact across the federal government. Our client firmly believes that exceptional technology services ...


  • Cypress HCM San Francisco, United States

    Job Description · Site Reliability Engineer (Grafana) · Responsibilities: Collaborate with Service Owners and Observability Leaders to develop a strategy for monitoring the technology stack using Grafana. · Initiate data ingestion by deploying Telegraf and exporters ...


  • Swish Analytics San Francisco, United States

    Swish Analytics is a sports analytics, betting, and fantasy startup building the next generation of predictive sports analytics data products. We believe that oddsmaking is a challenge rooted in engineering, mathematics, and sports betting expertise, not intuition. We're looking f ...


  • CAPTIVATEIQ INC San Francisco, United States

    Site Reliability Engineer - Remote · CaptivateIQ, United States · Date Posted: 31 Jan, 2023 · Work Location: San Francisco, United States · Salary Offered: $139,000 to $186,000 yearl ...


  • Telestream San Francisco, United States

    Job Description · About Us: · Welcome to the forefront of innovation at Telestream, an industry-leading digital video delivery company. We are a dynamic and forward-thinking organization committed to leveraging cutting-edge cloud technologies to drive our suc ...


  • Resource Informatics Group San Francisco, United States

    Job Title: Site Reliability Engineer · Work Location: San Francisco, CA (Hybrid after showing successful engagement) · Duration: 18+ months · Most important skills: 10 years of Oracle database administration experience in a large production environment · Database hands-on skills ...


  • GRNET San Francisco, United States

    About GRNET · GRNET (National Infrastructures for Research and Technology) is an entity of the Greek Government, operating under the Ministry of Digital Governance. It provides advanced network and cloud computing services to academic and research institutions, educational entities ...