Sr. Staff Site Reliability Engineer - Seattle - Coupang

    Coupang
    Coupang Seattle

    1 hour ago

    Description
    Job Overview:
    Site Reliability Engineer (SRE) at Coupang is a mission-critical role that combines software and system engineering to build, run, and scale our complex, large-scale ecommerce systems. As part of the Site Reliability Engineering team, you will be responsible for ensuring all our customer-facing services are healthy, monitored, automated, and designed to scale. As an SRE organization, we take pride in treating operations as an engineering problem, with an automation-first approach. You will use your background to build best-in-class infrastructure automation for areas such as observability, incident management, disaster recovery, load testing, capacity engineering, and many more. In this role you will work very closely with our product development teams, from the early design stage all the way through helping resolve production incidents, maintaining the SLI/SLA bar for production services, and influencing them with SRE principles and best practices. If you take pride in complete ownership, have a passion for solving complex technical challenges in large-scale distributed systems, and have the demeanor to work and communicate effectively across team boundaries, this is the role for you.
    Key Responsibilities:
    • Serve as a primary point of contact responsible for the platform reliability, health, and performance of all Coupang customer-facing services.
    • Gain deep knowledge of Coupang application workflows and dependencies.
    • Define and track key performance indicators (KPIs) and service-level objectives (SLOs) related to system availability, performance, and reliability.
    • Build a world-class incident management process and automation, including fast incident remediation, incident operational reviews, and retrospectives.
    • Develop and implement best practices for creating, scaling, and maintaining effective monitoring, alerting, and telemetry systems.
    • Build automation to execute regular disaster recovery testing, chaos testing, and load testing to stay ahead of the expected growth of Coupang services.
    • Work closely with product development teams to ensure the products are designed with scale and operability in mind.
    • Build the right guardrails and automation for deploying production changes while holding the reliability bar.
    • Participate in a 24x7 rotation for production issue escalations and function well in a fast-paced environment.
    • Communicate effectively with people at all levels of the organization.
    Basic Qualifications:
    • Bachelor's degree in Computer Science, Engineering, or a related technical field.
    • 8+ years of industry experience building and operating large-scale distributed systems.
    Preferred Qualifications:
    • Prior experience working with AI/ML, large-scale web-based Java architectures, and JVM configuration.
    • Professional certifications in cloud platforms, monitoring tools, or related technologies.
    • Previous experience working on large-scale GPU/cloud infrastructure platforms.
    • SLO/SLA management and implementation experience.
    • Deep UNIX/Linux systems knowledge and administration background.
    • Demonstrated programming skills in one or more of: Python, Java, Golang, Ruby.
    • Strong problem-solving and analytical skills spanning systems, network (TCP/IP) and code, with a focus on data-driven decision-making.
    • Experience with cloud-based GPU infrastructure, including AWS, Azure, or Google Cloud Platform.
    • Strong understanding of DevOps and SRE practices, including continuous integration, continuous delivery, and infrastructure as code (IaC).
    • Experience with containerization and orchestration technologies, such as Docker and Kubernetes.
    • Excellent communication and collaboration skills, with the ability to work with teams across distinct functions and technical domains.
    • Knowledge of the OpenTelemetry observability ecosystem, including metrics, logging, and tracing, and of tools such as Prometheus, Grafana, Elastic Stack, Datadog, or New Relic.
    Pay & Benefits
    Our compensation reflects the cost of labor across several US geographic markets. At Coupang, your base pay is one part of your total compensation.
    The base pay for this position ranges from $176,000/year in our lowest geographic market to $221,000/year in our highest geographic market. Pay is based on several factors including market location and may vary depending on job-related knowledge, skills, and experience.
    General Description of All Benefits
    • Medical/Dental/Vision/Life, AD&D insurance
    • Flexible Spending Accounts (FSA) & Health Savings Account (HSA)
    • Long-term/Short-term Disability
    • Employee Assistance Program (EAP)
    • 401K Plan with Company Match
    • 18-21 days of Paid Time Off (PTO) a year, based on tenure
    • 12 Public Holidays
    • Paid Parental leave
    • Pre-tax commuter benefits
    • MTV - [Free] Electric Car Charging Station
    General Description of Other Compensation
    "Other Compensation" includes, but is not limited to, bonuses, equity, or other forms of compensation that would be offered to the hired applicant in addition to their established salary range or wage scale.
    Coupang is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to actual or perceived race (including traits historically associated with race, including but not limited to hair texture and protective hair styles), color, religion, religious creed (including religious dress and grooming practices), sex or gender (including pregnancy, childbirth, breastfeeding, and medical conditions related to pregnancy, childbirth or breastfeeding), gender identity, gender expression, sexual orientation, ancestry, national origin (including language use restrictions), age (40 and over), physical or mental disability, medical condition, genetic information, HIV/AIDS or Hepatitis C status, family status (including but not limited to marital or domestic partnership status), military or veteran status, use of a trained dog guide or service animal, political activities or affiliations, ancestry, citizenship, family and medical leave status, status as a victim of any violent crime, or any other characteristic or class protected by the laws or regulations in the locations where we operate. Coupang is also committed to providing a safe work environment for its employees and its consumers. If you need assistance and/or a reasonable accommodation in the application or recruiting process due to a disability, please contact us at
    Requisition: R0065794
    Equal Opportunities for All

  • Work in company

    Site Reliability Engineer

    Senior Site Reliability Engineer with strong background and experience in Azure Cloud, Terraform, Azure DevOps. · ...

    Seattle

    2 weeks ago

  • Work in company

    Site Reliability Engineer

    This role is the team-facing, consultative side of observability. The senior engineer partners directly with internal engineering teams to understand their systems, pain points, and reliability gaps. · Design and implement tailored observability solutions across tools like Grafa ...

    Seattle

    1 month ago

  • Work in company

    Site Reliability Engineer

    The Apple Service Engineering - SRE team is looking for Site Reliability Engineers with experience in developing processes, tools, and automation for managing distributed systems in production environments. · You will help building next generation search infrastructure and platfo ...

    Seattle $139,500 - $258,100 (USD)

    1 month ago

  • Work in company

    Site Reliability Engineer

    This role is the team facing, consultative side of observability. The senior engineer partners directly with internal engineering teams to understand their systems, pain points, and reliability gaps. · Build dashboards in Grafana for internal teams and leadership. · Maintain obse ...

    Seattle

    1 month ago

  • Work in company

    Network Reliability Engineer

    We're searching for an expert Network Reliability Engineer to architect, optimize, and operate the high-performance network fabrics that power large-scale AI and HPC workloads. · This isn't a traditional networking role — it's an opportunity to shape the performance backbone of s ...

    Seattle

    1 month ago

  • Work in company

    Site Reliability Engineer

    The Apple Service Engineering - SRE team is looking for Site Reliability Engineers with experience in developing processes, tools, and automation for managing distributed systems in production environments. · Bachelor's Degree in Computer Science, an engineering-related fie ...

    Seattle $139,500 - $258,100 (USD)

    1 month ago

  • Work in company

    Network Reliability Engineer

    We're searching for an expert Network Reliability Engineer to architect, optimize, and operate the high-performance network fabrics that power large-scale AI and HPC workloads. · This isn't a traditional networking role — it's an opportunity to shape the performance backbone of s ...

    Seattle, WA

    2 weeks ago

  • Work in company

    Senior Reliability Engineer

    Tin Can is building a safer, simpler way for kids to connect, without smartphones. · Own production reliability, including incident response, root-cause analysis, and follow-through · Improve observability (metrics, logging, alerting) so issues are easier to detect and diagnose · St ...

    Seattle Full time

    1 month ago

  • Work in company

    Senior Reliability Engineer

    We're building a safer, simpler way for kids to connect — without smartphones. We're creating screen-free, delightful devices and services that let families call the people who matter most, free from the noise of today's digital world. · We're building a bold, authentic, nostalgi ...

    Seattle $170,000 - $200,000 (USD)

    1 month ago

  • Work in company

    Site Reliability Engineer

    We're looking for Site Reliability Engineers to join our AML team. · Design, build and maintain highly available systems. · Monitor system performance and resolve issues. · ...

    Seattle $129,960 - $246,240 (USD)

    1 month ago

  • Work in company

    Site Reliability Engineer

    Responsibilities · Site Reliability Engineering(SRE) at TikTok combines software and systems engineering to build and run large-scale, massively distributed, and fault-tolerant systems. In our team, you'll have the opportunity to manage the complex challenges of scale, while usin ...

    Seattle $129,960 - $246,240 (USD)

    3 days ago

  • Work in company

    Site Reliability Engineer

    DAT is looking for a Senior Site Reliability Engineer to join our SRE platform team. · ...

    Seattle, WA

    1 month ago

  • Work in company

    Site Reliability Engineer

    We are seeking a passionate and technically skilled Site Reliability Engineer (SRE) to join our team. You will play a critical role in building and maintaining highly available, high-performance model service platform. · We aim to empower numerous enterprise clients to accelerate ...

    Seattle $133,200 - $219,600 (USD)

    1 month ago

  • Work in company

    Site Reliability Engineer

    Responsibilities · Team introduction: · Site Reliability Engineering (SRE) of the AML (Applied Machine Learning) team combines system engineering and the art of machine learning to develop and run massively distributed AI/ML systems like recommendation or LLM in the United States ...

    Seattle $129,960 - $246,240 (USD)

    3 days ago

  • Work in company

    Site Reliability Engineer

    · Company Description · Comtech is a woman-owned small business founded in 1998 and headquartered in Reston, VA. We offer IT solutions across the disciplines of program/project management, applications development, infrastructure, Cyber security, and enterprise content/data mana ...

    Seattle, WA $115,000 - $185,000 (USD) per year

    1 week ago

  • Work in company

    Site Reliability Engineer

    Responsibilities · TikTok video system is a world-leading video platform that provides multimedia storage, delivery, transcoding services. As part of the USDS, the Video Platform team is responsible for building the next generation video processing platform which provides excelle ...

    Seattle $112,725 - $177,840 (USD)

    3 days ago

  • Work in company

    Staff Engineer – Reliability Engineering

    We offer a rewarding career where your ambitions are met with endless possibilities. · ...

    Seattle, WA

    1 month ago

  • Work in company

    Staff Engineer – Reliability Engineering

    At GEICO, we offer a rewarding career where your ambitions are met with endless possibilities. · Every day we honor our iconic brand by offering quality coverage to millions of customers and being there when they need us most. We thrive through relentless innovation to exceed our ...

    Seattle, WA

    3 days ago

  • Work in company

    Site Reliability Engineer

    Responsibilities · Team Intro · TikTok video system is a world-leading video platform that provides multimedia storage, delivery, transcoding services. As part of the USDS, the Video Platform team is responsible for building the next generation video processing platform which pro ...

    Seattle $129,960 - $246,240 (USD)

    3 days ago

  • Work in company

    Site Reliability Engineer

    Responsibilities · Team Introduction · Video Cloud Infra team, facing business experience and cost, builds a competitive video transmission network and multimedia processing platform, builds data foundation and analysis capabilities, drives product refined operation, reduces cost ...

    Seattle $129,960 - $246,240 (USD)

    3 days ago

  • Work in company

    Site Reliability Engineer

    Location: · Seattle · Team: · Technology · Employment Type: · Regular · Job Code: · A47835 · Responsibilities · Team Introduction · Video Cloud Infra team, facing business experience and cost, builds a competitive video transmission network and multimedia processing platform, bui ...

    Seattle, WA $115,000 - $185,000 (USD) per year

    2 days ago
