
    Senior Site Reliability Engineer - San Diego, United States - SHEIN Technology LLC

    Full time
    Description

    Job Title: Senior Site Reliability Engineer I

    Reports to: Senior Manager of Site Reliability Engineering

    Job Location: San Diego, CA

    Job Status: Exempt, FT

    About SHEIN

    SHEIN is a global online fashion and lifestyle retailer, offering SHEIN branded apparel and products from a global network of vendors, all at affordable prices. Headquartered in Singapore, with more than 15,000 employees operating from offices around the world, SHEIN is committed to making the beauty of fashion accessible to all, promoting its industry-leading, on-demand production methodology, for a smarter, future-ready industry.

    Position Summary

    We are looking for a Senior Site Reliability Engineer - Big Data (Official Title: Senior Site Reliability Engineer I) for our San Diego, CA-based office hub. Site Reliability Engineers work with the Technical Operations team at SHEIN and are hybrid software/systems engineers, whose overarching goal is to ensure that Production Services are "Always On." They strive to build the most reliable and performant systems on the planet.

    SREs work closely with cross-functional teams to ensure we have the right set of tools to generate, collect, analyze, visualize, and alert on operational data, so we know exactly what happens across the ecosystem, can see problems before they occur, and can address them as quickly as possible.

    They are also responsible for improving Operational Efficiency, Utilization and System Resiliency of the Platform. They own Critical Open-Source Software that our platform relies on and are core participants in every significant engineering effort underway in the platform.

    They are also tasked with improving the operability of the platform to reduce both the number of incidents and MTTR. To accomplish this, the team combines software development, networking, and systems engineering expertise with a strong desire to be challenged by problems of scale and complexity, making our service better for our customers.

    Job Responsibilities

    • Participate in an on-call rotation to ensure 24/7/365 availability of SHEIN's production system
    • Supervise capacity & utilization and work closely with cross-functional teams to orchestrate scale-up/down of the services
    • Own & operate critical open-source services like Elasticsearch, Kafka, RabbitMQ, Redis
    • Build tools and design processes that help improve observability and system resiliency of the platform
    • Triage Site Availability Incidents and proactively work towards reducing MTTR for customer impacting incidents
    • Partner with Service owners to implement Service Level Metrics & Service Level Objectives that act as service level health indicators
    • Establish design patterns for monitoring, benchmarking and deploying new features for the backend services
    • Develop and maintain technical documentation, network diagrams, runbooks, and procedures
    • Drive initiatives to evolve our current platform to increase efficiency and keep it in line with current standards and best practices
    • Respond to production incidents and use your experience in software development, systems engineering, and networking to proactively prevent recurring issues
    • Provide relief and sustainable resolution to issues within our infrastructure
    • Drive initiatives with partner teams to improve the reliability and performance of the infrastructure through improved system design
    • Join a culture of intolerance of manual activity, which results in a highly automated environment delivering scalable solutions
    • Drive efficiencies through software improvement and root cause analysis, resulting in improved service delivery, maturity, and scalability

    Job Requirements

    • Bachelor's degree in Computer Science, Information Systems, or equivalent technical discipline is preferred
    • Experience with Big Data related component operation and maintenance, including Hadoop, Yarn, HBase, Hive, Spark, etc., is highly preferred
    • Experience with OSS technologies, like Elasticsearch, Kafka, and Redis, is highly preferred
    • Solid understanding of Linux systems is preferred
    • Minimum 3 years of working experience in an enterprise 24/7 production environment supporting mission-critical, real-time, high-traffic applications, especially in cloud environments, is preferred
    • Systematic problem-solving approach, combined with a sense of ownership and drive
    • Full-stack debugging and performance optimization ability, including knowledge of Cloud systems (load balancing, caching, content distribution, etc.), continuous integration/build systems, Java, SQL and NoSQL databases
    • Track record monitoring and analyzing system performance, isolating issues or bottlenecks that could impact reliability, performance and scalability
    • Strong experience with observability tools such as Grafana, Prometheus, Zabbix, etc.
    • Good experience in any of the scripting/programming languages: Python, Go, etc.
    • Familiar with container technology, such as: Docker, Kubernetes, Mesos, etc.
    • Understanding of and experience with SRE concepts and practices, including advocating for the elimination of toil and driving simple solutions
    • Good verbal and written communication skills, and the ability to work effectively with geographically remote teams

    Pay

    $107,600.00 min - $180,200.00 max annually, Bonus & RSU offered.

    Benefits and Perks

    Healthcare (medical, dental, vision, prescription drugs)

    Health Savings Account with Employer Funding

    Flexible Spending Accounts (Healthcare and Dependent care)

    Company-Paid Basic Life/AD&D insurance

    Company-Paid Short-Term and Long-Term Disability

    Voluntary Benefit Offerings (Voluntary Life/AD&D, Hospital Indemnity, Critical Illness, and Accident)

    Employee Assistance Program

    Business Travel Accident Insurance

    401(k) Savings Plan with discretionary company match and access to a financial advisor

    Vacation, paid holidays, floating holiday and sick days

    Employee discounts

    Free weekly catered lunch

    Dog-friendly office (available at select locations)

    Free gym access (available at select locations)

    Free swag giveaways

    Annual Holiday Party

    Invitations to pop-ups and other company events

    Complimentary daily office snacks and beverages

    SHEIN Technology LLC is an equal opportunity employer committed to a diverse workplace environment.


