Senior Site Reliability Engineer - Seattle, United States - Glocomms

    Description

    The e-commerce industry has grown tremendously in recent years and has become a hotly contested space among leading Internet companies, and its future growth should not be underestimated. With millions of loyal users globally, this firm is an ideal platform for delivering a new and better e-commerce experience to its users. Its product engineering team is responsible for building an e-commerce ecosystem that is innovative, secure, and intuitive for those users. The firm is looking for passionate and talented people to join them as they drive the future of e-commerce.

    • This is an ON-SITE position, based in either the San Jose, CA or Seattle, WA office.
    • This firm may offer relocation assistance and may be able to sponsor visas.

    Responsibilities

    • Be part of the global SRE on-call rotation and be responsible for Tier-1 online incident response and DevOps support.
    • Be responsible for the service levels of a mission-critical, revenue-generating e-commerce platform, as well as all supporting infrastructure and services. This role will focus on service reliability, highly scalable design, and release management in a cloud-native environment.
    • Define service level indicators and data-driven objectives, and develop DevOps/SRE standards, processes, and methodologies to uphold and improve the uptime, latency, and system health of a core global e-commerce production platform.
    • Collaborate cross-team with engineering and product to ensure that key stability and maintainability requirements, such as capacity planning and launch reviews, are performed to enable transparent service delivery to customers.
    • Design strategies for risk detection and mitigation, disaster recovery and simulation, release management, cost optimization, and engineering quality.
    • Build automation geared toward infrastructure-as-code, scalability, and service resiliency.
    • Implement best practices for incident management and post-mortems as part of the on-call rotation.

    Qualifications

    • Bachelor's or higher degree in Computer Science or a similar technical field, or equivalent practical experience.
    • 3+ years of experience developing, provisioning, or maintaining production-grade, large-scale distributed systems.
    • High level of proficiency in Linux OS internals, networking, microservices, databases, and caches in cloud-native environments.
    • Demonstrable familiarity with programming or scripting languages (e.g., Go, Python, Bash, C++).
    • Demonstrable experience in the development and implementation of DevOps and SRE methodologies.
    • Experience in designing, analyzing, and troubleshooting large-scale distributed systems.
    • Systematic problem-solving approach, coupled with effective communication skills and a sense of drive.

    Please do not hesitate to apply.