Site Reliability Engineer, Cloud Operations - San Francisco, CA, United States - Philpar

    Description

    ShipHero:
    Senior Site Reliability Engineer

    We have built a software platform trusted by hundreds of eCommerce companies, large and small, to run their operations, and we continue to grow.

    About US$5 billion in eCommerce orders ships through ShipHero each year. Our customers sell on Shopify, Amazon, Etsy, eBay, WooCommerce, BigCommerce, and many other platforms.

    We're driven to help our customers grow their businesses by providing a platform that solves complex problems and is engineered to be reliable and fast.

    We are obsessed with building great technology that is beautiful, easy to use, and loved by our customers. Our team is fully remote, and the company has always been remote.

    We communicate regularly using video chat and Slack and put a strong emphasis on asynchronous work so people have large chunks of uninterrupted time to focus and do deep work.

    We are looking for someone with a recent track record of building and maintaining complex infrastructure within AWS (Amazon Web Services).

    You would be a fundamental team member, focusing on building a solid foundation for the platform.

    We seek excited, driven people who want to keep growing by working with talented engineers and helping others improve.

    You understand modern web architectures and tiers.

    You have worked on medium and large projects that have gone to production and lived there for a while.

    Practical experience with infrastructure and application monitoring (we use Sentry, Honeycomb, and CloudWatch).
    Comfortable debugging running applications for memory leaks and CPU usage, especially under Apache, mod_wsgi, Nginx, and Gunicorn.
    Broad knowledge of AWS cloud security (AWS Inspector, GuardDuty, WAF, and Security Hub) and infrastructure-as-code.
    Python (preferably Python 3).
    Docker and building images, including multi-stage builds with secrets.
    CI/CD automation (we use GitHub Actions, AWS CodeBuild, and CodePipeline).
    Provide hands-on configuration, setup, and maintenance of our development and production environments.
    Collaborate with other teams on monitoring and debugging solutions.
    Develop, automate, and operate our cloud infrastructure platform.
    Respond to incidents, ensuring the restoration of services when required.
    Ability to estimate effort and ship on an agreed schedule. Implement pragmatic solutions that get the platform built.