- Design and implement SRE practices ensuring availability, scalability and observability of production systems with a strong focus on excellent customer experience
- Actively identify opportunities to improve the availability and performance of the system by applying lessons learned from monitoring and observation
- Use automation extensively to design, configure, manage, and monitor systems in support of our product development teams
- Manage Infrastructure through automation (Infrastructure as Code)
- Manage incidents and emergency response, track outages, ensure data integrity and engineer releases to promote safe, efficient and rapid deployments
- Handle emergency response, either while on call or by reacting to symptoms surfaced by monitoring, and escalate when needed
- Improve the codebase by resolving logic issues, deprecating unused code, etc.
- Implement monitoring, logging, alerting and SLO reporting
- Identify Service Level Indicators (SLIs) that will align the team to meet the availability and performance objectives
- Run blameless root cause analyses (RCAs) on incidents and outages, aggressively looking for answers that will prevent recurrence
- 8+ years as a software engineer, shipping production code.
- 5+ years of experience as a Site Reliability Engineer.
- Experience with service-oriented architectures and microservices at scale
- Strong proficiency with relational databases (PostgreSQL, MySQL, SQL Server, etc.)
- Strong proficiency in SQL scripting
- Proficiency developing in one or more languages such as Java, Kotlin, Python, and/or others
- Ability to use containers and orchestration frameworks (Kubernetes, Docker, container registries, etc.)
- Proficiency in Git or other VCS
- Experience configuring, customizing, and extending monitoring tools (Datadog, Prometheus, New Relic, etc.)
- Excellent debugging and troubleshooting skills
- Strong technical competency, with a data-driven analytical approach towards solving complex challenges
- A systematic problem-solving approach, coupled with strong, effective communication skills and a sense of drive
- Nice-to-have: Experience with Terraform or other IaC tools such as Chef, Puppet or Ansible
- Competitive salary & equity compensation for full-time roles
- Unlimited PTO, company holidays, and quarterly mental health days
- Comprehensive health benefits including medical, dental & vision, and parental leave
- Employee Stock Purchase Program (ESPP)
- Employee discounts on hims & hers & Apostrophe online products
- 401(k) benefits with employer matching contribution
- Offsite team retreats
Sr. Site Reliability Engineer - San Francisco, CA, United States - hims & hers
Description
About the Role:
We are seeking a Site Reliability Engineer to help build a reliable web experience for our users. We believe that moving fast is our competitive advantage and enables us to better serve our users. We also know that the faster we move, the more likely we are to break things.
You Will:
You Have:
Our Benefits (there are more but here are some highlights):