- Creating ecstatic customers
- Ensuring frictionless deployments
- Escalation management
- On-call duties
- Radiate energy and enthusiasm
- Be a (technical) leader to the team
- MS in Computer Science, Information Technology, or similar experience
- 10+ years of experience in SRE, with 5+ years of experience in an SRE Operating Lead role
- Leadership qualities and aspirations
- Project and escalation management skills
- Proven technical writing skills
- Excellent verbal and written communication skills (English)
- An infrastructure with global presence in USA, EMEA, China and GovCloud
- A large, complex infrastructure with 20+ SaaS instances, 500+ VMs, 100+ databases, 10+ logging services
- Meeting SLOs and creating robust and insightful metrics for large infrastructures and multiple SaaS instances
- Capacity planning of a complex solution with 50k+ connected devices
- Continuously driving cost down to maintain a competitive advantage
- Managing a successful 24x7x365 on-call team and being the point of escalation
- Implementing a structured incident management approach, from the start of an incident through resolution to root cause analysis
- Industry standards compliance: ISO 27001, SOC 2
- Strong leadership skills with the ability to coach and hire A-players, and to foster a culture of continuous improvement and automation
- Putting security at the center of everything you do.
- Hands-on knowledge of: AWS, Azure or GCP
- Terraform, Ansible
- Python, Shell script
- (Managed) Kubernetes, ArgoCD
- GitOps, Jenkins, GitHub Actions
- Datadog, Grafana Stack and OpenTelemetry
- PostgreSQL, Redis, HashiCorp Vault, InfluxDB and OpenSearch
- Lacework, Blameless, Vanta
Sr. Manager of SRE Operations - San Jose, United States - Zededa
Description
ZEDEDA is a simple and scalable cloud-based IoT edge orchestration solution that delivers visibility, control and security for the distributed edge with the freedom of deploying and managing any app on any hardware at scale and connecting to any cloud or on-premises system.
With ZEDEDA customers can seamlessly manage and deploy any compute node to instantly unlock the value of IoT data, make real-time decisions, maximize operational efficiency and drive new business outcomes.
We are looking for an experienced Senior Site Reliability Engineer (SRE) who is seeking new challenges and wants to make their mark by contributing to the design and upkeep of an exciting start-up.
Reporting to the VP of Engineering, the Sr. Manager of SRE Operations is responsible for ensuring the availability of our SaaS platform and exceeding the uptime and performance requirements of our Fortune 500 customers.
Together with the SRE Operations team, you will implement processes and procedures that ensure the quality and predictability of disaster recovery, performance monitoring, alerting, and reporting.
ZEDEDA is ISO 27001 and SOC 2 certified, which means that incidents need to be handled according to those standards.
As the lead of the team, you will play a key role in ensuring the team performs beyond expectations and in growing the team.
On-call responsibility is part of the role, as is implementing a strategy that supports 24x7x365 availability of the SRE Operations team. Additionally, you will be the initial escalation point for incidents and will be responsible for ensuring they get resolved, involving other teams if needed.
You will work with the SRE Technical Lead and team, as well as other groups in engineering to suggest and implement improvements for operating the platform.
ZEDEDA's main compensation philosophy is to provide you with the opportunity to progress as you grow and develop with the company.
The base pay range for this role, dependent on your skills, qualifications, experience and location, is between $175,000 and $200,000. Total compensation will also include commission, equity and benefits components.