Senior Site Reliability Engineer - San Mateo, United States - Qualys

Qualys San Mateo, United States

4 weeks ago

Description

Come work at a place where innovation and teamwork come together to support the most exciting missions in the world.

We are seeking a highly motivated and talented Site Reliability Engineer to work on Qualys' Cloud Platform & Middleware technologies.

Working with a team of engineers and architects, you will combine software development and systems engineering skills to build and run scalable, distributed and fault-tolerant systems.

The ideal candidate will write software to optimize day-to-day work through better automation, monitoring, alerting, testing and deployment.

Responsibilities

Co-develop and participate in the full development lifecycle of cloud platform services, from inception and design through deployment, operation and improvement, by applying scientific principles.

Increase the effectiveness, reliability and performance of cloud platform technologies by identifying and measuring key indicators, making changes to the production systems in an automated way and evaluating the results.

Support the cloud platform team before technologies are pushed to production release through activities such as system design, capacity planning, automating key deployments, building a strategy for production monitoring and alerting, and participating in the testing/verification process.

Ensure that the cloud platform technologies are maintained properly by measuring and monitoring availability, latency, performance and system health.

Advise the cloud platform team on improving the reliability of the systems in production and scaling them based on need.

Participate in the development process by supporting new features, services and releases, and hold an ownership mindset for the cloud platform technologies.
Develop tools and automate processes to achieve large-scale provisioning and deployment of cloud platform technologies.
Participate in the on-call rotation for cloud platform technologies.

During incidents, lead the incident response and help write detailed, blameless postmortem reports that are brutally honest.

Propose improvements and drive efficiencies in systems and processes related to capacity planning, configuration management, scaling services, performance tuning, monitoring, alerting and root cause analysis.

Requirements

5 years of relevant experience in running distributed systems at scale in production.
Expertise in one of the following programming languages: Java, Python or Go.
Proficient in writing Bash scripts
Good understanding of SQL and NoSQL systems
Good understanding of systems programming (network stack, file system, OS services)
Understanding of network elements such as firewalls, load balancers, DNS, NAT, TLS/SSL, VLANs, etc.
Skilled in identifying performance bottlenecks, identifying anomalous system behavior, and determining the root cause of incidents.
Knowledge of JVM concepts like garbage collection, heap, stack, profiling, class loading, etc.
Knowledge of best practices related to security, performance, high-availability, and disaster recovery.

Proven record of handling production issues, planning escalation procedures, and conducting post-mortems, impact analyses, risk assessments and other related procedures.

Able to drive results and set priorities independently
BS/MS degree in Computer Science, Applied Math or related field

Bonus Points if you have:
Experience with managing large-scale deployments of search engines like Elasticsearch
Experience with managing large-scale deployments of message-oriented middleware such as Kafka
Experience with managing large-scale deployments of RDBMS systems such as Oracle
Experience with managing large-scale deployments of NoSQL databases such as Cassandra
Experience with managing large-scale deployments of in-memory caching using Redis, Memcached, etc.
Experience with container and orchestration technologies such as Docker, Kubernetes, etc.
Experience with monitoring tools such as Graphite, Grafana and Prometheus
Experience with HashiCorp technologies such as Consul, Vault, Terraform and Vagrant
Experience with configuration management tools such as Chef, Puppet or Ansible
In-depth experience with continuous integration and continuous deployment pipelines
Exposure to Maven, Ant or Gradle for builds

Annual Salary Guidelines:
$115,000 - $135,000

Qualys is an Equal Opportunity Employer. Please see our EEO policy.