Senior SRE Engineer - Belmont, United States - RingCentral
Description
RingCentral's cloud-based communications platform connects more than 2 million users worldwide.
Are you looking for an opportunity where your skills and passion make a difference and where your voice will be heard?
We're the #1 global cloud-based communications provider, growing at more than 30% annually, and we're looking for team members with an entrepreneurial spark.
We build a highly available cloud-based contact center that combines a full range of telephony features and many digital customer communication channels (WhatsApp, Facebook, Twitter, email, SMS, etc.) into one service. We develop a modern and reliable product that helps companies stay closer to their customers and respond to their requests as quickly and efficiently as possible.
You will be a part of the team responsible for running our product and its cloud infrastructure. You will contribute to the product and infrastructure focusing on availability, maintainability, and scalability. You will apply the best practices of site reliability engineering, operational discipline, and automation.
You should be motivated, organized, excited about technology and SaaS products, a thorough critical thinker, and relentless about code quality, scalability, latency, and platform stability. Our culture is motivational, constructive, and positive. We value teamwork, camaraderie, and collaboration. If you're up for a fun challenge, we want to hear from you.
Technology Stack: AWS, Kubernetes (EKS), Aurora RDS (PostgreSQL/MySQL), Kafka, Argo CD, Prometheus, Jenkins, GitLab CI, Terraform, Ansible, Python, Java, Ruby.
Responsibilities:
Design, plan, and implement an HA and cost-effective cloud infrastructure with an IaC approach
Develop, scale, and maintain automated CI/CD processes using GitOps
Increase service automation to improve maintainability, scalability, and engineering productivity
Plan system capacity and develop tooling for on-demand product scaling
Troubleshoot and resolve software and technical issues, participate in incident resolution, and perform root cause analysis
Participate in an on-call process
Plan disaster recovery procedures and develop automation for fast and reliable service restoration
Implement security & compliance requirements
Work with development and architecture teams to improve service observability and performance, eliminate logging and monitoring blind spots, and suggest architectural and process improvements
Evaluate and adopt new cloud-native technologies
Qualifications:
5+ years of technical experience in the same or similar role supporting large-scale and high-load cloud-based production systems
Experience in the development and support of public cloud infrastructure
Hands-on experience running HA applications and developing CI/CD processes on Kubernetes
Proven programming skills in Python, Go, or a similar language
Good knowledge of the Linux environment, TCP/IP, network routing, and DNS
Familiarity with SRE principles, DevOps practices, and the modern cloud-native landscape
Accuracy, attention to detail, and the ability to follow processes
Good communication skills
Experience with contact center or VoIP solutions is a HUGE plus
Ability to read and troubleshoot Java code if needed is a plus
Experience with SQL/NoSQL databases, or a willingness to develop skills in this field, is a plus