Staff Engineer, Cloud Ops - Raleigh, United States - Druva

    Description


    Druva enables cyber, data, and operational resilience for every organization with the Data Resiliency Cloud, the industry's first and only at-scale SaaS solution.

    Customers can radically simplify data protection, streamline data governance, and gain data visibility and insights as they accelerate cloud adoption.

    Druva pioneered a SaaS-based approach to eliminate complex infrastructure and related management costs, and deliver data resilience via a single platform spanning multiple geographies and clouds.

    Druva is trusted by thousands of enterprises, including 60 of the Fortune 500, to make data more resilient and accelerate their journey to the cloud.

    Visit and follow us on LinkedIn, Twitter and Facebook.

    As a Staff (Senior Staff depending on experience) Cloud Operations Engineer, you will play a crucial role in designing, building, and maintaining cloud infrastructure within a FedRAMP (Federal Risk and Authorization Management Program) compliant environment.

    Your focus will be on ensuring maximum uptime, performance, and security while leveraging cloud technologies. You'll collaborate with cross-functional teams to deliver robust solutions that meet the needs of our growing customer community.


    Responsibilities:

    Infrastructure Management:
    Participate in weekly team meetings, working closely with DevOps, support, and quality assurance
    Monitor and manage all cloud infrastructure, including Friday evening deployments of production updates and OS patches
    Support the various AWS services being utilized, aiming to become a lead expert in specific areas

    Incident Response and Troubleshooting:
    Prioritize and address tickets related to server, network, and storage issues
    Provide escalated support for problems and incidents
    Ensure incident resolution is within the defined SLA (Service Level Agreement)

    Automation and Monitoring:
    Develop and maintain automation scripts for testing and deployment.
    Configure and deploy supporting tools for monitoring and security (e.g., Grafana, Prometheus/Thanos, Tenable SC)

    Technical Skills:
    Proficiency in AWS services (or another cloud provider)
    Experience with containerized applications (ECS, Kubernetes, Docker)
    Experience with observability and monitoring (Grafana, OpenSearch)
    Experience with SIEM (Security Information and Event Management) applications
    Familiarity with one programming language (Python, Go)
    Familiarity with Infrastructure as Code (Terraform, CloudFormation)
    Knowledge of Linux operating system deployment and patching (Ubuntu, RHEL)

    Communication and Collaboration:
    Effectively communicate project updates and tasks to the Operations team and stakeholders in other departments
    Collaborate with cross-functional teams to ensure seamless cloud operations

    Expectations:
    Support production deployments scheduled on Fridays between 8pm - 2am EST on a rotating basis
    Take responsibility for identifying and resolving vulnerabilities in the FedRAMP environment
    Develop and monitor dashboards to track real-time metrics

    Qualifications:
    Certifications (Bonus): AWS, Azure, RHEL, CCNA
    Experience: 5+ years of experience in cloud operations or related field

    The pay range for this position is expected to be between $119,000 and $166,333/year; however, base pay offered may vary depending on multiple individualized, non-discriminatory factors, including market location, job-related knowledge, skills, and experience.

    The total compensation package for this position may also include other incentive compensation opportunities in the form of discretionary annual bonus or commissions, and equity.

    Additionally, full-time employees are eligible to participate in our comprehensive benefits program, including health and wellness benefits, 401(k) retirement plan, life and disability insurance coverages, and other benefits the Company may offer from time to time.
