    Software Development Snr Director - Santa Clara, United States - Oracle

    Description


    At Oracle Cloud Infrastructure (OCI), we build the future of the cloud for Enterprises as a diverse team of fellow creators and inventors.

    Oracle's Cloud Infrastructure team is building Infrastructure-as-a-Service technologies that operate at high scale in a broadly distributed multi-tenant cloud environment.

    We act with the speed and attitude of a start-up, with the scale and customer focus of the leading enterprise software company in the world.

    You are the builder here.

    You will be part of a team of really smart, motivated, and diverse people and given the autonomy and support to do your best work.

    It is a dynamic and flexible workplace where you'll belong and be encouraged.


    Are you interested in building large-scale distributed infrastructure for the cloud? Oracle's Cloud Infrastructure team is building new Infrastructure-as-a-Service technologies that operate at high scale in a broadly distributed multi-tenant cloud environment.

    Our customers run their businesses on our cloud, and our mission is to provide them with best-in-class compute, storage, networking, database, security, and an ever-expanding set of foundational cloud-based services.


    We're looking for a Senior Director of Site Reliability Engineering with expertise and passion in solving difficult problems in distributed systems, storage systems, virtualized infrastructure, and highly available services.

    If this is you, at Oracle you will apply engineering leadership and knowledge of infrastructure and software development to drive scalable and reliable software systems for Oracle's storage product family.

    You will ensure the highest standards for resilience and safety.


    In this role, you will lead the development and execution of the Reliability strategy for storage products (e.g., Block Storage, Object Storage, and File Storage).

    The systems that support these products are crucial to Oracle Cloud Infrastructure's global business. As such, it is critical that they operate reliably 24x7, have sufficient capacity, and run efficiently.

    Challenging technical issues that affect the detection, remediation, and prevention of disruptions appear and must be addressed quickly without unduly affecting internal or external customers.


    As the Senior Director of Site Reliability Engineering (SRE) for Storage, you will think about how to solve common problems to enable consistency in the experience of our users and enable further growth of the various OCI businesses.


    Manage, innovate, and create programs, new software, and analytics that drive improvements to the availability, scalability, latency, and efficiency of the OCI Storage team's products and services.

    Work cross-functionally in partnership with service leads to guide service engineering to build reliable and durable production systems.

    Develop strategic directions, workforce plans, and organizational structure for the reliability teams within each service team.

    Optimize velocity and efficiency of delivery and drive continuous improvement. Keep a high operational bar.

    The successful candidate will be highly self-motivated with a passion for excellence, quality, and detail.

    As a leader, they are responsible for coaching and mentoring their team members, helping them achieve service goals and build career paths in alignment with those goals.

    It's imperative for the leader to empower their team by providing appropriate context and timely feedback.

    Qualifications

    10+ years of experience with critical, large-scale distributed systems, combining hardware, operating systems, and software. Proficiency in large-scale system design, algorithms, data structures, complexity analysis, and software design.

    10+ years of experience building and leading engineering teams, ideally in SRE or Production Engineering.

    Strong emphasis on SRE as an engineering subject area.

    Understanding of SRE principles, including monitoring, alerting, error budgets, fault analysis, and other common reliability engineering concepts, with a keen eye for opportunities to eliminate toil through code and process improvements.


    Superb interpersonal skills, with the ability to work with multi-functional technical and business teams and varying levels of management, and to influence decision making.


    Ability to perform technical analyses of code, networking, operating systems, and storage, and to actively participate in strategic discussions with the senior leadership team.



  • Mirantis Campbell, United States

    Company Description · Mirantis helps enterprises move to the cloud on their terms, delivering a true cloud experience on any infrastructure, powered by Kubernetes. The company uses a unique as-a-service model empowering developers to build, share and run their applications anywhere ...


  • Mirantis Campbell, United States

    Company Description · Mirantis helps enterprises move to the cloud on their terms, delivering a true cloud experience on any infrastructure, powered by Kubernetes. The company uses a unique as-a-service model empowering developers to build, share and run their applications anywhe ...


  • Mirantis Campbell, CA, United States

    Company Description · Mirantis helps enterprises move to the cloud on their terms, delivering a true cloud experience on any infrastructure, powered by Kubernetes. The company uses a unique as-a-service model empowering developers to build, share and run their applications anywhere ...


  • Onto Milpitas, United States

    Onto Innovation is a leader in process control, combining global scale with an expanded portfolio of leading-edge technologies that include: 3D metrology spanning the chip from nanometer-scale transistors to micron-level die-interconnects; macro defect inspection of wafers and pa ...


  • CooperCompanies San Ramon, United States Full time

    Job Summary · Reporting to the Director of Commercial Data, we are seeking a Snr Manager to be responsible for governing our data implementing standards in new projects and migrations. This role will implement procedures and policies that help manage data as corporate asset, and ...