    Site Reliability Engineer - Roseland, United States - CoreWeave

    CoreWeave
    CoreWeave Roseland, United States

    3 weeks ago

    Description


    CoreWeave is a specialized cloud provider, delivering a massive scale of GPU compute resources on top of the industry's fastest and most flexible infrastructure.

    CoreWeave builds cloud solutions for compute-intensive use cases (VFX and rendering, machine learning and AI, batch processing, and Pixel Streaming) that are up to 35 times faster and 80% less expensive than the large, generalized public clouds.

    Learn more at


    About the role:
    The Cloud Operations Team is the heart of CoreWeave's operational practice. In this role, you'll help define and shape how Site Reliability Engineering (SRE) is implemented at CoreWeave.

    The Cloud Operations team defines and implements tooling and processes that enable operational best practices and continual improvement across all engineering teams.


    As an 'SRE of SREs,' you'll define and implement system and workflow automation so that service owners can rapidly identify and mitigate availability and performance regressions.

    Collaborating across engineering, you'll support service-owning SREs with the 'picks and shovels' they need to excel at running their services.


    You will work with a team of 8-10 mixed-specialization engineers and have the opportunity to work on the full gamut of rewarding challenges that come with building the AI Cloud in a communicative, supportive, and high-performing environment.

    As a member of the Cloud Operations Team, you'll have the opportunity to:

    With a customer-first mindset, establish reliability and quality assessment patterns for all CoreWeave systems.
    Improve the performance, security, reliability, and scalability of internal and externally facing services.
    Develop dashboards, alerts, automated remediation, and insights into the customer experience using observability tools.
    Create and maintain Kubernetes operators, custom controllers, and other tools to intelligently scale our operational capability.
    Establish and integrate incident and change management tools and workflows.
    Act as Incident Commander for priority incidents and lead postmortems.
    Participate in the on-call rotation as needed as we establish and operationalize this new team.
    Enable and evangelize reliability engineering across CoreWeave's engineering teams.

    Grow, change, invest in your teammates, be invested-in, share your ideas, listen to others, be curious, have fun, and, above all, be yourself.

    Wondering if you're a good fit?


    We believe in investing in our people, and value candidates who can bring their own diversified experiences to our teams – even if you aren't a 100% skill or experience match.

    Here are some qualities we've found compatible with our team. If a portion of this resonates with you, we'd love to talk.


    You have experience operating services in production and are interested in driving engineering practices such as reliability at scale, testing (load, recovery, system, etc.), progressive deployments, error budgets, observability, and fault-tolerant design.

    You have experience automating manual processes and integrating various operations and productivity tools.

    You've done some Linux shell scripting and/or can navigate a *nix-based operating system (with the right cheat sheet, if required).

    You are familiar with debugging and administering Linux and Kubernetes environments.

    You're comfortable with the idea of codifying practices into Kubernetes controllers, operators, and other applications using a modern programming language.

    You have experience with incident management for your team or an organization.
    You're comfortable in open source environments.

    You're excited to join a team with diverse perspectives and backgrounds that believes in tackling challenges, growing hand in hand, and winning together.

    Why CoreWeave?


    At CoreWeave, we work hard, have fun, and move fast. We're in an exciting stage of hyper-growth that you will not want to miss out on.

    We're not afraid of a little chaos, and we're constantly learning.

    Our team cares deeply about how we build our product and how we work together, which is represented through our core values:

    Be Curious at your Core
    Act like an Owner
    Empower Employees
    Deliver Best In-Class Client Experience
    Achieve More Together

    We support and encourage an entrepreneurial outlook and independent thinking. We foster an environment that encourages collaboration and provides the opportunity to develop innovative solutions to complex problems. As we get set for takeoff, the growth opportunities within the organization are constantly expanding.

    You will be surrounded by some of the best talent in the industry, who will want to learn from you, too.

    Come join us

    Benefits

    We offer a competitive salary and benefits, including:

    Medical, dental and vision insurance - 100% paid for the employee
    Life Insurance
    Short and long-term disability insurance
    Flexible Spending Account
    Flexible, full-service childcare support with Kinside
    401(k) with a generous employer match
    Flexible PTO
    Catered lunch each day in our offices
    Weekly massages in NJ office
    A casual work environment
    Work culture focused on innovative disruption

    California Consumer Privacy Act - California applicants only

    CoreWeave is an equal opportunity employer, committed to diversity and inclusiveness.

    We will consider all qualified applicants without regard to race, color, nationality, gender, gender identity or expression, sexual orientation, religion, disability or age.



  • Carreiras SoftExpert New York, United States

    We are a market reference as a leading company in software solutions for integrated management, with more than 40 components dedicated to administration. Our goal is to provide companies with compliance, innovation, and digital transformation of processes, ensuring ...


  • Salling Group A/S New York, United States

    We are a branch of the largest Danish retail company. We operate as a business services center that handles the processes of our European companies, such as Bilka, Fotex, Salling, and Netto, the discount chain well known in Poland. · ...

  • Executive Alliance

    Reliability Engineer

    2 weeks ago


    Executive Alliance Pine Brook, United States

    Salary : $ $130000 · Essential Functions · Work with IPT to identify reliability critical items and any non-conformances and offer design alternatives. · Generate reliability, maintainability and system safety reports for IPT and customer review. · Perform system safety analy ...


  • Hepco Livingston, United States

    Overview: · Our client is seeking to hire a Senior Reliability and Safety Engineer with experience in the aerospace industry to join their team. The successful candidate will be responsible for safety assessment, reliability predictions, and failure mode analysis required for new ...


  • Enser

    Reliability Engineer

    17 hours ago


    Enser Boonton, United States

    Job Description · Reliability Engineer · In this position, you will work in a dynamic people-focused environment where you will work directly with experienced engineers and scientists to assist with reliability analysis. You will also be responsible fo ...

  • Enser Corporation

    Reliability Engineer

    3 weeks ago


    Enser Corporation Boonton, United States

    Job Description · Reliability Engineer · In this position, you will work in a dynamic people-focused environment where you will work directly with experienced engineers and scientists to assist with reliability analysis. You will also be responsible for attendin ...


  • Hepco East Hanover, United States

    Company: · HEPCO · Overview: · Design and develop products for aerospace and military weapon systems · Responsibilities include system safety assessment, reliability predictions and failure mode analysis for new development programs · Work with IPT to identify safety critical it ...


  • CoreWeave Roseland, United States

    CoreWeave is a specialized cloud provider, delivering a massive scale of GPU compute resources on top of the industry's fastest and most flexible infrastructure. CoreWeave builds cloud solutions for compute intensive use cases - VFX and rendering, machine learning and AI, batch p ...

  • Sodexo

    Reliability Engineer

    2 weeks ago


    Sodexo Rahway, United States Permanent

    Unit Description: Sodexo is currently seeking a Reliability Engineer to provide support for our Life Science Client located in Rahway, New Jersey. The Reliability Engineer will play a crucial role in managing and maintaining a substantial backlog of initiatives and strategic chan ...

  • JetBlue Airways

    Engineer Reliability

    3 weeks ago


    JetBlue Airways Queens, United States

    Position Summary · The Engineer Reliability reports to the Manager Reliability and is responsible for aircraft, power plant and component trend/performance analysis in support of the Reliability portion of the Continuing Analysis Surveillance System (CASS). · Essential Responsib ...



  • Marotta Controls Parsippany, United States

    Come grow with Marotta, one of NJ's fastest growing technology companies, · named a New Jersey Top Workplace for 2022 & 2023, and a "Made in New Jersey" Manufacturer of the Year Award Winner. · You will have room to grow and be a part of an exciting team, all within a warm and w ...

  • MSD Malaysia

    Reliability Engineer

    3 weeks ago


    MSD Malaysia Rahway, United States

    locations · NLD - North Brabant - Oss (Vollenhovermeer) · time type · Full time · posted on · Posted 5 Days Ago · job requisition id · R293237 · Job Description · Welcome to our team · A competitive salary. · Good bonus plan · Search Firm Representatives Please Read Careful ...


  • Jones Lange Lasalle, Inc. New York, United States

    JLL supports the Whole You, personally and professionally. Our people at JLL are shaping the future of real estate for a better world by combining world class services, advisory and technology to our clients. We are committed to hiring the best, most Reliability Engineer, Liabili ...


  • Dynamics ATS Montville, United States

    Sr. Engineer Reliability & Safety · Location: Parsippany, NJ · Salary : $100,000-$130,000 · Type: Direct · Enser is an Engineering Services Company that provides Staffing Support. This position is not internal to Enser. Please No Agencies. · Overview: · Marotta Controls ...

  • Mini-Circuits

    Reliability Engineer

    3 weeks ago


    Mini-Circuits Brooklyn, United States

    · Mini-Circuits designs, manufactures and distributes integrated circuits, modules, and sub-systems for high-performance radio frequency (RF) and microwave applications. With design, sales and manufacturing locations in over 30 countries, Mini-Circuits' products are used in a ra ...



  • Innova Solutions Summit, United States

    Innova Solutions is immediately hiring for a Facility & Reliability Engineer · Position type: Full-time Contract · Duration: 12 Months · Location: Summit, NJ (Onsite) · As a Facility & Reliability Engineer, you will: · The purpose of the Facility and Reliability Engineer is ...

  • Madison Approach

    Reliability Engineer

    3 weeks ago


    Madison Approach Yonkers, United States

    Job Description · Our client, a manufacturing company in Yonkers, NY, is seeking a Reliability Engineer for their Quality Assurance team. BS in Engineering and a minimum of 10 years' experience providing engineering support to manufacturing in the transportation in ...