    Senior Research Computing Cloud SRE - New York, United States - PDT Partners

    Description

    The Research Computing HPC team is a group of experts solving computing problems in the critical path of Research. We work directly with Research and Model Implementation teams and provide them with tools and computing resources to take their ideas from inception to real tradable products. We are looking for an ambitious and operationally minded software engineer to join our team as we mature and scale our cloud HPC platform to the next iteration of our firm-wide Research platform.

    Why join us?
    PDT Partners has a stellar 30+ year track record and a reputation for excellence. Our goal is to be the best quantitative investment manager in the world, measured by the quality of our products, not their size. PDT's very high employee retention and mobility speak for themselves. Our people are intellectually extraordinary, and our community is close-knit, down-to-earth, and diverse. Our engineers love to work on challenging and complicated problems, and in return, they have a chance to make a direct impact on our bottom line, without the attitude and bureaucracy of a typical Wall Street firm.

    Responsibilities:

    We are a small, flat team sitting at the intersection of research, implementation, and platform infrastructure. Our team's responsibilities span many areas. Below is a sampling of the types of work you will be expected to do:

    • Design and implementation of cloud-based HPC systems:
      • Our projects involve equal parts engineering and operations for success in our fast-moving environment. You will be expected to conceive and implement projects small and large.
    • Running our HPC plant day-to-day:
      • Our research environment is up 24/7, and we want to keep it that way. Everybody on the team contributes to the support of our platform, which thankfully is light because of our automation and quality work.
    • Implementing automation:
      • We will always choose to work smart over working hard. You will be responsible for conception and implementation of automation from CI/CD pipelines to production metrics and monitoring of our cloud HPC platform.
    • Capacity management and benchmark optimization:
      • Our demand for compute is constant and involves challenging problems focused on scaling our compute, optimizing workloads, and choosing the right type of accelerators to target.
    • Obsessive User Focus:
      • All members of the team are expected to partner with researchers and engineers to deliver high-quality cloud HPC systems that are efficient and reliable. This includes leading projects to evolve these systems as our needs change.
    • Design, implement, and deliver scalable and performant systems:
      • Projects typically involve equal parts engineering and operations, for success in our fast-moving environment. You will be expected to do both for projects small and large, working with a mix of open-source and proprietary tools.
    • Implementing automation:
      • We will always choose to work smart over working hard. You will be responsible for conception and implementation of new automation from CI/CD pipelines to production metrics to other automation for the platform infrastructure that your team owns.
    • Obsessive User Focus:
      • All members of platform teams collaborate closely with peer engineers and/or researchers to build high-quality, efficient, and reliable systems. This includes adapting to change, and at times diving into new domains to deeply understand stakeholder needs.
    • Capacity management and benchmark optimization:
      • Our demand for scale and performance is constant and involves challenging optimization problems for workloads critical to research and trading.
    • Running our platform systems day-to-day:
      • Our platforms are mission-critical for the firm's success and are very stable, and we want to keep them that way. Everybody on the team contributes to the support of our platforms, which we strive to make light through automation and quality work.
    Below is a list of skills and experiences we think are relevant. Even if you don't think you're a perfect match, we still encourage you to apply because we are committed to developing our people.
    • Experience with systems programming and/or software engineering
    • Practical experience supporting, debugging, and improving production systems and services
    • Experience using Linux and other Open Source Software
    • Experience with configuration management and infrastructure-as-code frameworks
    • Production experience working with a public cloud, AWS preferred
    • Qualified candidates will have at least one area of specialty platform knowledge: HPC, Trading, CI/CD, Kubernetes, Linux, Cloud Infrastructure, or Networking
    Education:
    Bachelor's or Master's degree in an Engineering or Applied Sciences field from a rigorous academic program, or equivalent professional experience.

    The salary range for this role is between $195,000 and $225,000. This range is not inclusive of any potential bonus amounts. Factors that may impact the agreed-upon salary within the range for a particular candidate include years of experience, level of education obtained, skill set, and other external factors.

    PRIVACY STATEMENT: For information on ways PDT may collect, use, and process your personal information, please see PDT's privacy notices.

  • Vodastra

    SRE for Cloud

    1 week ago


    Vodastra New York, United States

    Job Description · Role: · Manage cloud infrastructure, provide resource allocation, system upgrades, user access control, etc. · Perform deep dives on complex system issues ranging from software bugs and hardware failures to network issues. · Build tools and au ...

  • Diverse Lynx

    Cloud SRE

    2 weeks ago


    Diverse Lynx New York, United States

    Key Skills: Go lang Developer, Azure Infra, Kubernetes, Terraform · Strong knowledge of Go programming language, paradigms, constructs, and idioms. Knowledge of common Go routine and channel patterns, Go frameworks and tools · Good understanding in four or more Azure services su ...

  • S&P Global

    SRE Cloud Engineer

    3 weeks ago


    S&P Global New York, United States

    About the Role: · Grade Level (for internal use): · 10 The Team: The Ratings Site Reliability Engineering team (SRE) is made up of highly talented engineers and operations personnel. The current team is composed of highly skilled engineers with solid development background wh ...

  • S&P Global

    SRE Cloud Engineer

    2 weeks ago


    S&P Global New York, United States

    About the Role: · Grade Level (for internal use): · 10 · The Team: The Ratings Site Reliability Engineering team (SRE) is made up of highly talented engineers and operations personnel. The current team is composed of highly skilled engineers with solid development background w ...


  • iCapital New York, United States

    Job Description · iCapital is powering the world's alternative investment marketplace. Our financial technology platform has transformed how advisors, wealth management firms, asset managers, and banks evaluate and recommend bespoke public and private market strate ...


  • iCapital New York, United States

    iCapital is powering the world's alternative investment marketplace. Our financial technology platform has transformed how advisors, wealth management firms, asset managers, and banks evaluate and recommend bespoke public and private market strategies for their high-net-worth cli ...


  • PDT Partners New York, United States

    The Research Computing HPC team is a group of experts solving computing problems in the critical path of Research. We work directly with Research and Model Implementation teams and provide them with tools and computing resources to take their ideas from inception to real tradable ...

  • Trinity IT Services

    SRE Cloud Engineer

    2 weeks ago


    Trinity IT Services Berkeley Heights, United States

    In this role, you will help build the technology responsible for our core services with a focus in the AWS (Amazon Web Services) Cloud eco-system. Your work will influence the success of companies across the world. Members of our Technology team are experts in the field, working ...

  • Royal Bank of Canada

    Director SRE

    1 week ago


    Royal Bank of Canada Jersey City, United States

    **Job Summary** · **What is the opportunity?** · **What will you do?** · - Set vision for SRE product-base (i.e., chaos engineering, anomaly detection, self-healing resiliency, etc.) · - Partner closely with development teams from early in the development lifecycle to ensure conn ...


  • Bright Horizons New York, United States

    The Director of Site Reliability Engineering (SRE) will play a pivotal role in ensuring the seamless and reliable operation of consumer and customer-facing digital infrastructure across our lines of business. This leadership position involves overseeing a team of skilled SRE prof ...


  • Source Technology New York, United States

    Global Infrastructure SRE Lead · Bay Area, Fulltime Onsite · Job Description · The Global Infrastructure SRE Lead will spearhead the management of everything related to our Site Reliability Engineering (SRE) domain. This includes the design, development, and upkeep of our infra ...

  • Borneo

    Principal SRE

    1 week ago


    Borneo New York, United States

    Overview: · Borneo is seeking a skilled, experienced, and hands-on Principal Engineer to drive innovation and contribute to our mission of transforming data security and privacy. As the Principal Engineer, you will be a driving force in shaping the technical strategy and architec ...


  • developrec New York, United States

    SRE Lead/Manager | San Diego, CA | Full-time · Role Overview: · As the Engineering Manager for Site Reliability, you'll lead the charge in transitioning to cloud-based solutions while ensuring the stability of our existing systems for our rapidly growing user base, currently st ...


  • Apex Systems New York, United States

    **WE CANNOT WORK CORP TO CORP (C2C/C2H) - ALL APPLICANTS MUST BE ABLE TO WORK ON APEX'S W2 WITHOUT SPONSORSHIP** · Apex Systems is looking to hire a · Senior Site Reliability Engineer · for one of their reputable financial clients they support in multiple core locations across ...

  • Deutsche Telekom AG

    DevSecOps Engineer

    2 days ago


    Deutsche Telekom AG New York, United States

    As a DevOps Engineer (m/f/d) in Automation Engineering, you will take on the following tasks: · System and application operations for the cloud platforms · Specification of automation for the cloud platforms · Operating connected-car platforms using a variety of tec ...


  • Mondrian Alpha New York, United States

    A leading systematic multi-strat fund is seeking an experienced site reliability engineer to join a team of senior engineers to focus on varying platforms throughout the business. SREs here combine software and systems engineering experience to build, maintain and improve syste ...


  • Lawrence Harvey New York, United States

    Lawrence Harvey is partnered with a specialty financial institution that plays a critical role in the foreign exchange market. Their global settlement infrastructure reduces systemic risk at large and is a trusted party at the center of the global ecosystem. · They're in the pro ...

  • Open Systems Technologies

    SRE/DevOps Engineer

    2 weeks ago


    Open Systems Technologies New York, United States

    A financial firm is looking for an SRE/DevOps Engineer to join their team in New York, NY. · Compensation: $150-200k · Responsibilities: · Design, implement, and manage AWS cloud infrastructure using Terraform and CloudFormation · Develop and maintain CI/CD pipelines using GitLab for seamless ...

  • Motion Recruitment

    DevOps Engineer

    2 weeks ago


    Motion Recruitment New York, United States

    As a DevOps engineer, you will be supporting production and development environments, from creating new and improving existing tools and processes to automating deployment and monitoring procedures, leading continuous integration effort, administering source control systems, depl ...