Senior Research Computing Cloud SRE - New York, United States - PDT Partners

PDT Partners New York, United States

1 week ago

Description

The Research Computing HPC team is a group of experts solving computing problems in the critical path of Research. We work directly with Research and Model Implementation teams and provide them with tools and computing resources to take their ideas from inception to real tradable products. We are looking for an ambitious and operationally minded software engineer to join our team as we mature and scale our cloud HPC platform to the next iteration of our firm-wide Research platform.

Why join us?
PDT Partners has a stellar 30+ year track record and a reputation for excellence. Our goal is to be the best quantitative investment manager in the world, measured by the quality of our products, not their size. PDT's very high employee retention and mobility speak for themselves. Our people are intellectually extraordinary, and our community is close-knit, down-to-earth, and diverse. Our engineers love to work on challenging and complicated problems, and in return, they have a chance to make a direct impact on our bottom line, without the attitude and bureaucracy of a typical Wall Street firm.

Responsibilities:

We are a small, flat team sitting at the intersection of research, implementation, and platform infrastructure. Our team's responsibilities span many areas. Below is a sampling of the types of work you will be expected to do:

Design and implementation of cloud-based HPC systems:
- Our projects involve equal parts engineering and operations for success in our fast-moving environment. You will be expected to conceive and implement projects small and large.
Running our HPC plant day-to-day:
- Our research environment is up 24/7, and we want to keep it that way. Everybody on the team contributes to the support of our platform, which thankfully is light because of our automation and quality work.
Implementing automation:
- We will always choose working smart over working hard. You will be responsible for the conception and implementation of automation, from CI/CD pipelines to production metrics and monitoring of our cloud HPC platform.
Capacity management and benchmark optimization:
- Our demand for compute is constant and involves challenging problems focused on scaling our compute, optimizing workloads, and choosing the right type of accelerators to target.
Obsessive User Focus:
- All members of the team are expected to partner with researchers and engineers to deliver high-quality cloud HPC systems that are efficient and reliable. This includes leading projects to evolve these systems as our needs change.
Design, implement, and deliver scalable and performant systems:
- Projects typically involve equal parts engineering and operations, for success in our fast-moving environment. You will be expected to do both for projects small and large, working with a mix of open-source and proprietary tools.
Implementing automation:
- We will always choose working smart over working hard. You will be responsible for the conception and implementation of new automation, from CI/CD pipelines to production metrics to other automation for the platform infrastructure that your team owns.
Obsessive User Focus:
- All members of platform teams collaborate closely with peer engineers and/or researchers to build high-quality, efficient, and reliable systems. This includes adapting to change, and at times diving into new domains to deeply understand stakeholder needs.
Capacity management and benchmark optimization:
- Our demand for scale and performance is constant and involves challenging optimization problems for workloads critical to research and trading.
Running our platform systems day-to-day:
- Our platforms are mission-critical for the firm's success and are very stable, and we want to keep them that way. Everybody on the team contributes to the support of our platforms, which we strive to make light through automation and quality work.

Below is a list of skills and experiences we think are relevant. Even if you don't think you're a perfect match, we still encourage you to apply because we are committed to developing our people.

Experience with systems programming and/or software engineering
Practical experience supporting, debugging, and improving production systems and services
Experience using Linux and other Open Source Software
Experience with configuration management and infrastructure-as-code frameworks
Production experience working with a public cloud, AWS preferred
Qualified candidates will have at least one area of specialty platform knowledge: HPC, Trading, CI/CD, Kubernetes, Linux, Cloud Infrastructure, or Networking

Education:
Bachelor's or Master's degree in an Engineering or Applied Sciences field from a rigorous academic program, or equivalent professional experience.

The salary range for this role is between $195,000 and $225,000. This range is not inclusive of any potential bonus amounts. Factors that may impact the agreed-upon salary within the range for a particular candidate include years of experience, level of education obtained, skill set, and other external factors.

PRIVACY STATEMENT: For information on ways PDT may collect, use, and process your personal information, please see PDT's privacy notices.
