
    Site Reliability Engineer - Remote, United States - Aurora Labs

    Aurora Labs
    Aurora Labs Remote, United States

    3 weeks ago

    Full time
    Description

    About Us

    Aurora Labs is the development company behind Aurora—the EVM blockchain that runs on the NEAR Protocol. We are also the developers of, and integration partner behind, Aurora Cloud—a suite of products that allow Web2 companies to capture the value of Web3.

    We invite you to be a part of our team of smart, professional, results-oriented and fun individuals. Join us to help ensure that our background processes run smoothly while we strive to become the best in the industry.

    About the team

    Our infrastructure team is responsible for building and supporting the critical systems required for running and accessing the NEAR and Aurora networks. That includes everything on the path of RPC requests before they hit the blockchain, as well as block production and event delivery once transactions are executed.

    Load balancing, caching, queueing, transaction simulation and block production are handled by services written and maintained by the infrastructure team. These services operate at large scale and process terabytes of data. The platform is based on open-source software, such as Kubernetes, NATS, Jetstream, Blockscout, Grafana, Postgres and Near-core, alongside a few internally developed services.

    All internally developed services are written in Go and implement core pieces of functionality such as Mempool management, NEAR chunk distribution, transaction pre-processing and simulation.

    About the position

    This role is split between two responsibilities: site reliability (80%) and software engineering (20%).

    Reliability Engineering includes:

    - Ensuring high availability and failure tolerance of our infrastructure.

    - Automating configuration and maintenance of software components such as K8s, NATS, InfluxDB, Postgres and Cloudflare using, e.g., Ansible, Terraform, Helm and Kubernetes operators.

    - Designing and implementing cloud-agnostic solutions that do not rely exclusively on specific cloud vendors.

    - Automating the management of validator and RPC nodes.

    - Optimizing the latency and throughput of the pub-sub infrastructure.

    - Incident management, monitoring, distributed tracing and recovery automation.

    Software Engineering projects include:

    - Sidecars that implement cloud-agnostic infrastructure abstractions for developers.

    - CLI tools for pubsub and streaming infrastructure operations.

    - Time series processing engine for our transaction simulation engine.

    - Indexers and blockchain event aggregation pipelines for monitoring purposes.

    About you

    You are a reliability engineer with experience creating and maintaining backend systems. You are familiar with the entire Linux stack and can easily find a bottleneck in a distributed system. You have developed CLI tools and backend services before and are comfortable applying your software development skills to automate your daily operations or to create a microservice on the request path of end users.

    Key Qualifications

    - Strong emphasis on SRE as an engineering subject area, with proficiency in Golang.

    - Successful track record and proven experience as a backend internet services software developer.

    - Knowledge of SDLC, including continuous integration and testing methodologies.

    - Understanding of base internet infrastructure services, including DNS, HTTP, server virtualization and server monitoring in critical, large-scale distributed systems.

    - Understanding of SRE principles, including monitoring, alerting, error budgets, fault analysis, and other common reliability engineering concepts, with a keen eye for opportunities to eliminate toil through code and process improvements.

    - Excellent verbal and written communication skills in English.

    Desired skills

    - Experience with development within the Kubernetes ecosystem, including the operator framework, controllers and CRDs.

    - Experience with streaming and pubsub systems such as NATS, Apache Kafka, Apache Pulsar.

    - Hardware bootstrap and associated security.

    - Structured or unstructured storage and caching.

    - Automating operations processes via services and tools.

    - Configuration management and fleet orchestration via Puppet, Chef, Ansible, or others.

    - Cloud Services (AWS S3/EC2/CloudFront or equivalent).

    Join our dedicated team of blockchain industry professionals.

    Please apply today; we're standing by for your resume.

    By applying for this job, I confirm and acknowledge that I have read and understood the Privacy Notice published at



  • Coinbase Remote, United States

    We're a group of hard-working overachievers who are deeply focused on building the future of finance and Web3 for our users across the globe, whether they're trading, storing, staking or using crypto. Know those people who always lead the group project? We're a remote-first compa ...


  • Roadie Remote, United States Full time

    Roadie, a UPS Company, is a logistics management and crowdsourced delivery platform. Founded in 2014, Roadie offers businesses fast, flexible and asset-light logistics solutions for last-mile delivery. Roadie enables local delivery to more than 95% of U.S. households by providing ...


  • Podium Remote, United States Full time

    · At Podium, our mission is to help local businesses win. Our lead conversion platform, powered by AI and integrations, helps local businesses convert leads faster, communicate easier, and make more sales. Every day, thousands of local businesses utilize our review management, c ...


  • OPENLANE Remote, United States Full time

    Who We Are: · At OPENLANE we make wholesale easy so our customers can be more successful. · We're a technology company building the world's most advanced (and uncomplicated) digital marketplace for used vehicles. · We're a data company helping customers buy and sell smarter with cl ...


  • Fireblocks Remote, United States Full time

    · The world of digital assets is accelerating in speed, magnitude, and complexity, opening the door to new ways of leveraging the blockchain. Fireblocks' platform and network provide the simplest and most secure way for companies to work with digital assets and is trusted by so ...


  • Brooksource Remote, United States

    Contract to Hire · Remote (EST Time Zone) · Our Fortune 15 health care client is seeking a Site Reliability Engineer (SRE) to assist them as they fully transition to the cloud. You will play a critical role in ensuring the reliability, scalability, and performance of their sy ...


  • SS&C Technologies Holdings Remote, United States Full time

    Job Description · Senior Site Reliability Engineer · Locations: Jacksonville, FL | Hybrid or Florida | Georgia | Texas | Remote · Get to Know the Team: · SS&C Advent Software is looking for a motivated and experienced Site Reliability Engineer to help with improving the architect ...


  • Laserfiche Remote, United States

    Job Description · Site Reliability Engineers (SREs) at Laserfiche are responsible for keeping our Laserfiche Cloud systems online and performant for our customers. They react quickly to reported issues within the systems, promote and implement proactive monitoring ...


  • Sunrun Remote, United States Full time

    Everything we do at Sunrun is driven by a determination to transform the way we power our lives. We know that starts at the individual employee level. We strive to foster an environment you can thrive in through our commitment to diversity, inclusion and belonging. · Objective: · ...


  • Zocdoc Remote, United States Full time

    · Our Mission · Healthcare should work for patients, but it doesn't. In their time of need, they call down outdated insurance directories. Then wait on hold. Then wait weeks for the privilege of a visit. Then wait in a room solely designed for waiting. Then wait for a surprise b ...


  • Arcadia (DC) Remote, United States Full time

    Who We Are · Arcadia is the technology company empowering energy innovators and consumers to fight the climate crisis. Our software and APIs are revolutionizing an industry held back by outdated systems and institutions by creating unprecedented access to the data and clean ener ...


  • Edge & Node Remote, United States Full time

    Edge & Node stands as the revolutionary vanguard of web3, a vision of a world powered by individual autonomy, shared self-sovereignty and limitless collaboration. Established by trailblazers behind The Graph, we're on a mission to make The Graph the internet's unbreakable foundat ...


  • Sojern Remote, United States Full time

    Position Summary: · Sojern is looking for a Senior Site Reliability Engineer in the US to collaborate with Software Engineering teams located primarily in the Pacific Time Zone. An ideal candidate would have extensive experience building cloud infrastructure on Google Cloud with ...


  • Lumin Digital Remote, United States Full time

    Our Site Reliability Engineers (SRE) are good developers with an operations mindset. They enjoy reducing or completely eliminating manual tasks, are excellent problem solvers, and know automation is the key to operating a large-scale system. · SREs make sure that our application ...


  • Modern Health Remote, United States Full time

    · Modern Health · Modern Health is a mental health benefits platform for employers. We are the first global mental health solution to offer employees access to one-on-one, group, and self-serve digital resources for their emotional, professional, social, financial, and physical ...


  • DFIN Remote, United States Full time

    Donnelley Financial Solutions (DFIN) is a leader in risk and compliance solutions, providing insightful technology, industry expertise and data insights to clients across the globe. We're here to help you make smarter decisions with insightful technology, industry expertise and d ...


  • Supabase Remote, United States Full time

    Supabase is an Open Source and fully remote company building developer tools for databases. · We are seeking an experienced SRE to manage the infrastructure of our Postgres databases. We currently manage over 1M Postgres instances and are growing fast. · You will: · Help build th ...


  • commercetools Remote, United States Full time

    commercetools - we are: · Engaged: We didn't become the fastest growing, highest ever valued SaaS software company in digital commerce with nearly 100% year-over-year growth by sitting on the sidelines. · Inspired: We continually explore what's possible. As the founder of the hea ...


  • Articulate Remote, United States Full time

    Articulate is looking for a Senior Site Reliability Engineer to join our amazing Platform Engineering team. The Senior Site Reliability Engineer I will be responsible for working cross-functionally to deliver and maintain scalable and reliable infrastructure. · What you'll do: · ...


  • Oscar Health Remote, United States Full time

    Hi, we're Oscar. We're hiring a Senior Site Reliability Engineer II, Infrastructure Metal to join our Engineering team. · Oscar is the first health insurance company built around a full stack technology platform and a focus on serving our members. We started Oscar in 2012 to crea ...