Data Engineer, Translational Bioinformatics - Trenton, United States - Invenex Search


    1 month ago

    Description


    Invenex is proud to be partnered with a global oncology biopharma client, helping them to expand their Data Engineering team.


    This individual will contribute to the mission of the global data engineering function and be responsible for many aspects of data including creation of data-as-a-product, architecture, access, classification, standards, integration, and pipelines.

    Responsibilities

    Design, implement and manage ETL data pipelines that process and transform vast amounts of scientific data from public, internal and partner sources into various repositories on a cloud platform (AWS)
    Incorporate bioinformatics tools and libraries into the processing pipelines for omics assays such as DNASeq, RNASeq, or proteomics
    Enhance end-to-end workflows with automation that accelerates data flow, using pipeline management tools such as Step Functions, Airflow, or Databricks Workflows
    Implement and maintain bespoke databases for scientific data (RWE, in-house lab data, CRO data) for consumption by analysis applications and AI products

    What do we need from you?

    BS/MS in Computer Science, Bioinformatics, or a related field with 5+ years of software engineering experience (8+ years for senior role) or a PhD in Computer Science, Bioinformatics or a related field and 2+ years of software engineering experience (5+ years for senior role)
    Excellent skills in and deep knowledge of ETL pipeline, automation and workflow management tools such as Airflow, AWS Glue, Amazon Kinesis, and AWS Step Functions, as well as CI/CD, are a must

    Excellent skills in and deep knowledge of Python, Pythonic design and object-oriented programming, including common Python libraries such as pandas, are a must

    Experience with R is a plus
    Excellent understanding of bioinformatics tools and databases such as STAR, HISAT2, DESeq2, and Seurat, and of how they are used on different types of genomic and transcriptomic data such as single-cell transcriptomics

    Interested in learning more?

    Apply directly or reach out to discuss further
