Senior Software Engineer, Data - Los Angeles, United States - Fathom

    Description
    Fathom is on a mission to understand and structure the world's medical data. We are a venture-backed company with an engineering team out of organizations like Google, Facebook, and Twitch.

    We are starting by using NLP deep learning to structure the data contained within physician notes to automate medical coding, a process performed by 125,000 FTEs that costs the US healthcare system over $15B annually.

    We are looking for a Senior Software Engineer, Data to join our team and work on data products that drive the core of our business.

    We want to work with remote teammates who are excited about learning how to build and support machine learning pipelines that scale not just computationally, but in ways that are flexible, iterative, and geared for collaboration.

    If you are a data expert who can unify data and build systems that scale from both an operational and an organizational perspective, Fathom is an interesting opportunity to explore.

    Your role and responsibilities will include:

    Developing data infrastructure to ingest, sanitize, and normalize a broad range of medical data, such as electronic health records, journals, established medical ontologies, crowd-sourced labelling, and other human inputs

    Building performant and expressive interfaces to the data

    Creating infrastructure to help us scale up not only data ingestion but also large-scale, cloud-based machine learning

    We are looking for a teammate with:

    5+ years of development experience in a company/production setting

    Experience building data pipelines from disparate sources

    Hands-on experience building and scaling up compute clusters

    A solid understanding of databases and large-scale data processing frameworks like Hadoop or Spark, and the ability to evaluate which tools to use for the job

    A unique combination of creative and analytic skills suited to designing a system capable of pulling together, training, and testing dozens of data sources under a unified ontology

    Bonus points if you have:

    Know-how in developing systems that perform or support machine learning, including experience working with NLP toolkits like Stanford CoreNLP, OpenNLP, and/or Python's NLTK

    Expertise with wrangling healthcare data and/or HIPAA

    Experience managing large-scale data labelling and acquisition through tools such as Amazon Mechanical Turk or DeepDive
