Data Engineer - Washington, United States - Esolvit

    Esolvit, Washington, United States

    1 month ago

    Description
    Title: Data Engineer (LOCAL)

    Location: Washington, DC

    Duration: Long Term

    You may either create a user ID and sign up through the Arytic link, or view the job and other open roles on our Arytic platform without signing up.

    Job Description:

    The Data Engineer provides ETL support to the data science and software engineering team members. The role builds, modifies, and supports infrastructure for the optimal extraction, transformation, and loading of data from a variety of structured and unstructured data sources and a multi-terabyte distributed file system. The candidate will formulate and rapidly prototype various approaches and clearly communicate the pros and cons of each, providing data-driven approaches to a range of business problems. The candidate will contribute to a high-performing, motivated workgroup by applying interpersonal and collaboration skills to achieve project goals.

    The candidate will architect an ML data pipeline with data acquisition and preprocessing functionality. The pipeline gathers data from a heterogeneous pool: the distributed file system, unstructured text extracted by OCR (of varying quality) from millions of images of medical records, the associated metadata from relational databases, and custom annotations.

    Responsibilities:
    • Provide current system architecture documentation, engineering and web development programming support for tasks defined by program/project requirements, and data science/data engineering technical assessments
    • Manage and maintain structured, semi-structured, and unstructured data, structuring and wrangling it as appropriate for statistical analysis
    • Implement data warehouse concepts, relational databases, and big data management techniques and tools (e.g., Hadoop, MapReduce)
    • Communicate with technical and non-technical users and managers; perform server administration, including hardware and software support for existing servers
    • Provide software engineering support to operate, maintain, and enhance systems that are integrated with or relied upon by the data engineering lifecycle
    • Integrate, analyze, and visualize data and information from multiple disparate data sources in near real time (within 24 hours)
    • Optimize data storage and access
    • Proficiency with Python, Java, Oracle Enterprise Manager, SQL, and AWS

    Qualifications:
    • Master's degree in a related field + 5 years of experience; or PhD + 1 year of experience; or Bachelor's degree in a related field + 7 years of experience
    • Minimum of 5 years of experience with ETL tasks, performance engineering, run-time optimization, and large data volume transfers
    • Minimum of 3 years of experience with regular expressions, SQL (PostgreSQL), and NoSQL (MongoDB)
    • Minimum of 1 year of experience with version control systems (Git)
    • Preference given to developers with experience working with healthcare data and health IT

    Skills/Tools Utilized (at least 1-2 years of experience with some of the following):
    • Apache Hadoop (Cloudera)
    • AWS data platforms (Redshift, S3, EMR/Hive)
    • SQL
    • Java
    • Kafka
    • Scala
    • Kotlin
    • Neo4j
    • NiFi
    • Flink
    • Sqoop
    • PostgreSQL
    • EMR
    • Apache Spark
    • Python
    • PHP
    • Oracle
    • Splunk
    • BDD (behavior-driven development)
    • Testing framework: Cucumber
    • Knowledge of and experience with various NLP approaches, particularly:

    • Pattern recognition/feature extraction

    • Supervised, Unsupervised, and Semi-Supervised learning techniques

    • Understanding of various language models (n-gram, skip-gram, NLM, etc.)

    • Chunking/Tokenization

    • Semantic parsing

    Skills highly desired:
    • Healthcare IT experience
    • Statistical model building (particularly classification)

    Required skills:
    • 10+ years of experience in MongoDB
    • 10+ years of experience in PostgreSQL
    • 10+ years of experience in Cloud Application Architecture