Data Engineer - Parsippany, United States - Compunnel
Description
Primary Skills: AWS Glue, DynamoDB, S3 buckets, Python/PySpark, unstructured data, building pipelines. All AWS.
Must Have
5+ years of experience overall, with strong PySpark/Python skills in creating data pipelines and working with unstructured data.
Good experience working with AWS and its services.
Experience using AWS Glue for ETL, building pipelines with Python and PySpark that drop data into an AWS S3 bucket, and using DynamoDB to secure the data in the bucket.
Experience in building data ingestion pipelines for structured and unstructured data, both for storage and optimal retrieval.
Experience working with cloud data stores, NoSQL, graph, and vector databases.
Experience working with Databricks and Snowflake technologies.
Experience with relevant code repository and project tools such as GitHub, JIRA, and Confluence.
Working experience with Continuous Integration and Continuous Deployment, with hands-on expertise in Jenkins, Terraform, Splunk, and Dynatrace.
Highly innovative with aptitude for foresight, systems thinking and design thinking, with a bias towards simplifying processes.
Detail-oriented individual with strong analytical, problem-solving, and organizational skills.
Ability to clearly communicate to both technical and business teams.
Responsibilities
Build a data ingestion framework and data pipelines to ingest unstructured and structured data from various data sources such as SharePoint, Confluence, chatbots, Jira, external sites, etc.
Work closely with cross-functional teams, including product managers, data scientists, and engineers, to understand project requirements and objectives, ensuring alignment with overall business goals.
Design a scalable target-state architecture for data processing based on document content (data types may include, but are not limited to, XML, HTML, DOC, PDF, XLS, JPEG, TIFF, and PPT), including PII/CII handling, policy-based hierarchy rules, and metadata tagging.
Design, develop, and deploy optimal data pipelines, including an incremental data ingestion strategy, by taking advantage of leading-edge technologies through experimentation and iterative refinement.
Conduct research to stay up to date with the latest advancements in generative AI services and identify opportunities to integrate them into our products and services.
Build automation that effectively and repeatably ensures quality, security, integrity, and maintainability of our solutions.
Monitor and troubleshoot data pipeline performance, identifying and resolving bottlenecks and issues.
Define and implement data access policies, and implement and maintain data security measures for cloud storage buckets and vector databases.
Education:
Bachelor's degree