Sr Lead Data Engineer Apache Spark, Spark Core, Spark SQL, Spark Streaming – Databricks - Seattle, United States - The AES Group

Description
Direct Client - Sr Lead Data Engineer Apache Spark, Spark Core, Spark SQL, Spark Streaming – Databricks
Location: Hybrid (Seattle, WA)
Length: 12+ Months

Job description
This position contributes to Client success by building enterprise data services for analytic solutions. This position is responsible for the design, development, testing, and support of data pipelines that enable continuous data processing for data exploration, data preparation, and real-time business analytics. Models and acts in accordance with Client guiding principles.

Top 3 Skills Needed
- Data Engineering - 5+ years
- Databricks - 5+ years
- Spark / DevOps practices - 5+ years

Years of Experience:
- 5+ years

Technology requirements:
- Proficiency in Apache Spark, including Spark Core, Spark SQL, and Spark Streaming.
- Proficiency in languages such as Python for data processing and scripting.
- Experience designing and implementing ETL processes using Databricks notebooks for efficient data extraction, transformation, and loading.
- In-depth knowledge of the Databricks Unified Analytics Platform and its features for collaborative big data analytics.
- Understanding of data modeling concepts for designing database structures.
- Proficiency in working with both relational and NoSQL databases.
- Integration of data from diverse sources, including APIs, databases, streaming data, and external data feeds.
- Implementation of processes to ensure data quality, including data validation, cleansing, and error handling.
- Knowledge of cluster management, optimization, and scaling for efficient data processing.
- Optimization of Spark jobs and Databricks clusters for better performance.
- Proficiency in cloud platforms such as Azure for building scalable and flexible data architectures.
- Use of tools such as Apache Airflow, ADF, or Databricks to orchestrate and schedule data workflows.

Basic Qualifications/Experience:

Degree or certifications required:
- Education (minimum education level, degree, or certification necessary): Bachelor's degree in computer science, management information systems, or a related discipline

Skills (minimum skills required):
- 5-7+ years of experience architecting and designing large-scale, high-performance distributed systems
- 5-7+ years on a SQL platform
- 2+ years of exposure to a NoSQL platform is a plus
- 5+ years of Hadoop, YARN, MapReduce, Pig or Hive, and Spark
- 2+ years of data platform implementation on Azure or AWS is a plus

Key Responsibilities:
Responsibilities and essential job functions include but are not limited to the following:
- Demonstrate deep knowledge and the ability to lead others on the data engineering team to build and support non-interactive (batch, distributed) and real-time, highly available data, data pipeline, and technology capabilities
- Translate strategic requirements into business requirements to ensure solutions meet business needs
- Work with infrastructure provisioning and configuration tools to develop scripts that automate the deployment of physical and virtual environments, and tools that monitor the usage of virtual resources
- Assist in the definition of architecture to ensure that solutions are built within a consistent framework
- Lead resolution activities for complex data issues
- Define and implement data retention policies and procedures
- Define and implement data governance policies and procedures
- Identify improvements to team coding standards and help implement them
- Leverage subject matter expertise to coordinate issue resolution efforts across peer support groups, technical support teams, and vendors
- Develop and maintain documentation for all assigned systems and projects
- Perform systems and applications performance characterization and trade-off studies through analysis and simulation
- Perform root cause analysis to identify permanent resolutions to software or business process issues
- Lead by example by demonstrating the Client's mission and values

Nice-to-Haves:
- Knowledge of data security best practices and the implementation of measures to ensure data privacy and compliance.
- Implementation of monitoring and logging solutions to track the health and performance of pipelines.
- Familiarity with monitoring platforms such as Datadog and New Relic
- Azure