- Developing Data Pipelines: Designing and implementing scalable data pipelines on Databricks running on AWS infrastructure, using Python and Spark.
- Data Ingestion: Managing the ingestion of structured and unstructured data from various sources into Databricks clusters on AWS, ensuring data quality and reliability.
- Data Processing: Performing data-processing tasks such as transformation, cleansing, aggregation, and enrichment, using Databricks tooling and AWS services such as EMR (Elastic MapReduce).
- Optimization and Performance Tuning: Optimizing Databricks jobs and clusters for performance, efficiency, and cost-effectiveness on the AWS platform.
- Integration: Integrating Databricks with other AWS services such as S3, Glue, Redshift, and Lambda to build end-to-end data solutions.
- Monitoring and Maintenance: Monitoring Databricks clusters, jobs, and workflows for performance issues, troubleshooting errors, and ensuring high availability and reliability.
- Security and Compliance: Implementing security best practices and ensuring compliance with data privacy regulations (e.g., GDPR, CCPA) in Databricks deployments on AWS.
- Collaboration: Collaborating with data engineers, data scientists, and other stakeholders to understand data requirements and deliver solutions that meet business needs.
- Documentation: Documenting Databricks workflows, architecture, and best practices to facilitate knowledge sharing and maintain system documentation.
- Continuous Learning: Keeping up to date with the latest advancements in Databricks, AWS services, and data engineering technologies to continuously improve skills and contribute to innovation in data solutions.
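The pipeline-development responsibility above can be sketched, in spirit, as composing a sequence of transformation stages into one runnable job. This is a minimal pure-Python illustration; in this role the equivalent logic would typically live in PySpark jobs on Databricks, and the stage names (`drop_nulls`, `add_flag`) are hypothetical:

```python
from typing import Callable, Iterable

Record = dict
Stage = Callable[[Iterable[Record]], Iterable[Record]]

def build_pipeline(*stages: Stage) -> Stage:
    """Compose stages into a single callable, applied left to right."""
    def run(records: Iterable[Record]) -> Iterable[Record]:
        for stage in stages:
            records = stage(records)
        return records
    return run

# Hypothetical stages for illustration only.
def drop_nulls(records):
    return [r for r in records if r.get("user_id") is not None]

def add_flag(records):
    return [{**r, "processed": True} for r in records]

pipeline = build_pipeline(drop_nulls, add_flag)
result = pipeline([{"user_id": 1}, {"user_id": None}])
# result -> [{"user_id": 1, "processed": True}]
```

The same composition idea maps naturally onto chained DataFrame transformations in Spark.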
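The data-processing bullet (transformation, cleansing, aggregation) can be illustrated with a small pure-Python sketch; the field names (`customer`, `amount`) are assumptions for the example, and in practice this would be expressed with Spark DataFrame operations on a Databricks cluster:

```python
from collections import defaultdict

def cleanse(records):
    """Trim the key field, drop rows missing it, coerce amount to float."""
    out = []
    for r in records:
        key = (r.get("customer") or "").strip()
        if not key:
            continue  # drop rows with no usable key
        try:
            amount = float(r.get("amount", 0))
        except (TypeError, ValueError):
            continue  # drop rows with a non-numeric amount
        out.append({"customer": key, "amount": amount})
    return out

def aggregate(records):
    """Sum amounts per customer."""
    totals = defaultdict(float)
    for r in records:
        totals[r["customer"]] += r["amount"]
    return dict(totals)

raw = [
    {"customer": " acme ", "amount": "10.5"},
    {"customer": "acme", "amount": 4.5},
    {"customer": "", "amount": 99},        # dropped: empty key
    {"customer": "globex", "amount": "x"}, # dropped: bad amount
]
print(aggregate(cleanse(raw)))  # {'acme': 15.0}
```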
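For the monitoring responsibility, one common pattern is polling a job until it reaches a terminal state. A minimal sketch, assuming a caller-supplied `get_status` function (e.g. a thin wrapper around the Databricks Jobs API) and illustrative state names:

```python
import time

def wait_for_job(get_status, job_id, timeout_s=60.0, poll_s=1.0):
    """Poll a job until it reaches a terminal state or times out.

    `get_status` and the state strings below are assumptions for
    illustration, not a specific vendor API.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        state = get_status(job_id)
        if state in ("SUCCESS", "FAILED", "CANCELED"):
            return state
        time.sleep(poll_s)
    raise TimeoutError(f"job {job_id} did not finish within {timeout_s}s")

# Usage with a stubbed status function:
states = iter(["RUNNING", "RUNNING", "SUCCESS"])
print(wait_for_job(lambda _jid: next(states), job_id=42, poll_s=0.01))
# SUCCESS
```

In production the polling interval and timeout would be tuned to the workload, and failures would be routed to alerting rather than raised locally.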