Bigdata / Pyspark Developer - Plano, United States - VirtusaPolaris - Virtusa Corporation

    Description
    Experience in building Spark Streaming processes. Proficient in distributed computing principles. Experience in managing a Hadoop cluster with all of its services.
    Proficiency with Hadoop, MapReduce, HDFS, Pig, Hive, and Impala. Experience with NoSQL databases and messaging systems such as Kafka. Designing, building, installing, configuring, and supporting Hadoop; performing analysis of vast data stores. Good understanding of cloud technology. Must have strong technical experience in design, mapping specifications, HLD, and LLD.
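    A minimal sketch of the kind of Spark Streaming process described above, consuming from Kafka and landing data on HDFS; the broker address, topic, and paths are placeholder assumptions, not details from this posting:

    # Requires the spark-sql-kafka-0-10 package on the Spark classpath.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("events-stream").getOrCreate()

    # Read the raw event stream; broker and topic names are hypothetical.
    events = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")
        .option("subscribe", "events")
        .load()
        .select(col("key").cast("string"), col("value").cast("string"))
    )

    # Write to HDFS as Parquet, with checkpointing so the stream can recover after failure.
    query = (
        events.writeStream
        .format("parquet")
        .option("path", "hdfs:///data/events")
        .option("checkpointLocation", "hdfs:///checkpoints/events")
        .start()
    )
    query.awaitTermination()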

    Must have the ability to relate to both business and technical members of the team and possess excellent communication skills.

    Leverage internal tools and SDKs, utilize AWS services such as S3, Athena, and Glue, and integrate with our internal Archival Service Platform for efficient data purging.
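    As a rough sketch of the S3/Athena piece, the boto3 snippet below runs an Athena query to flag records eligible for purging; the region, database, table, and bucket names are illustrative assumptions, and the hand-off to the internal Archival Service Platform is omitted because its API is not public:

    import time
    import boto3

    athena = boto3.client("athena", region_name="us-east-1")  # region is an assumption

    # Start a query over S3-backed data; results land in a placeholder results bucket.
    execution = athena.start_query_execution(
        QueryString="SELECT record_id FROM events WHERE event_date < DATE '2020-01-01'",
        QueryExecutionContext={"Database": "analytics"},
        ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
    )
    query_id = execution["QueryExecutionId"]

    # Poll until Athena finishes the query.
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(2)

    if state == "SUCCEEDED":
        rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
        # The first row is the column header; the rest identify purge candidates.
        print(f"{len(rows) - 1} records flagged for purging")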

    Lead the integration efforts with the internal Archival Service Platform for seamless data purging and lifecycle management. Collaborate with the data engineering team to continuously improve data integration pipelines, ensuring adaptability to evolving business needs.
    Perform database health checks and tune databases using Teradata Manager.
    Develop and maintain data platforms using Python.
    Work with AWS and Big Data, design and implement data pipelines, and ensure data quality and integrity (see the quality-gate sketch after this list).
    Collaborate with cross-functional teams to understand data requirements and design solutions that meet business needs.
    Implement and manage agents for monitoring, logging, and automation within AWS environments.
    Handle migration of PySpark workloads to AWS.
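    A minimal sketch of the data-quality gate referenced in the list above, written in PySpark; the s3a paths, key column, and checks are illustrative assumptions:

    # Assumes the hadoop-aws connector is available for s3a:// access.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("pipeline-quality-gate").getOrCreate()

    # Load one batch from a placeholder raw zone.
    df = spark.read.parquet("s3a://example-raw-bucket/orders/")

    # Integrity checks: primary key present and unique.
    total = df.count()
    null_keys = df.filter(col("order_id").isNull()).count()
    duplicates = total - df.dropDuplicates(["order_id"]).count()

    # Fail loudly instead of letting bad data flow downstream.
    if null_keys or duplicates:
        raise ValueError(
            f"quality gate failed: {null_keys} null keys, {duplicates} duplicate keys"
        )

    # Publish only clean batches to the curated zone (placeholder path).
    df.write.mode("overwrite").parquet("s3a://example-curated-bucket/orders/")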
