Software Engineer - San Jose, United States - eTeam

    Description

    We are a deeply data-driven company: data drives not only business decisions but also product features and direction. Our team supports one of the largest in-house Hadoop/Spark infrastructures in Silicon Valley, building infrastructure and tools that power thousands of AI engineers, data analysts, and software engineers across the company. We are also committed to being an early adopter of, and contributor to, open-source big data technologies, including Hadoop, Spark, and TensorFlow. We are looking for a data engineer with a focus on Hadoop and Spark to develop our Spark infrastructure and grow our Spark user groups. Your work will impact thousands of Spark users inside and outside the company.

    Responsibilities:

    • Prepare training material and hold technical training sessions for internal Spark users
    • Develop Spark infrastructure, user libraries, and tooling that make a direct impact on users
    • Interact with multiple groups daily to design data solutions and promote Spark best practices among users
    • Solve production issues related to users' Spark applications
    Basic Qualifications:
    • BS/MS in Computer Science or related technical discipline
    • 1+ years of relevant work experience with Spark and Hadoop
    • Excellent communication and presentation skills
    • Good understanding of Spark, Hadoop, HDFS, YARN, and Hive
    • Good understanding of distributed storage and computing systems
    • Good understanding of programming languages such as Java and Scala
    • Good understanding of scripting languages such as Shell and Python
    Preferred Qualifications:
    • Good understanding of Spark Core and the Spark SQL engine
    • Experience building large-scale applications with Spark SQL
    • Experience in writing, analyzing, and debugging SQL queries using Hive and Spark SQL
    • Experience tuning large-scale Spark jobs
    • Experience working with data formats on Hadoop/Spark such as Avro, ORC, and Parquet
    • Experience with interactive Spark notebooks (e.g., Jupyter)