Data Crawling Engineer - Los Gatos, United States - Understanding Recruitment

    Description
    Join Our Team as a Data Crawling Engineer

    Are you an engineer passionate about leveraging AI to tackle real-world challenges? We're a leading player in the AI space, backed by prominent investors and driven by a commitment to innovation.

    Our team is dedicated to pushing the boundaries of what's possible with AI, and we're looking for motivated individuals to join us in our mission.

    About Us
    At our company, we're all about harnessing the power of AI to deepen the partnership between humans and computers. Our innovative approach spans the entire AI stack, from user experience to optimizing models for maximum efficiency.

    The Data Acquisition (Crawler) Engineer will be responsible for developing and maintaining the systems that enable the smooth and efficient collection, storage, and processing of data from various sources.

    Your primary responsibility will be to design, develop, and maintain efficient, reliable web crawlers and data acquisition systems to support our model training.


    Responsibilities:
    Architecting and building a large-scale distributed web crawler system.
    Designing and implementing web crawlers and scrapers to automatically extract data from websites, addressing challenges such as dynamic content and scaling to handle large data volumes.
    Developing data acquisition pipelines to ingest, transform, and store substantial volumes of data efficiently.
    Creating a highly scalable system and optimizing crawler performance for maximum efficiency.
    Monitoring and troubleshooting crawler activity to detect and resolve issues promptly.
    Collaborating closely with the data infrastructure and data research teams to enhance data quality.

    What We Are Looking For:
    Previous experience with large-scale web crawling is essential for this role.
    5 years of experience in data-intensive applications and distributed systems.
    Proficiency in high-performance programming languages such as Go, Rust, or C++.
    Strong understanding of orchestration and containerization frameworks like Docker and Kubernetes.
    Experience building on GCP or AWS services.

    Keywords:
    Data Crawling, Web Crawling, AWS, C++, Docker, Kubernetes
