Benjamin Carsley

Data & Machine Learning Engineer
New York City, New York

About Benjamin Carsley:

Motivated linguist, data scientist, and ML engineer with a strong background in applying a combined linguistics and data-engineering toolkit to enormous textual, multilingual, and quantitative datasets. Experienced in building lightning-fast databases, data retrieval and analysis tools, and containerized ML infrastructure from the ground up. Passionate about contributing to open-source development and leveraging public-domain datasets (the Wayback Machine, Project Gutenberg, Wikipedia, the Pile, the Stack, etc.) in personal research. Strong communication background from years of research, writing, and study in the humanities (Classics, Comparative Literature, Translation Studies, History), eventually leading to a degree in Linguistics. Experience applying analytic, data-driven problem-solving to institutional and systemic problems in a variety of work environments and tech stacks: legacy codebases, low/no-code environments, and cutting-edge, ground-up cloud infrastructure. Full-stack experience, from UI/UX design (HTML/CSS) to servers (Python), APIs (Node/Deno), compiled languages and containers (C/Rust/Docker), and systems-level programming (CLIs/Linux). Self-taught programmer and web developer, always learning more and always seeking to apply these skills to improve public access to important information, forgotten archives, and other repositories of public-domain knowledge (e.g. OCR/CV digitization and organization of manuscript archives, and audio transcription of oral-history and language-documentation archives).

Experience

Presently : Data & ML Engineer

@ EcoMap Technologies

in Baltimore, MD // New York, NY (80% remote)

Responsibilities from June 2023 ⇒ Present:

  • Spearheaded two large ML projects/solo hacks:
    1. EcoBot ⇒ a conversational agent that accurately returns data from a knowledge base of entities, localized events, resources, etc.: custom datasets of publicly accessible data (which the company collects per a client's stated interests) that are editable and extensible in real time via ravenML-as-a-service (see below). In production, EcoBot is a multi-instance Next.js server deployed to Vercel that can be embedded in any host/client's website and dynamically renders a given client's configuration (dataset, CSS selectors/colors, persona, static files, branding) from a central PostgreSQL database. The project began as a proof-of-concept sprint in February-March 2023, and the production server, admin panel (enabling real-time edits/additions), and first client configuration launched on July 31st. Read more on my website.
    2. ravenML ⇒ EcoMap, fundamentally a datasets-as-a-service company, had long relied on manual data methods: discovering new data sources, compiling entities/data points ("assets") for a given client, and "extracting" full JSON records to send to an EcoMap platform. All of these processes were manual and slow, even for a team of 5 with an average target of ~500 assets per client. Until I began building ravenML with my team lead one afternoon for fun, no serious tool had been put forward to accelerate the Data Engine of EcoMap. Begun in April and in production by July 15th, ravenML is a comprehensive solution (which I implemented in three different languages/approaches) that, given a set of valid URLs and the names of the target assets, collects all the descriptive, classifying, and contact data included in a given EcoMap client's dataset. Read more on my website.
  • EcoBot has been a solid revenue generator, so continual test-driven development is part of my daily responsibilities: small bug fixes, re-tuning model behavior, requested feature improvements, etc.
  • Likewise, ravenML has become the standard Data Engine of EcoMap Data, so supervising, orchestrating, and scheduling regular data requests through ravenML is a core part of the day-to-day, at least in this early stage of monitoring and continual upgrades as we learn more about the web content we collect at increasing scale.
  • Built a host of repositories and RESTful services in Python/JS/TS to support EcoMap's growing need for ML infrastructure to corral, organize, and collect ever more data.
  • Fine-tuned a host of small text-to-text LLMs based on the Llama 2 family of models, as well as one fine-tune of Falcon-40B, all capable of matching gpt-3.5-turbo task-for-task in the LLM pipelines I built, improving the concurrency and throughput of bulk/batch LLM inference projects such as ravenML.
  • Extensive experience building and deploying task-specific NLP and ML models for zero-shot classification, language detection, image classification, image generation, text embeddings, text generation, and even multimodal text-image QA. All of this research and work used models from the Hugging Face Hub together with SBERT, transformers, curated-transformers, scikit-learn, PyTorch, accelerate, trl, etc.: the open-source LLM toolbox.
  • Acquired access to and deployed roughly a dozen services using proprietary/hosted model providers: OpenAI, Anthropic, Google (PaLM), Azure (OpenAI), and Together AI.
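The bulk/batch LLM inference pattern mentioned above (many prompts in flight, bounded concurrency) can be sketched minimally in Python with asyncio. This is an illustrative sketch, not EcoMap's actual implementation: `_infer` is a hypothetical stand-in for a real model call (e.g. an HTTP request to a hosted provider), and `max_concurrency` is an assumed tuning parameter.

```python
import asyncio

async def _infer(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call.
    The short sleep simulates network latency to a model provider."""
    await asyncio.sleep(0.01)
    return prompt.upper()  # placeholder "completion"

async def batch_infer(prompts: list[str], max_concurrency: int = 8) -> list[str]:
    """Run many inference calls concurrently, bounded by a semaphore.
    asyncio.gather preserves input order in the returned list."""
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(prompt: str) -> str:
        # Each task waits for a semaphore slot before calling the model,
        # so at most `max_concurrency` requests are in flight at once.
        async with sem:
            return await _infer(prompt)

    return await asyncio.gather(*(bounded(p) for p in prompts))

if __name__ == "__main__":
    results = asyncio.run(batch_infer([f"record {i}" for i in range(20)]))
    print(results[0])  # → RECORD 0
```

The semaphore is the key design choice: it keeps throughput high without exceeding a provider's rate limits, and the bound can be tuned per provider.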


Previously : Ecosystem Ontologist ⇒ Data & ML Engineer

EcoMap Technologies

in Baltimore, MD // New York, NY

from November 2022 - June 2023

  • I was effectively hired in July 2022 as a consultant on the descriptive data, specifically the keywords and classifiers, used at EcoMap Tech as part of their in-house data process. This short-term contract led to a full-time position as "Ecosystem Ontologist," continuing the semantic/linguistic improvement of the descriptive data EcoMap provides as the core value proposition to its clients.

  • The goal of the short-term contract (July - September 2022) was to make targeted, linguistically minded recommendations and assist in a top-down redesign of the classification system/ontology of their data, especially since a period of rapid client growth was accelerating the diversification of the entities and semantic domains that comprised their database of assets.

  • After an audit of their database, organizational practices, SQL schemas, and data-collection practices, I realized I needed tools beyond those general linguistic theory had formally equipped me with as an undergraduate. A large-scale reorganization of the fundamental database architecture was required; it became necessary to think bigger-picture and develop the programmatic skills needed to wrangle this out-of-control lexicon free of the low-code/no-code tools that messily maintained it.

  • Essentially, I went on a long sprint and learned these languages and tools:

    • Python, Node.js, JavaScript (vanilla), TypeScript, React.js/Next.js/Svelte/Tailwind, Deno (runtime), SQL/pgSQL/MySQL, HTML/CSS, Docker, Linux (Debian), cloud platform frameworks/client libraries (GCP, AWS, Azure), Ruby, some Java, Scala, & R

  • …and then proceeded to assist the engineering team and mend the data architecture (on Bubble, Xano, and Elasticsearch App Search); this marks the beginning of my transition from linguist-consultant to a role focused more on data-science and engineering tasks (December 2022).

  • Once "Project Raven" (ravenML) was proposed by the Head of Engineering and myself in April, I formally took the position of Data Engineer, though my work as an engineer had begun in earnest months earlier: by February certainly, and even in December 2022, during my research as a linguistically minded data scientist.


Previously : Library Assistant IV

C.V. Starr East Asian Library

in New York, NY (Columbia University)

from March 2022 - November 2022

  • Managed the LOC and Dewey catalogs and maintained the Voyager and CLIO database (CUNIX) systems.
  • Coordinated all ILL, ReCAP, BorrowDirect, Rapid, and Scan & Deliver requests, along with reorganization of the Japanese and Korean literature sections and retrieval of special requests from the Rare Books and Manuscripts Archive.

Education

Bachelor of Arts in Linguistics, Columbia University, New York, NY

- Additional coursework in Comparative Literature & Society, Classics, Modern Hellenic Studies, and the Department of Middle Eastern, South Asian, and African Studies.

- GPA: 4.02
