Data Scientist (AI Data & LLM Specialist) at Eclipse Labs
Company: Eclipse Labs
Location: Remote
Type: Full-time
Job Description
<p>Join the core team at Eclipse, where we’re building an AI agent-first marketplace that connects intelligence with real-world tasks, starting with data collection and labeling. We are seeking a Data Scientist to establish the foundation for how our data is labeled, processed, and prepared for consumption by next-generation Large Language Models (LLMs). Your work will be critical in transforming our raw data collections into valuable, AI-ready datasets.</p>
<h2><strong>Qualifications</strong></h2>
<ul>
<li>Proven experience as a Data Scientist or Machine Learning Engineer with a focus on data quality and preparation.</li>
<li>Strong understanding of data labeling methodologies and hands-on experience with data annotation platforms and workflows.</li>
<li>Demonstrated experience preparing datasets for training and fine-tuning Large Language Models (LLMs), including knowledge of techniques like tokenization, embeddings, and NER.</li>
<li>Proficiency in Python and common data science libraries (e.g., Pandas, NumPy, Scikit-learn, spaCy, Hugging Face).</li>
<li>Experience using APIs/SDKs to automate data annotation and active learning loops.</li>
<li>Excellent communication skills, with an ability to create clear documentation for technical and non-technical audiences.</li>
</ul>
<h2><strong>Responsibilities</strong></h2>
<ul>
<li>Develop Data Labeling Strategies: Design and document a formal data annotation strategy, including clear, scalable, and efficient guidelines for labeling our data. Define and enforce quality metrics, including inter-annotator agreement.</li>
<li>Optimize for LLM Consumption: Research, define, and prototype the optimal data formats, structures, and pre-processing steps required for fine-tuning and training LLMs on our datasets.</li>
<li>Data Quality Analysis: Establish automated processes and metrics to analyze the quality of both raw and labeled data, providing feedback to improve our data collection and labeling workflows.</li>
<li>Collaborate with Engineering: Work closely with the engineering team to guide the implementation of data processing pipelines and ensure the data infrastructure meets the needs of ML applications.</li>
</ul>
<h2><strong>Nice-to-Haves</strong></h2>
<ul>
<li>Experience with audio data processing and relevant libraries.</li>
<li>Familiarity with data annotation platforms and tools.</li>
<li>Knowledge of modern MLOps principles and practices.</li>
<li>Experience with large language model data curation and Reinforcement Learning from Human Feedback (RLHF) pipelines.</li>
</ul>
<h2><strong>Join the Eclipse team!</strong></h2>
<p>Eclipse is building the fastest Ethereum</p>