Member of Technical Staff, Data Curation

Inception
Palo Alto, CA
Category: Engineering
Job Description
We are seeking experienced engineers and scientists to shape how we collect, process, and curate the datasets that power our AI models. This interdisciplinary role combines engineering expertise with research insights to build scalable data pipelines, develop synthetic data generation techniques, and ensure our models are trained on high-quality, diverse datasets.

Requirements

  • BS/MS/PhD in Computer Science, Machine Learning, or a related field (or equivalent experience)
  • 3+ years of experience building data processing pipelines at scale, particularly for AI/ML applications
  • Strong proficiency in Python and experience with data processing frameworks (Apache Spark, Beam, Airflow)
  • Familiarity with synthetic data generation techniques and data augmentation strategies
  • Familiarity with web scraping, crawling technologies, and Common Crawl datasets
  • Solid understanding of machine learning fundamentals and experience with ML frameworks (PyTorch, TensorFlow)
  • Experience with SQL and NoSQL databases for managing structured and unstructured data

Benefits

  • Competitive salary
  • Equity in a rapidly growing startup
  • Access to the latest GPU hardware and cloud resources
  • Flexible vacation and paid time off (PTO)
  • Health, dental, and vision insurance
  • A collaborative and inclusive culture