Job Description
Join Epoch 2026, a trailblazing AI research lab redefining the boundaries of generative intelligence. We are not just building the future; we are architecting the cognitive infrastructure for tomorrow. As a Principal AI Engineer, you will lead the development of next-generation large language models and autonomous agents that power the enterprise of 2026 and beyond.
We are looking for visionaries who thrive in ambiguity and possess an unwavering commitment to technical excellence. If you are passionate about ethical AI, scalable architecture, and creating systems that think, Epoch 2026 is your next home.
Why join us?
- Work on cutting-edge large language models (LLMs) and multimodal systems.
- Competitive equity package and top-tier benefits.
- Flexible remote-first culture with HQ in the heart of San Francisco.
- Access to the world's most powerful compute resources.
Responsibilities
- Lead the research and engineering of proprietary large language models (LLMs) and transformer architectures.
- Design and implement scalable training pipelines and fine-tuning strategies for production deployment.
- Collaborate with cross-functional teams of data scientists, product managers, and security experts to align AI capabilities with business goals.
- Mentor junior engineers and researchers, fostering a culture of innovation and continuous learning.
- Ensure the robustness, safety, and ethical deployment of AI systems.
- Optimize model inference latency and cost-efficiency in cloud environments.
- Contribute to open-source AI communities and publish high-impact research papers.
Qualifications
- PhD or Master's degree in Computer Science, Mathematics, or a related field, or equivalent practical experience.
- 8+ years of experience in machine learning, deep learning, or AI research.
- Deep expertise in PyTorch, TensorFlow, or JAX.
- Proven track record of publishing in top-tier conferences (NeurIPS, ICML, ACL) or delivering high-impact production AI products.
- Strong programming skills in Python and C++.
- Experience with distributed computing frameworks (Ray, Spark) and cloud infrastructure (AWS, GCP, Azure).
- Excellent communication skills and the ability to translate complex technical concepts for diverse audiences.