A well-funded, research-driven AI company is building advanced real-time multimodal video models that power expressive, human-centric digital characters. The tech is complex: large-scale diffusion models, multi-GPU training, distillation, real-time optimization.
The Role:
This is an applied research + engineering role: you’ll work on training runs, data, model optimization, and the “make it fast” path that turns a capable research model into a real‑time experience.
What You’ll Do:
Train and scale video generation models: run large‑scale training/fine‑tuning on multi‑GPU (and, when needed, multi‑node) setups; own the training loop, stability, checkpoints, and iteration speed.
Own data for video modeling: build and improve video datasets/pipelines (decoding/sampling, quality filtering, conditioning alignment, storage formats), and keep them fast and reliable at scale.
Distill and compress big models into fast ones: teacher → student distillation, step reduction, architectural simplifications, and quality/speed trade‑offs to hit real‑time constraints.
Make models run in real time: profiling, memory optimizations, quantization-aware tactics where appropriate, kernel/runtime improvements, and practical throughput/latency wins.
Build the bridge to product: package models into simple inference APIs and prototypes; collaborate with product to turn research progress into user-facing experiences (interactive characters, conversational video).
Evaluate what matters: set up evaluation harnesses that track perceptual quality, temporal consistency, identity/character fidelity, and latency/cost.
What You’ll Bring:
2+ years building and shipping ML systems (or equivalent), with clear ownership and delivery.
Strong Python and PyTorch skills; comfortable working in both training and inference code.
Hands‑on experience training or scaling generative models, ideally video generation (diffusion/transformers/VAEs or similar), not just using pre‑trained checkpoints.
Experience with distributed training and large runs (e.g., DDP/FSDP/DeepSpeed‑style workflows), and the practical debugging that comes with them.
Proven ability to improve performance in practice: latency/memory/cost optimizations, profiling, and shipping measurable wins.
Product mindset: able to move from research idea → robust implementation → iteration against real‑world constraints.