Conversational Modelling Research Engineer

Tavus • Full-time • Remote (United States) • 1m ago

Why This Job is Featured on The SaaS Jobs

This role stands out in the SaaS landscape because it sits at the point where research-grade multimodal AI becomes a product capability. Tavus is presented as a Series B company building real-time “AI Humans,” which implies meaningful constraints that matter in SaaS delivery, including latency, reliability, and repeatable performance across many user interactions. The remit spans conversational avatars and audio-visual foundation models, a domain where product differentiation is often determined by model behavior in live, user-facing contexts rather than offline benchmarks.

For a SaaS career, the notable value is the end-to-end exposure from experimentation to production. The listing explicitly references partnering with an applied ML team, which signals experience with the handoff from prototypes into deployed systems, a common inflection point in AI-native SaaS. Work on fine-tuning, conditioning, and controllability also builds transferable skills for teams shipping model-driven features that must be steerable, safe, and measurable over time.

The role is best suited to professionals who enjoy operating between research and engineering, with comfort iterating on ideas while staying grounded in implementation detail. It also fits candidates motivated by applied multimodal work where success is defined by real-time interaction quality and operational constraints, not only publications.

The section above is editorial commentary from The SaaS Jobs, provided to help SaaS professionals understand the role in a broader industry context.

Job Description

About Us

Tavus is a research lab pioneering human computing. We’re building AI Humans: a new interface that closes the gap between people and machines, free from the friction of today’s systems. Our real-time human simulation models let machines see, hear, respond, and even look real—enabling meaningful, face-to-face conversations. AI Humans combine the emotional intelligence of humans with the reach and reliability of machines, making them capable, trusted agents available 24/7, in every language, on our terms.

Imagine a friend who can discuss any topic with you. A personal trainer that adapts to your schedule. A fleet of medical assistants that can give every patient the attention they need. With Tavus, individuals, enterprises, and developers can all build AI Humans to connect, understand, and act with empathy at scale.

We’re a Series B company backed by world-class investors including Sequoia Capital, Y Combinator, and Scale Venture Partners.

Be part of shaping a future where humans and machines truly understand each other.

The Role

We’re looking for an AI Researcher to join our core AI team and push the boundaries of Foundation Multimodal Conversational Models. If you thrive in fast-moving startup environments, enjoy experimenting with new ideas, and love seeing your work come to life in production, then you’ll feel right at home.

Your Mission 🚀

Conduct research on Large Multimodal Models in the context of Conversational Avatars (e.g. Neural Avatars, Talking-Heads).
Develop methods to model both verbal and non-verbal aspects of conversation, adapting and controlling avatar behavior in real time and with low latency.
Experiment with fine-tuning, adaptation, and conditioning techniques to make AudioVisual Multimodal Models more expressive, controllable, and task-specific.
Partner with the Applied ML team to take research from prototype to production.
Stay up to date with cutting-edge advancements — and help define what comes next.

You’ll Be Great At This If You Have:

A PhD (or near completion) in a relevant field, or equivalent research experience.
Hands-on experience with Large Multimodal Models and a strong foundation in generative (language) models. This could be in the context of tasks such as VQA, audio/video understanding, captioning, behavioral analysis, translation, or speech-to-speech systems.
Experience in fine-tuning/adapting VLMs for control, conditioning, or downstream tasks.
Solid background in deep learning and foundation models.
Strong PyTorch skills and comfort building deep learning pipelines.

Nice-to-Haves

Knowledge of large-scale model training and optimization.
Experience with duplex conversational models.
Broader understanding of generative AI across modalities.
Exposure to software development best practices.
A flexible, experimental mindset, i.e. comfortable working across research and engineering.
(Bonus) Publications at EMNLP, COLING, NeurIPS, ICLR, CVPR, ICCV.

Location

Preferred: San Francisco (hybrid) or London (office opening soon).

Remote within the U.S. or Europe available for exceptional candidates.