Your Role
As an AI Engineer: Voice Designer, you’ll own the back-end implementation and linguistic optimization of the Text-to-Speech (TTS) layer for our next-generation AI voice agents. You’ll work squarely within our Speech Team, a high-impact R&D and engineering group focused on speech recognition, enhancement, and synthesis. You’ll bridge the gap between core speech science and product engineering, ensuring our voice agents sound human, context-aware, and trustworthy. You’ll also help create the systems that manage voice personas, tone, and conversational fillers, eventually exposing these as tweakable parameters in our customer-facing UI.
This position reports to our Senior Manager, AI Speech, is based at our Vancouver hub, and operates on a hybrid schedule.
What You’ll Do
- TTS Backend Implementation: Own the integration and optimization of multiple TTS vendor APIs while leading research and prototyping for open-source or in-house TTS architectures.
- Linguistic Optimization: Apply expertise in phonetics and sociolinguistics to ensure TTS input is formatted for maximum naturalness, including SSML orchestration and pronunciation handling.
- Conversational Turn Design: Craft context-specific utterances to optimize turn handling and build caller trust during agentic "thought" processes.
- Prompt & Persona Management: Design and manage LLM and TTS prompts and parameters to define and refine agent personalities across different industry verticals.
- UI Parameter Exposure: Architect the logic to expose voice attributes (speed, pitch, tone, style) to the product UI, allowing customers to customize their agent’s voice profile.
- Cross-Functional R&D: Partner with ASR and Audio AI engineers to ensure end-to-end voice quality and minimize latency in the ASR → LLM → TTS pipeline.
Skills You’ll Bring
- Technical Foundation: Strong Python programming skills and experience with deep learning frameworks (e.g., PyTorch).
- Speech Expertise: 3+ years of experience in Speech Synthesis (TTS) or Voice Design, including hands-on work with frameworks such as NVIDIA NeMo, ESPnet, or Coqui and with major TTS APIs such as ElevenLabs, Rime, and Cartesia.
- Linguistic Background: Degree in Computational Linguistics, Computer Science, or AI/ML, with a deep understanding of phonetics, prosody, and syntax.
- Prompt Engineering: Proven experience crafting and evaluating LLM prompts (system, few-shot) and managing structured prompt templates.
- Backend Engineering: Experience building production-grade APIs and integrating multi-vendor services in a cloud environment (GCP preferred).
- Evaluation Mindset: Knowledge of speech quality metrics (MOS, intelligibility, latency) and the ability to design rigorous A/B tests for voice personas.