Why This Job is Featured on The SaaS Jobs
This AI Engineer role sits at a SaaS intersection where product differentiation increasingly depends on applied speech technology. The remit spans the Text-to-Speech layer of AI voice agents and the orchestration of an end-to-end ASR-to-LLM-to-TTS pipeline, signalling a productised AI capability rather than a research-only mandate. The need to integrate multiple vendor APIs alongside prototyping open-source or in-house approaches reflects a pragmatic SaaS environment balancing time to value with longer-term platform control.
For a SaaS career, the standout value is exposure to the mechanics of turning model behaviour into configurable product features. Work such as parameterising voice attributes for a customer-facing UI, managing prompt and persona systems, and running structured evaluations aligns with how modern SaaS teams ship AI safely and iteratively. Experience with latency, quality metrics, and multi-vendor reliability also transfers well to other AI-enabled SaaS products where operational constraints shape user experience.
This position is best suited to professionals who enjoy bridging disciplines, combining backend engineering rigour with linguistic sensitivity and experimentation. It will appeal to candidates who prefer measurable iteration, clear ownership of a subsystem, and close collaboration across specialised AI functions. The hybrid setup in Kitchener also points to a role that benefits from regular in-person technical exchange while still supporting focused build time.
The section above is editorial commentary from The SaaS Jobs, provided to help SaaS professionals understand the role in a broader industry context.
Job Description
Your Role
As an AI Engineer: Voice Designer, you’ll own the back-end implementation and linguistic optimization of the Text-to-Speech (TTS) layer for our next-generation AI voice agents. You’ll work squarely within our Speech Team—a high-impact R&D and engineering group focused on speech recognition, enhancement, and synthesis. You will bridge the gap between core speech science and product engineering, ensuring our voice agents sound human, context-aware, and trustworthy. You’ll also help create the systems that manage voice personas, tone, and conversational fillers, eventually exposing these as tweakable parameters to our customer-facing UI.
This position reports to our Senior Manager, AI Speech, is based at our Kitchener hub, and operates on a hybrid schedule.
What You’ll Do
- TTS Backend Implementation: Own the integration and optimization of multiple TTS vendor APIs while leading research and prototyping for open-source or in-house TTS architectures.
- Linguistic Optimization: Apply expertise in phonetics and sociolinguistics to ensure TTS input is formatted for maximum naturalness, including SSML orchestration and pronunciation handling.
- Conversational Turn Design: Craft context-specific utterances to optimize turn handling and build caller trust during agentic "thought" processes.
- Prompt & Persona Management: Design and manage LLM and TTS prompts and parameters to define and refine agent personalities across different industry verticals.
- UI Parameter Exposure: Architect the logic to expose voice attributes (speed, pitch, tone, style) to the product UI, allowing customers to customize their agent’s voice profile.
- Cross-Functional R&D: Partner with ASR and Audio AI engineers to ensure end-to-end voice quality and minimize latency in the ASR → LLM → TTS pipeline.
Skills You’ll Bring
- Technical Foundation: Strong Python programming skills and experience with deep learning frameworks (e.g. PyTorch).
- Speech Expertise: 3+ years of experience in Speech Synthesis (TTS) or Voice Design, including hands-on work with frameworks like NVIDIA NeMo, ESPnet, or Coqui and with major TTS APIs such as ElevenLabs, Rime, and Cartesia.
- Linguistic Background: Degree in Computational Linguistics, Computer Science, or AI/ML with a deep understanding of phonetics, prosody, and syntax.
- Prompt Engineering: Proven experience crafting and evaluating LLM prompts (system, few-shot) and managing structured prompt templates.
- Backend Engineering: Experience building production-grade APIs and integrating multi-vendor services in a cloud environment (GCP preferred).
- Evaluation Mindset: Knowledge of speech quality metrics (MOS, intelligibility, latency) and the ability to design rigorous A/B tests for voice personas.