Rikka ships with built-in support for both Text-to-Speech (TTS) — hearing AI responses read aloud — and Automatic Speech Recognition (ASR) — dictating your messages with a microphone instead of typing. You configure each capability independently, choosing from several cloud and on-device providers. Once set up, TTS and ASR work directly from the chat screen without leaving the conversation. Navigate to Settings → Speech to access both tabs.

Setting Up TTS

Open the TTS tab inside Settings → Speech. Tap Add Provider and select the provider you want to use. Fill in the required fields, then tap Save. You can add multiple providers and switch between them at any time. To hear a message read aloud, tap the speaker icon that appears on any assistant message bubble.
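If it helps to picture "add multiple providers and switch between them", here is a minimal sketch of that idea as a data structure. The class names, fields, and the rule that the first provider added becomes active are all hypothetical; this is not Rikka's actual storage format.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TtsProvider:
    """One saved provider entry (hypothetical shape, not Rikka's schema)."""
    name: str
    settings: dict = field(default_factory=dict)


class TtsProviderList:
    """Keeps every saved provider and remembers which one is selected."""

    def __init__(self) -> None:
        self._providers: Dict[str, TtsProvider] = {}
        self._active: Optional[str] = None

    def add(self, provider: TtsProvider) -> None:
        self._providers[provider.name] = provider
        # Assumption: the first provider added is selected automatically.
        if self._active is None:
            self._active = provider.name

    def switch_to(self, name: str) -> None:
        if name not in self._providers:
            raise KeyError(f"no provider named {name!r}")
        self._active = name

    @property
    def active(self) -> Optional[TtsProvider]:
        if self._active is None:
            return None
        return self._providers[self._active]


providers = TtsProviderList()
providers.add(TtsProvider("OpenAI", {"voice": "alloy"}))
providers.add(TtsProvider("System TTS", {"speechRate": 1.0}))
providers.switch_to("System TTS")
```

Switching only changes which entry is selected; the other saved providers keep their settings, which matches the behaviour described above.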

Available TTS Providers

Uses OpenAI’s neural TTS API. Produces natural-sounding speech with a choice of six preset voices.
apiKey (string, required): Your OpenAI API key. Find this in your OpenAI dashboard.
baseUrl (string, default: "https://api.openai.com/v1"): Base URL for the OpenAI-compatible API. Change this if you are routing through a proxy or using a third-party OpenAI-compatible endpoint.
model (string, default: "gpt-4o-mini-tts"): The TTS model to use. gpt-4o-mini-tts is fast and cost-effective; swap in tts-1-hd for higher fidelity output.
voice (string, default: "alloy"): Voice preset. OpenAI offers alloy, echo, fable, onyx, nova, and shimmer.
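To see how these four fields fit together, the sketch below builds (but does not send) a request to OpenAI's public `/audio/speech` endpoint using only the Python standard library. The function name is made up for illustration, and it is an assumption that Rikka issues an equivalent request internally.

```python
import json
import urllib.request


def build_speech_request(text, api_key,
                         base_url="https://api.openai.com/v1",
                         model="gpt-4o-mini-tts",
                         voice="alloy"):
    """Build a POST request for the OpenAI-style /audio/speech endpoint."""
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_speech_request("Hello from Rikka", api_key="YOUR_API_KEY")
# Sending it would return audio bytes:
# with urllib.request.urlopen(request) as response:
#     audio = response.read()
```

Pointing `base_url` at a proxy or another OpenAI-compatible endpoint changes only the URL; the payload keeps the same shape.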

Uses Google’s Gemini multimodal TTS. Supports a wide range of voices including the expressive Gemini 2.5 family.
apiKey (string, required): Your Google AI Studio API key.
baseUrl (string): Base URL for the Gemini API.
model (string, default: "gemini-2.5-flash-preview-tts"): Gemini model to use for speech synthesis.
voiceName (string, default: "Kore"): Voice name. Refer to the Google AI voice list for all available options.
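For orientation, here is a rough sketch of what a Gemini speech request body looks like and how the returned audio can be made playable. It follows Google's public `generateContent` REST format, where the body is posted to `{baseUrl}/models/{model}:generateContent` with the key in an `x-goog-api-key` header. Google's documentation describes the preview TTS output as base64-encoded 16-bit mono PCM at 24 kHz, so the second helper wraps it in a WAV header. The helper names are hypothetical, and whether Rikka does exactly this is an assumption.

```python
import base64
import io
import wave


def build_gemini_tts_body(text, voice_name="Kore"):
    """Request body asking Gemini for audio output with a prebuilt voice."""
    return {
        "contents": [{"parts": [{"text": text}]}],
        "generationConfig": {
            "responseModalities": ["AUDIO"],
            "speechConfig": {
                "voiceConfig": {
                    "prebuiltVoiceConfig": {"voiceName": voice_name}
                }
            },
        },
    }


def pcm_base64_to_wav(b64_pcm, sample_rate=24000):
    """Wrap base64-encoded 16-bit mono PCM in a WAV container."""
    pcm = base64.b64decode(b64_pcm)
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)        # mono
        wav.setsampwidth(2)        # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()


body = build_gemini_tts_body("Hello from Rikka")
```

In a real response the base64 audio sits under `candidates[0].content.parts[0].inlineData.data`; pass that string to `pcm_base64_to_wav` to get bytes a normal audio player can open.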

High-quality Chinese and multilingual TTS from MiniMax.
apiKey (string, required): Your MiniMax API key.
baseUrl (string, default: "https://api.minimaxi.com/v1"): MiniMax API base URL.
model (string, default: "speech-2.6-turbo"): TTS model identifier.
voiceId (string, default: "female-shaonv"): Voice ID. Browse available voices in the MiniMax console.
emotion (string, default: "calm"): Emotional tone of the synthesized voice, e.g. calm, happy, sad.
speed (float, default: 1.0): Playback speed multiplier. Values below 1.0 slow speech down; values above 1.0 speed it up.
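As a quick way to reason about the speed multiplier, the sketch below estimates how long a clip takes to play at a given setting. It assumes the multiplier scales speaking rate linearly, which is an idealisation; real output may differ slightly, and the function name is only illustrative.

```python
def playback_seconds(base_seconds: float, speed: float) -> float:
    """Estimate playback time for a clip that lasts base_seconds at speed 1.0."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    return base_seconds / speed


normal = playback_seconds(12.0, 1.0)    # 12.0 seconds
faster = playback_seconds(12.0, 1.5)    # 8.0 seconds
slower = playback_seconds(12.0, 0.75)   # 16.0 seconds
```

So a reply that takes 12 seconds at the default setting finishes in about 8 seconds at 1.5 and stretches to about 16 seconds at 0.75.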

Alibaba DashScope’s Qwen TTS, optimised for Chinese and multilingual speech.
apiKey (string, required): Your DashScope API key.
baseUrl (string, default: "https://dashscope.aliyuncs.com/api/v1"): DashScope API base URL.
model (string, default: "qwen3-tts-flash"): TTS model. qwen3-tts-flash is the low-latency variant.
voice (string, default: "Cherry"): Voice name. Check the DashScope documentation for the full list.
languageType (string, default: "Auto"): Language hint. Auto lets the model detect the language automatically.

Fast TTS via Groq’s inference infrastructure, powered by Orpheus models.
apiKey (string, required): Your Groq API key.
baseUrl (string, default: "https://api.groq.com/openai/v1"): Groq API base URL.
model (string, default: "canopylabs/orpheus-v1-english"): Orpheus TTS model to use.
voice (string, default: "austin"): Voice preset name.

Text-to-speech via xAI’s Grok API.
apiKey (string, required): Your xAI API key.
baseUrl (string, default: "https://api.x.ai/v1"): xAI API base URL.
voiceId (string, default: "eve"): Voice identifier.
language (string, default: "auto"): BCP-47 language tag or auto for automatic detection.
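If you are unsure whether a value is acceptable for the language field, the sketch below shows one way to sanity-check it. The pattern is a deliberately simplified subset of BCP-47 (language, optional script, optional region) and does not cover variants, extensions, or private-use subtags. The function name is hypothetical, and the provider may accept or reject values this check does not anticipate.

```python
import re

# Simplified BCP-47 shape: language[-Script][-REGION], e.g. "en", "en-US", "zh-Hans-CN".
_SIMPLE_TAG = re.compile(r"^[A-Za-z]{2,3}(-[A-Za-z]{4})?(-([A-Za-z]{2}|[0-9]{3}))?$")


def is_valid_language_setting(value: str) -> bool:
    """Return True for "auto" or a tag matching the simplified pattern."""
    if value == "auto":
        return True
    return bool(_SIMPLE_TAG.match(value))


checks = {tag: is_valid_language_setting(tag)
          for tag in ["auto", "en-US", "zh-Hans-CN", "english", ""]}
```

Here `auto`, `en-US`, and `zh-Hans-CN` pass, while a plain language name such as `english` or an empty value does not.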

TTS provided by Xiaomi’s MiMo service.
apiKey (string, required): Your MiMo API key.
baseUrl (string, default: "https://api.xiaomimimo.com/v1"): MiMo API base URL.
model (string, default: "mimo-v2-tts"): TTS model identifier.
voice (string, default: "mimo_default"): Voice preset.

Uses Android’s built-in text-to-speech engine. No API key is required; it works fully on-device using whatever voice packs you have installed in Android settings.
speechRate (float, default: 1.0): Speed of synthesized speech. 1.0 is normal speed; increase for faster delivery, decrease for slower.
pitch (float, default: 1.0): Pitch of the synthesized voice. 1.0 is the default pitch for the selected Android voice.
The quality and language support of System TTS depend on the voice packs installed on your device. You can install additional voices in Android Settings → Accessibility → Text-to-speech output.

Using Speech in the Chat Screen

Once you have at least one provider configured in each category, the controls appear directly in the chat interface:

Microphone (ASR)

Tap the microphone icon in the chat input bar to start recording. Speak your message naturally — Rikka streams the audio to your ASR provider and populates the text field as words are transcribed. Release or tap again to stop.

Speaker (TTS)

Tap the speaker icon on any assistant message to have it read aloud by your configured TTS provider. Tap again to stop playback mid-sentence.
If you primarily use voice input, combine ASR with a low-latency TTS provider so the whole conversation flows naturally without switching to the keyboard.