Voice, calls, and video calls
Default & custom voices

We provide a few default voices per gender, plus the option for subscribers to create their own voice from voice samples.

Creating your custom voice

From voice design

The easiest way to create custom voices is through voice design. You can specify accents, timbres, and everything in between, and you can use the wand to create a description of a voice for your Kindroid based on their backstory. Each creation costs a small amount of audio credits, and you get a few choices to select from. You can then finetune the resulting voice and choose whether to save it in your voice slots.

From samples

- You must own the rights to the samples you upload.
- Quality matters much more than quantity. Just a minute or so of high quality audio will be sufficient, and more than 2 minutes is not necessary.
- Ensure that the samples show a good degree of variance, as the process will capture the variance in tone and style in the samples.
- You can use custom accents or foreign languages; all of those traits will be captured in the custom voice.
- Sample quality is the most important thing. Err on the side of a few high quality samples rather than many mediocre ones.

Finetuning voices with settings

Once you have a custom voice, you can finetune the voice with sliders. You should experiment on your own, but generally we find the default to be acceptable for most cases. Note that previews in the custom voices interface also cost audio credits. Custom voices work with all versions of voice, but may sound different across versions. Tune and tweak accordingly.

Technical note: When finetuning voice settings, making voice sample previews requires audio credits.

V3 voice

What v3 is (vs v2)

V2 audio: fastest text to audio for everyday use.

V3 audio: adds richer expression, such as laughter, emotions, and tone shifts, but is significantly slower than v2 right now. V3 supports up to 3k characters at a time, so messages longer than 3k characters will be truncated; we recommend splitting up large chunks
of text in audio messages.

Availability: V3 is currently text to audio only (for chat playback). Voice calls with v3 will come later.

Monthly audio credits for subscribers

Your premium subscription includes a complimentary audio balance of 1,000,000 characters ≈ 1,000 min (16 hrs 40 min), which resets on the 1st of every month at midnight PT. Audio credits apply to all types of audio, including in chat as well as calls, and you can see them in the voice settings menu. Complimentary credits get used up before any paid credits are used.

In addition, add-on subscribers receive plan-based credits, which reset at the same time as premium subscriptions:

- Ultra: 2,500,000 characters ≈ 2,500 min (41 hrs 40 min)
- Max: 6,000,000 characters ≈ 6,000 min (100 hrs)

Unused credits do not carry over to the next month, and will be topped up to the appropriate amount at the start of the month. Subscribing to a tier will grant you the difference from the last tier, and likewise unsubscribing will deduct the difference. In a given month, you will only be granted audio credits once. For example, if you subscribed to standard and got 1 million, unsubscribed the next month after using up all 1 million, then resubscribed a day later, you will still be at zero. This mechanism is there to prevent abuse.

If you need more audio credits, they will be purchasable at the current rate of USD $11.99 on web or $14.99 on apps for 500k credits. These operate at breakeven cost, so you only pay for what you use. Audio credits for v3 will incur 1.5x that of v2 to reflect their cost. The above credits are for v2, which means v3 will be 0.66x the displayed amount in minutes/characters.

Conversion rate: 1,000 characters ≈ 1 minute of audio (rough estimate; varies by content).

Best practices & considerations

Autoplay: keep off unless you (a) have the Max add-on and (b) are comfortable purchasing more credits. For v3, autoplay is strongly discouraged due to slower generation.

Continue cut off and regenerating: once you generate an
audio response, credits will be deducted from your total. Regenerating the same audio counts as a new generation and will deduct additional credits.

Proactive voice notes do not cost credits; however, answering a proactive voice call will begin credit usage once the call is answered.

If you switch to v3 for expressiveness (laughs/emotions), expect longer generation times than v2.

Text chat audio

You can click the play button to hear audio. Note that this can only be run once per message unless it is regenerated. Words within (parentheses) are intentionally not spoken aloud, so if you prefer actions not to be spoken out loud, use (parentheses) to denote them. All other formatting, such as asterisks, will be spoken aloud.

Technical note: The statement about words in (parentheses) does not apply to voice or video calls.

Autoplay audio

In General Settings > Account wide, you can turn on autoplay audio for messages that you receive. This applies to single chats as well as groupchats.

Voice message in chat

You can send voice messages in both single and group chats. When the text input box is empty, the send message button is replaced with the voice mode button. Once in voice mode, tap to start recording your voice message, then tap again to send. In single chats, your Kindroid will automatically respond to your voice message with their own voice message, creating natural back and forth voice conversations.

Supported languages

As of Jun 25, 2025, the list of supported languages for voice message is shared with voice call & video calls. The setting is also shared, and you have quick access to language selection next to the voice mode input.

Language properties

Different supported languages have different multilingual properties. We have grouped the supported languages into classes to help explain. This only applies to voice message, not voice/video calls yet.

Class 1 languages (C1)

English, Spanish, French, German, Hindi, Russian, Portuguese, Japanese, Italian, Dutch

Selecting
a class 1 language in the setting allows you to speak in other languages in class 1. You may mix and match different C1 languages in the same message, and freely speak any C1 language across messages without needing to change the setting.

Class 2 languages (C2)

Ukrainian, Swedish, Chinese, Turkish, Indonesian, Korean

Selecting a class 2 language in the setting allows you to speak in other languages in both class 1 and class 2. You cannot mix and match languages in the same message, but may speak a different language in C1 or C2 per message without needing to change the setting.

Rest of the supported languages (ROL)

Polish, Bulgarian, Romanian, Czech, Greek, Finnish, Malay, Slovak, Danish, Norwegian, Hungarian, Vietnamese

Selecting a ROL language in the setting allows you to speak in the selected language only; other languages are not detected.

Voice call & video call

Voice calls can be conducted in many languages, though currently for the highest intelligence, we recommend using English. All audio (both microphone input and audio output) and video are processed ephemerally and aren't stored.

Note: Calls are currently using v2 voices. Expressive v3 voices are not supported at the moment due to their slowness, but will likely be supported in future updates.

Memory in voice call

Voice call uses the same backstory and key memories, and can recall from long term memory and journals just like text chat. In voice call settings (gear icon on top right), there is the unified chat/voice chat history toggle that affects how memory works in voice calls.

If unified chat/voice chat history is enabled, the voice call will share the identical chat history as the text chat. This makes it so you can switch back and forth, and is useful if you see voice call as a continuation of text chat and vice versa rather than a separate mode. When you return to text chat, your Kindroid will be able to reference what occurred latest in the voice call and you can continue in text chat (though
voice call messages will not show up in text chat message bubbles). Shared memory in groupchats will work the same way as it does in text chat, if both shared memory in the group and this toggle are enabled.

If unified context is disabled, voice call will be treated as a completely separate instance. Voice call will default to a blank slate chat history and will not recall any context from text chat. There is a temporary voice call memory that keeps a record of the call transcript; in the event the call is dropped, or you press end call and restart it (without going to text chat), you can resume the call and pick up where you left off. The temporary call history is reset if you engage in text chat in any way or do a chat break.

Voice call does consolidate into long term memory (provided it's not disabled on a Kindroid level) regardless of whether unified chat/voice chat history is enabled. Long term memory is different from chat history/short term memory. Contents from the voice call may be recalled in text chat when the context for recall is similar, but may need specific prompting to refer to that memory. Your voice messages also recall journal entries. For more details on memory and specifics, see Memory.

You can do a voice chat break, which functions very similarly to a normal text chat break (except voice chat break does not require a greeting). This functions differently depending on whether unified voice memory is on or off; if on, it will also reset the context in the individual text chat (and you can choose whether to reset cascaded memory as well).

Note: If you use text chat while unified memory is on, this can result in undefined behavior and lost memory. We recommend you use the send text feature in the call to text, rather than going to another instance of Kindroid to chat in the main home interface.

Calls while app is in background

A great feature of our revamped call system is that you can put the Kindroid mobile app in the background and still call your AI. This works
best with audio to audio, and does not work with video calling. It does work with screen sharing, but you should ensure your phone does not auto lock after inactivity.

Currently on the iOS app only, you will be able to see a picture in picture (PiP) of your avatar. This will be shown for single AI calls, as well as group calls with exactly 1 AI. PiP will not be shown during screen sharing. If the avatar has been animated, it will use the video; otherwise, it will use the static avatar image.

Video calling

You can turn on video in the bottom left corner and drag your video feed on the screen. Your Kindroid will then be able to see. Be aware that, due to processing load, you should ensure that anything you show stays on the screen for some time, and give your Kindroid enough time to process what it sees before ending your turn.

On a mobile device, video calls only work when the app is in the foreground. If the app or webpage goes into the background, you will see the camera be disabled. On a mobile app, your phone will not go into sleep mode while a call is open. This keeps video on as long as you don't end the call. Be aware of battery considerations when using camera video.

Screen sharing

You can turn on screen sharing with the center right button in calls. Screen sharing is available on desktop web, as well as in both mobile apps (notably not available on mobile browser). While screen sharing, your AI will be able to see your screen as long as your screen is active. If your phone goes to sleep due to inactivity, you may need to restart screen sharing.

Note for Android 13 or below: On Android 13 and below, you may see an option to share a single app. Android pauses that feed whenever the shared app isn't in the foreground, so your AI will repeatedly see the last frame (or black background) until you return. This may cause repetition issues. As such, we recommend sharing full screen on any Android version.

Call transcripts

Click on the paper icon in voice calls to toggle transcripts.
Transcripts will only persist in the voice call session while you're on the page, and will reset if you go to another page or screen.

Interrupts & turn taking

During the AI's turn, you can interrupt naturally. Interruptions are detected on an audio and word level, so you should speak clearly in the middle of an AI message to interrupt. Turn taking is natural; false interruptions will be detected and the AI will continue. If you take pauses between words, your message will be broken up into smaller chunks. If you want more delay for the AI to recognize end of turn, you can set pause threshold for AI turn higher. If you want a more responsive call, set it lower.

Fast vs normal voice

You can further reduce latency and increase naturalness by turning on fast voice mode (enabled by default). This uses an even faster version of v2 voice, so there's less delay between the end of the user's utterance and the start of the AI's response. This does inflict a small quality hit on the voice, so experiment with it on and off as needed.

Text input

For calls, you can also use the text input at the bottom if you don't wish to speak, while still having your Kindroid speak back to you.

Group calls

In groupchats, you can launch a call with multiple Kindroids. This will make use of the group chat's previous messages in text message form (group calls always share memory with the text messages, meaning unified memory is always on; you can make an alternate branched scenario from a group if you don't want this). In groupchats, AIs take turns, and you can interject at any time. AIs may take continuous turns, but the number of consecutive turns will always be below the number of AI participants, so as not to run on and to give you an opportunity to speak.

Call visual background

You can set different visual backgrounds for calls. It can be blank, use the same chat background, use a custom background, or use the avatar (including animated) as the background. Use the image icon next to the voice call settings to adjust and save. Clearing the voice call background does not clear any
chat backgrounds. Anything within call backgrounds stays confined to that specific AI/group call background.

Languages supported

You can change call languages in voice call settings. Note that for English, we use a special speech to text (STT) model that maximizes English accuracy but prevents changing languages in user messages. If you want to fluidly change from English to French, for example, you should select French. To use interchangeable language STT, select any call language from the list below except English. With that language selected, you will still be able to use English, but English STT may not be as accurate as when selecting English only.

"en" // English
"es" // Spanish
"fr" // French
"de" // German
"hi" // Hindi
"ru" // Russian
"pt" // Portuguese
"ja" // Japanese
"it" // Italian
"nl" // Dutch
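
The call language rule above can be written out as a small decision helper. This is only an illustrative sketch: the function name suggest_call_language and the CALL_LANGUAGES table are hypothetical and are not part of any Kindroid API; the codes are simply copied from the list above. It assumes the rule as stated in this section: pick English only if English is the only language you will speak, otherwise pick one of the non-English languages you plan to use.

```python
# Language codes copied from the supported call language list above.
CALL_LANGUAGES = {
    "en": "English", "es": "Spanish", "fr": "French", "de": "German",
    "hi": "Hindi", "ru": "Russian", "pt": "Portuguese", "ja": "Japanese",
    "it": "Italian", "nl": "Dutch",
}


def suggest_call_language(spoken: list[str]) -> str:
    """Suggest a call language setting (hypothetical helper, not a Kindroid API).

    `spoken` is the list of language codes you expect to speak in the call.
    English-only callers get "en" for the most accurate English STT; anyone
    switching languages gets the first non-English code they listed, since
    selecting English prevents changing languages in user messages.
    """
    if not spoken:
        raise ValueError("list at least one language code")
    for code in spoken:
        if code not in CALL_LANGUAGES:
            raise ValueError(f"unsupported call language: {code!r}")
    non_english = [code for code in spoken if code != "en"]
    return non_english[0] if non_english else "en"


print(suggest_call_language(["en"]))        # en
print(suggest_call_language(["en", "fr"]))  # fr
```

The trade-off mirrors the text: choosing a non-English setting keeps language switching available at the cost of somewhat less accurate English recognition.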