AI Transcription
Soku includes built-in AI transcription powered by OpenAI Whisper. You can transcribe the audio from any video or audio file in your Media Library and use the resulting text for captions, subtitles, repurposing into text posts, or any other purpose.
How It Works
- Upload a video or audio file to your Media Library (or select an existing one).
- Open the asset and click Transcribe.
- Select the language of the audio (optional but recommended for accuracy).
- Soku sends the audio to OpenAI Whisper for processing.
- The transcript text is returned and saved to the asset.
Credit Cost
Each transcription costs 1 credit, regardless of the length of the audio. This is a flat rate: a 30-second clip costs the same as a 30-minute video.
Credits are included with your subscription plan and renew monthly. See Credits System for details on how credits work.
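Because the rate is flat, the cost of a batch depends only on how many files are transcribed, not on how long they are. The sketch below illustrates that arithmetic. The function names and the list of durations in seconds are illustrative assumptions, not part of Soku.

```python
from typing import Sequence

# From this page: every transcription costs 1 credit, whatever its length.
CREDITS_PER_TRANSCRIPTION = 1


def credits_needed(durations_seconds: Sequence[float]) -> int:
    """Return the credits required to transcribe each file once.

    The durations are accepted only to show that they do not affect the cost.
    """
    return CREDITS_PER_TRANSCRIPTION * len(durations_seconds)


def can_afford(balance: int, durations_seconds: Sequence[float]) -> bool:
    """Check whether a credit balance covers the whole batch."""
    return balance >= credits_needed(durations_seconds)


# A 30-second clip and a 30-minute video cost one credit each.
batch = [30, 30 * 60]
print(credits_needed(batch))
```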
Language Selection
When starting a transcription, you can specify the language of the audio using an ISO 639-1 language code (for example, en for English, es for Spanish, ja for Japanese).
Specifying the correct language improves transcription accuracy, especially for non-English content. If you do not specify a language, Whisper will attempt to auto-detect it.
| Language Code | Language |
|---|---|
| en | English |
| es | Spanish |
| fr | French |
| de | German |
| pt | Portuguese |
| ja | Japanese |
| ko | Korean |
| zh | Chinese |
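If you set the language from a script, it can help to tidy the code before starting a transcription. The following is a minimal sketch, assuming you want to treat an empty value as "let Whisper auto-detect". The helper name normalize_language is hypothetical, and the dictionary only mirrors the codes listed in the table above, which may not be the full set Soku accepts.

```python
from typing import Optional

# Codes copied from the table above; other ISO 639-1 codes may also be valid.
LISTED_LANGUAGES = {
    "en": "English",
    "es": "Spanish",
    "fr": "French",
    "de": "German",
    "pt": "Portuguese",
    "ja": "Japanese",
    "ko": "Korean",
    "zh": "Chinese",
}


def normalize_language(code: Optional[str]) -> Optional[str]:
    """Return a lowercase two-letter code, or None to request auto-detection."""
    if code is None or not code.strip():
        return None  # no language given: leave detection to Whisper
    cleaned = code.strip().lower()
    # ISO 639-1 codes are exactly two ASCII letters.
    if len(cleaned) != 2 or not (cleaned.isascii() and cleaned.isalpha()):
        raise ValueError(f"not a two-letter ISO 639-1 code: {code!r}")
    return cleaned


code = normalize_language(" JA ")
print(code, LISTED_LANGUAGES.get(code, "not in the table above"))
```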
What You Get Back
A completed transcription returns:
| Field | Description |
|---|---|
| Transcript text | The full text transcription of the audio. |
| Likely lyrics detection | A likelyLyrics flag indicating whether the audio appears to contain song lyrics rather than speech. |
likelyLyrics detection helps you decide how to use the transcript. For example, if the audio is a music track, you may want to use the transcript differently than if it were spoken narration.
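One way to act on the flag is to branch on it before deciding what to do with the text. The sketch below assumes a simple result object. Only likelyLyrics comes from this page; the text field name and the returned labels are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class TranscriptionResult:
    text: str            # hypothetical name for the transcript text field
    likely_lyrics: bool  # mirrors the likelyLyrics flag described above


def suggest_use(result: TranscriptionResult) -> str:
    """Pick an illustrative next step for a finished transcription."""
    if not result.text.strip():
        return "check-audio"  # empty transcript: the audio may be silent
    if result.likely_lyrics:
        return "lyrics"       # probably a music track, not narration
    return "captions"         # spoken content: a good caption source


print(suggest_use(TranscriptionResult("Welcome back to the channel.", False)))
```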
Using Transcripts
Once a transcription is complete, you can:
- Generate AI captions: Use the transcript as input for Soku’s AI caption generator to create platform-ready captions.
- Copy the text: Copy the transcript to your clipboard for use anywhere.
- Repurpose content: Turn video content into text posts, blog excerpts, or newsletter content.
Idempotency
If you are using the API to trigger transcriptions programmatically, you can include an Idempotency-Key header with your request. This ensures that if the same request is sent more than once (for example, due to a network retry), the transcription is only performed and charged once.
Idempotency keys are recommended when integrating transcription into automated workflows to prevent duplicate charges. See the Transcription API documentation for details.
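The important detail is that every retry of the same logical request reuses the same key. Here is a minimal sketch of that pattern, assuming you supply your own send function that performs the HTTP call. The assetId and language body fields are hypothetical placeholders, and only the Idempotency-Key header name comes from this page; check the Transcription API documentation for the real request shape.

```python
import uuid
from typing import Callable, Dict, Optional

# send(headers, body) is whatever function actually performs the HTTP request.
Sender = Callable[[Dict[str, str], Dict[str, str]], dict]


def transcribe_with_retry(
    send: Sender,
    asset_id: str,
    language: Optional[str] = None,
    attempts: int = 3,
) -> dict:
    """Retry a transcription request while reusing one idempotency key."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    # One key per logical transcription, generated once outside the loop.
    headers = {"Idempotency-Key": str(uuid.uuid4())}
    body = {"assetId": asset_id}      # hypothetical field name
    if language:
        body["language"] = language   # hypothetical field name

    last_error: Optional[Exception] = None
    for _ in range(attempts):
        try:
            return send(headers, body)
        except ConnectionError as exc:
            last_error = exc  # network problem: retry with the same key
    assert last_error is not None
    raise last_error
```

Generating the key inside the loop would defeat the purpose, because each retry would then look like a new request.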
Troubleshooting
| Problem | Solution |
|---|---|
| Transcription returns empty text | The audio track may be silent or too quiet. Check that your video has audible audio. |
| Transcription is inaccurate | Try specifying the correct language code. Auto-detection works well for common languages but may struggle with less common ones or heavy accents. |
| “Insufficient credits” error | You have run out of credits. Check your remaining balance in Settings or upgrade your plan. See Credits System. |
| Transcription takes a long time | Longer audio files take more time to process. Files over 10 minutes may take a few minutes to complete. |