AI Transcription

Soku includes built-in AI transcription powered by OpenAI Whisper. You can transcribe the audio from any video or audio file in your Media Library and use the resulting text for captions, subtitles, repurposing into text posts, or any other purpose.

How It Works

  1. Upload a video or audio file to your Media Library (or select an existing one).
  2. Open the asset and click Transcribe.
  3. Select the language of the audio (optional but recommended for accuracy).
  4. Soku sends the audio to OpenAI Whisper for processing.
  5. The transcript text is returned and saved to the asset.
The transcription typically completes within a few seconds to a couple of minutes depending on the length of the audio.

Credit Cost

Each transcription costs 1 credit, regardless of the length of the audio. This is a flat rate — a 30-second clip costs the same as a 30-minute video.
Credits are included with your subscription plan and renew monthly. See Credits System for details on how credits work.
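
Because the rate is flat, estimating the cost of a batch only requires counting files. The sketch below illustrates that arithmetic. The function names are hypothetical and are not part of Soku; only the 1-credit-per-transcription rate comes from this page.

```python
from typing import Sequence

# Flat rate stated on this page: 1 credit per transcription.
CREDITS_PER_TRANSCRIPTION = 1


def credits_needed(durations_seconds: Sequence[float]) -> int:
    """Return the credits needed to transcribe one file per listed duration.

    The durations are accepted only to show that they do not affect the cost.
    """
    return CREDITS_PER_TRANSCRIPTION * len(durations_seconds)


def can_afford(balance: int, durations_seconds: Sequence[float]) -> bool:
    """Return True if the balance covers one transcription per file."""
    return balance >= credits_needed(durations_seconds)
```

For example, a 30-second clip and a 30-minute video together need 2 credits, the same as two clips of any other length.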

Language Selection

When starting a transcription, you can specify the language of the audio using an ISO 639-1 language code (for example, en for English, es for Spanish, ja for Japanese). Specifying the correct language improves transcription accuracy, especially for non-English content. If you do not specify a language, Whisper will attempt to auto-detect it.
Language Code | Language
en | English
es | Spanish
fr | French
de | German
pt | Portuguese
ja | Japanese
ko | Korean
zh | Chinese
These are common examples. Whisper supports a wide range of languages — use any valid ISO 639-1 code.
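
If you set the language from your own code, it can help to normalize the value before sending it. The following is a minimal sketch, assuming the request accepts an optional field named language; that field name and the helper are illustrative, not confirmed by the Transcription API documentation. The check only verifies the two-letter shape of an ISO 639-1 code, not that the code is actually assigned to a language.

```python
import re
from typing import Dict, Optional

# ISO 639-1 codes are two letters; this checks the shape only.
_ISO_639_1_SHAPE = re.compile(r"[a-z]{2}")


def build_transcription_options(language: Optional[str] = None) -> Dict[str, str]:
    """Build request options, omitting the language to allow auto-detection.

    The "language" key is a hypothetical field name used for illustration.
    """
    if language is None:
        return {}  # no language given: Whisper attempts auto-detection
    code = language.strip().lower()
    if not _ISO_639_1_SHAPE.fullmatch(code):
        raise ValueError(f"expected a two-letter ISO 639-1 code, got {language!r}")
    return {"language": code}
```

Leaving the language out is a deliberate choice here rather than an error, since auto-detection is the documented fallback.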

What You Get Back

A completed transcription returns:
Field | Description
Transcript text | The full text transcription of the audio.
Likely lyrics detection | A likelyLyrics flag indicating whether the audio appears to contain song lyrics rather than speech.
The likelyLyrics detection helps you decide how to use the transcript. For example, if the audio is a music track, you may want to use the transcript differently than if it were spoken narration.
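
One way to act on the flag in code is to branch on it after reading the result. This is a rough sketch, assuming the result arrives as a JSON-like dictionary with the transcript under a key named text; that key, the class, and the category labels are assumptions made for illustration. Only the likelyLyrics name comes from this page.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class TranscriptionResult:
    text: str
    likely_lyrics: bool


def parse_result(payload: Dict[str, Any]) -> TranscriptionResult:
    """Read the two documented fields from a result payload.

    "text" is an assumed key name; "likelyLyrics" is the flag named above.
    """
    return TranscriptionResult(
        text=str(payload.get("text", "")),
        likely_lyrics=bool(payload.get("likelyLyrics", False)),
    )


def classify(result: TranscriptionResult) -> str:
    """Return "empty", "lyrics", or "speech" to guide how the text is used."""
    if not result.text.strip():
        return "empty"  # silent or very quiet audio can yield no text
    return "lyrics" if result.likely_lyrics else "speech"
```

A workflow might, for instance, skip caption generation for results classified as "lyrics" and keep it for "speech".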

Using Transcripts

Once a transcription is complete, you can:
  • Generate AI captions — Use the transcript as input for Soku’s AI caption generator to create platform-ready captions.
  • Copy the text — Copy the transcript to your clipboard for use anywhere.
  • Repurpose content — Turn video content into text posts, blog excerpts, or newsletter content.

Idempotency

If you are using the API to trigger transcriptions programmatically, you can include an Idempotency-Key header with your request. This ensures that if the same request is sent more than once (for example, due to a network retry), the transcription is only performed and charged once.
Idempotency keys are recommended when integrating transcription into automated workflows to prevent duplicate charges. See the Transcription API documentation for details.
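
The important detail is that every retry of the same request reuses the same key. The sketch below shows that pattern with the actual HTTP call passed in as a function, so no endpoint or client library is assumed. The helper names and the retry count are hypothetical; only the Idempotency-Key header name comes from this page.

```python
import uuid
from typing import Callable, Dict


def new_idempotency_key() -> str:
    """Generate a random key; a UUID is one common choice, not a requirement."""
    return str(uuid.uuid4())


def send_with_retries(
    send: Callable[[Dict[str, str]], int],
    max_attempts: int = 3,
) -> int:
    """Call send(headers) until it succeeds, reusing one key for all attempts.

    `send` is a caller-supplied function that performs the real request and
    raises ConnectionError on a network failure.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    # Create the key once, outside the loop, so each retry carries the same value.
    headers = {"Idempotency-Key": new_idempotency_key()}
    last_error: Exception = ConnectionError("no attempt made")
    for _ in range(max_attempts):
        try:
            return send(headers)
        except ConnectionError as exc:
            last_error = exc
    raise last_error
```

Generating a fresh key inside the loop would defeat the purpose, because each retry would then look like a new request.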

Troubleshooting

Problem | Solution
Transcription returns empty text | The audio track may be silent or too quiet. Check that your video has audible audio.
Transcription is inaccurate | Try specifying the correct language code. Auto-detection works well for common languages but may struggle with less common ones or heavy accents.
“Insufficient credits” error | You have run out of credits. Check your remaining balance in Settings or upgrade your plan. See Credits System.
Transcription takes a long time | Longer audio files take more time to process. Files over 10 minutes may take a few minutes to complete.