AI Transcription

Soku includes built-in AI transcription powered by OpenAI Whisper. You can transcribe the audio from any video or audio file in your Media Library and use the resulting text for captions, subtitles, repurposing into text posts, or any other purpose.

How It Works

1. Upload a video to your Media Library (or select an existing one).
2. Open the asset and click Transcribe.
3. Select the language of the audio (optional but recommended for accuracy).
4. Soku sends the audio to OpenAI Whisper for processing.
5. The transcript text is returned and saved to the asset.

The transcription typically completes within a few seconds to a couple of minutes depending on the length of the audio.
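For readers triggering the same flow programmatically, the sketch below shows roughly what the request for a transcription could look like. It only builds a description of the request and sends nothing. The endpoint path and the `language` body field are assumptions made for illustration; the actual route and parameter names are in the Transcription API documentation.

```python
import json
from typing import Optional

# Hypothetical route, used only for illustration.
# The real path is defined in the Transcription API documentation.
TRANSCRIBE_PATH_TEMPLATE = "/assets/{asset_id}/transcribe"


def build_transcribe_request(asset_id: str, language: Optional[str] = None) -> dict:
    """Describe the HTTP request that would start a transcription.

    Nothing is sent over the network; the caller decides how to send it.
    """
    body = {}
    if language is not None:
        # Assumed field name; language is optional, so it is omitted
        # entirely when the caller wants auto-detection.
        body["language"] = language
    return {
        "method": "POST",
        "path": TRANSCRIBE_PATH_TEMPLATE.format(asset_id=asset_id),
        "body": json.dumps(body),
    }


request = build_transcribe_request("asset_123", language="es")
print(request["method"], request["path"], request["body"])
```

Because processing can take up to a couple of minutes for long audio, a client built on this should use a generous timeout rather than assuming an immediate response.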

Credit Cost

Each transcription costs 1 credit, regardless of the length of the audio. The rate is flat: a 30-second clip costs the same as a 30-minute video.

Credits are included with your subscription plan and renew monthly. See Credits System for details on how credits work.
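Because the rate is flat, budgeting for a batch is a matter of counting files, not minutes. The helper names below are illustrative and not part of Soku; the only fact taken from this page is the cost of 1 credit per transcription.

```python
from typing import Sequence

CREDITS_PER_TRANSCRIPTION = 1  # flat rate stated in this section


def credits_needed(durations_seconds: Sequence[float]) -> int:
    """Credits required to transcribe each file once.

    Durations are accepted only to make the point that they do not matter.
    """
    return CREDITS_PER_TRANSCRIPTION * len(durations_seconds)


def can_afford(balance: int, durations_seconds: Sequence[float]) -> bool:
    """True if the credit balance covers one transcription per file."""
    return balance >= credits_needed(durations_seconds)


# A 30-second clip and a 30-minute video together cost 2 credits.
batch = [30, 30 * 60]
print(credits_needed(batch), can_afford(1, batch))
```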

Language Selection

When starting a transcription, you can specify the language of the audio using an ISO 639-1 language code (for example, `en` for English, `es` for Spanish, `ja` for Japanese). Specifying the correct language improves transcription accuracy, especially for non-English content. If you do not specify a language, Whisper will attempt to auto-detect it.

Language Code	Language
`en`	English
`es`	Spanish
`fr`	French
`de`	German
`pt`	Portuguese
`ja`	Japanese
`ko`	Korean
`zh`	Chinese

These are common examples. Whisper supports many more languages; use the ISO 639-1 code of any language it supports.
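If language codes come from user input, it can help to normalize them before starting a transcription. The sketch below checks only the shape of the code (two lowercase letters) and treats an empty value as a request for auto-detection. It does not verify that the code is actually assigned in ISO 639-1 or supported by Whisper, and the function name is illustrative.

```python
import re
from typing import Optional

# ISO 639-1 codes are two letters; this checks shape only.
_TWO_LETTERS = re.compile(r"[a-z]{2}")


def normalize_language(code: Optional[str]) -> Optional[str]:
    """Return a lowercase two-letter code, or None to request auto-detection."""
    if code is None or not code.strip():
        return None
    cleaned = code.strip().lower()
    if not _TWO_LETTERS.fullmatch(cleaned):
        raise ValueError(f"expected a two-letter ISO 639-1 code, got {code!r}")
    return cleaned


print(normalize_language(" EN "))   # en
print(normalize_language(None))     # None
```

A three-letter value such as `eng` raises `ValueError`, which catches the common mistake of passing an ISO 639-2 code.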

What You Get Back

A completed transcription returns:

Field	Description
Transcript text	The full text transcription of the audio.
Likely lyrics detection	A `likelyLyrics` flag indicating whether the audio appears to contain song lyrics rather than speech.

The `likelyLyrics` flag helps you decide how to use the transcript. For example, a transcript of a music track may call for different handling than a transcript of spoken narration.
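A minimal sketch of reading these two fields and branching on the flag follows. The `likelyLyrics` name comes from the table above; the `text` key for the transcript and the three category labels are assumptions made for illustration, so check the Transcription API documentation for the actual response shape.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranscriptionResult:
    text: str
    likely_lyrics: bool


def parse_result(payload: dict) -> TranscriptionResult:
    """Build a result from a response payload.

    "likelyLyrics" is the flag described above; "text" is an assumed key.
    """
    return TranscriptionResult(
        text=str(payload.get("text", "")),
        likely_lyrics=bool(payload.get("likelyLyrics", False)),
    )


def suggest_use(result: TranscriptionResult) -> str:
    """Pick a rough handling category for the transcript."""
    if not result.text.strip():
        return "check-audio"  # empty transcript: audio may be silent
    return "lyrics" if result.likely_lyrics else "speech"


song = parse_result({"text": "la la la", "likelyLyrics": True})
print(suggest_use(song))
```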

Using Transcripts

Once a transcription is complete, you can:

- Generate AI captions: Use the transcript as input for Soku’s AI caption generator to create platform-ready captions.
- Copy the text: Copy the transcript to your clipboard for use anywhere.
- Repurpose content: Turn video content into text posts, blog excerpts, or newsletter content.

Idempotency

If you are using the API to trigger transcriptions programmatically, you can include an `Idempotency-Key` header with your request. This ensures that if the same request is sent more than once (for example, due to a network retry), the transcription is only performed and charged once.

Idempotency keys are recommended when integrating transcription into automated workflows to prevent duplicate charges. See the Transcription API documentation for details.
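The important detail is that the key is generated once per logical request and reused on every retry; a fresh key per attempt would defeat the purpose. The sketch below illustrates that pattern with an injected `send` callable standing in for whatever HTTP client you use. The function name, the use of a UUID as the key, and the choice to retry on `ConnectionError` are assumptions for illustration, not Soku requirements.

```python
import uuid
from typing import Callable, Dict


def transcribe_with_retry(
    send: Callable[[Dict[str, str]], dict],
    max_attempts: int = 3,
) -> dict:
    """Call send(headers) until it succeeds, reusing one idempotency key.

    `send` is a placeholder for the real HTTP call; it receives the headers
    to attach and returns the parsed response.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    # Generated once, so every retry is recognizable as the same request.
    headers = {"Idempotency-Key": str(uuid.uuid4())}

    for attempt in range(1, max_attempts + 1):
        try:
            return send(headers)
        except ConnectionError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the network error
    raise AssertionError("unreachable")


# Simulated transport that fails once, then succeeds.
keys_seen = []

def flaky_send(headers: Dict[str, str]) -> dict:
    keys_seen.append(headers["Idempotency-Key"])
    if len(keys_seen) == 1:
        raise ConnectionError("simulated network drop")
    return {"status": "ok"}

print(transcribe_with_retry(flaky_send), len(set(keys_seen)))
```

In the simulated run, both attempts carry the same key, so a server honoring the header would perform and charge the transcription only once.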

Troubleshooting

Problem	Solution
Transcription returns empty text	The audio track may be silent or too quiet. Check that your video has audible audio.
Transcription is inaccurate	Try specifying the correct language code. Auto-detection works well for common languages but may struggle with less common ones or heavy accents.
“Insufficient credits” error	You have run out of credits. Check your remaining balance in Settings or upgrade your plan. See Credits System.
Transcription takes a long time	Longer audio files take more time to process. Files over 10 minutes may take a few minutes to complete.
