
Subtitle Sphere

A product of Tandis 24 Design Lab


Frequently Asked Questions

1. What is Subtitle Sphere?

Subtitle Sphere is a powerful AI transcription, translation, subtitling, and narration tool designed for desktop use. It supports 128 languages for translation and offers AI-powered transcription of video and audio files in 90 languages. Users can generate SRT files from plain text, create subtitles, and use AI-powered voice narration in multiple languages. As a standalone application, Subtitle Sphere works offline and requires no subscription, protecting your privacy while providing unlimited transcription, translation, subtitling, and narration without relying on cloud services. Because most of the processing (transcription and subtitling) happens offline on your computer, it can take up system memory, and processing times may be slower depending on your device’s specifications; this is a trade-off between privacy and speed.

2. Does it require an internet connection?

No, Subtitle Sphere is a standalone application whose transcription and subtitling features work offline once downloaded. An internet connection is required only for the translation, SRT generation from plain text, and text-to-speech features.

3. How do I download the software?

To download the software, go to the download page, accept the End User Agreement, and you will be provided with a link.

4. What languages are supported?

Subtitle Sphere supports 128 languages for translation, 90 languages for transcription, and 63 languages for AI-powered voice generation. For a complete list of supported languages, please check our Languages page.

5. Is there a subscription fee?

No, Subtitle Sphere is completely free and does not require any subscription fees. If you are charged for using our service or come across other websites selling it, please report the incident through our contact page, as selling this software is illegal.

6. How do I install Subtitle Sphere?

After downloading, visit our instructions page to complete the installation process.

7. Can I use Subtitle Sphere on multiple devices?

Yes, you can install Subtitle Sphere on multiple devices as long as each device meets the system requirements.

8. What if I encounter issues during installation?

If you encounter any issues during installation, please visit our contact page. You can reach us by email, send us a direct message on Instagram, or leave a comment on our YouTube page.

9. How can I provide feedback or report a bug?

We welcome your feedback! Please use our contact page to send us your comments or report any bugs you encounter.

10. Is my data stored anywhere?

No, Subtitle Sphere does not collect or store any user data. The software is standalone and primarily operates offline on your computer, ensuring your privacy. However, for translation and AI narration features, it can utilize services including Google Translate, Google Text-to-Speech, Microsoft Edge Text-to-Speech, OpenAI.fm, Gemini, and OpenAI GPT. These features require an internet connection and transmit the relevant text data directly to the respective services at the time of use.

If users choose to use OpenAI GPT or Gemini, they must provide their own API keys. These keys are not stored in the software and are only transmitted to the corresponding services during active use. Subtitle Sphere does not retain or transmit any data beyond the direct use of these services. Users are strongly encouraged to consult the data collection, billing, and privacy policies of these third-party providers. For more details, please refer to the End User License Agreement (EULA).

Because most of the processing is handled locally on your machine, memory usage may increase and performance may vary depending on your system’s capabilities—this is a trade-off made to prioritize user privacy.

11. Are there any limitations on usage?

Subtitle Sphere does not impose any limitations on the number of transcriptions, translations, or narrations you can perform. There are also no restrictions on data usage for transcription, allowing you to work freely and offline.

However, for translation and AI narration features, the software can use services such as Google Translate, Google Text-to-Speech (gTTS), Microsoft Edge Text-to-Speech, OpenAI.fm, Gemini, and OpenAI GPT. These services require an internet connection and may have their own usage limitations or quotas. Users are responsible for reviewing the data usage, rate limits, and billing terms set by these third-party services.

When using OpenAI GPT or Gemini, users must provide their own API keys. These keys are not stored by Subtitle Sphere and are only used at the time of service access. Subtitle Sphere is not responsible for any data usage or charges incurred through these services. For detailed terms and responsibilities, please refer to the End User License Agreement (EULA) and the respective third-party service policies.

12. How do I update the software?

We announce new releases on our social media channels. When a new version is available, download it from the download page and install it by following the instructions page. Keep an eye out for the latest improvements and features!

13. What operating systems are supported?

Subtitle Sphere is compatible with macOS and Windows operating systems.

14. Can I customize subtitle styles?

Yes, Subtitle Sphere allows you to customize the styles, colors, and fonts of your subtitles to match your preferences. However, the font type can only be customized for languages that use the English alphabet. For languages with special characters, a specific font is required to ensure proper display of those characters.

15. Where can I find additional resources or tutorials?

Visit our Instructions page for guides, videos, and other helpful resources to maximize your use of Subtitle Sphere. Follow us on Instagram and YouTube for the latest updates and tips! Links to our social channels can be found on our contact page.

16. Which models are used in this software?

Subtitle Sphere offers an intuitive interface built on top of powerful open-source Python libraries and third-party APIs. For transcription, it uses the OpenAI Whisper module (an open-source Python library), enhanced by a proprietary Whisper-Google Fusion model that combines Whisper with Google’s Speech Recognition via the SpeechRecognition Python library. This approach provides improved accuracy and performance, while keeping core functionality offline where possible.
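For the curious, a Whisper transcription boils down to a few lines of Python. The sketch below only illustrates what happens behind the scenes; the file path, model size, and the `to_srt_time` and `transcribe_to_srt` helpers are our own illustrative names, not part of the app's actual code.

```python
def to_srt_time(seconds: float) -> str:
    """Convert seconds to an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def transcribe_to_srt(path: str, model_size: str = "base") -> str:
    """Sketch: transcribe a media file with Whisper and format the result as SRT."""
    import whisper  # pip install openai-whisper; imported lazily so the helper above stands alone
    model = whisper.load_model(model_size)  # tiny / base / small / medium / large
    result = model.transcribe(path)         # runs fully offline once the model is downloaded
    blocks = []
    for i, seg in enumerate(result["segments"], start=1):
        blocks.append(
            f"{i}\n{to_srt_time(seg['start'])} --> {to_srt_time(seg['end'])}\n{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)
```

You never need to run anything like this yourself; Subtitle Sphere handles it in the background.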

Translation is handled through Google Translate using the Deep Translator Python module. Additionally, users can opt to use Gemini and OpenAI GPT for translation by providing their own API keys, which are not stored and are only used at the time of translation.
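Behind the scenes, the Deep Translator call looks roughly like the sketch below. The `split_for_translation` helper and its 4,500-character limit are illustrative assumptions intended to stay under Google Translate's request size, not the app's actual code.

```python
def split_for_translation(text: str, limit: int = 4500) -> list[str]:
    """Split text into chunks under the request limit, breaking at line boundaries.
    (A single line longer than the limit is kept whole in this sketch.)"""
    chunks, current = [], ""
    for line in text.splitlines(keepends=True):
        if len(current) + len(line) > limit and current:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks

def translate_text(text: str, target: str) -> str:
    """Sketch: translate with Google Translate via deep-translator (needs internet)."""
    from deep_translator import GoogleTranslator  # pip install deep-translator
    translator = GoogleTranslator(source="auto", target=target)
    return "".join(translator.translate(chunk) for chunk in split_for_translation(text))
```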

For text-to-speech (voice narration and SRT generation from text), the software supports multiple options: Google Text-to-Speech (gTTS via the gTTS Python library), Microsoft Edge Text-to-Speech (via the edge-tts Python library), OpenAI.fm (through community-developed GitHub integrations), as well as Gemini TTS and OpenAI GPT TTS for users supplying their own API keys.

For vocal isolation (e.g., separating dialogue from background audio), Subtitle Sphere uses the Demucs Python library.

All these tools work seamlessly in the background—there’s no need to install or interact with Python directly.

17. What types of files are supported?

Subtitle Sphere supports various file types. For input, it accepts video formats such as .mov and .mp4, audio formats such as .mp3 and .wav, as well as plain text (.txt) and .srt files. The output can be either an .mp4 or an .srt file, depending on your needs.

18. How accurate are the transcription and translation services?

The accuracy of transcription depends on the quality of the audio, the language being transcribed, and the chosen Whisper model (tiny, base, small, medium, or large). The large model offers the highest accuracy, while the tiny model has the lowest but still works well for languages like English and French, which have extensive training data. However, larger models are more computationally intensive, leading to a higher load on your computer. Translation accuracy also depends on the chosen language, with commonly spoken languages generally having higher precision.

19. How long does translation, transcription, and subtitling take?

The time required for translation, transcription, and subtitling depends on several factors, including the length of the video or audio, the file size and quality, and the computational power of your device. More powerful systems will process tasks faster. Selecting larger Whisper models for transcription (like medium or large) will increase both accuracy and processing time. Since the majority of the processing happens offline on your machine, this can affect your system’s memory usage and slow down the process, especially with larger files. Translation speeds are also influenced by internet connection quality.

20. Are there any limits to the file types or sizes?

There is no strict limit on file size, but larger files take longer to process. Currently, Subtitle Sphere supports the following input file types: .mov, .mp4, .mp3, .wav, .txt, and .srt. Files in other formats must first be converted to one of the supported types, and very large files may simply need more time. We are working to expand support for additional formats in future updates.

21. Does Subtitle Sphere support multilingual content in the same file?

Yes, Subtitle Sphere supports multilingual transcription with Whisper. By selecting "Auto" for the source language, the model can detect and transcribe multiple languages in the same file. For best results, use the medium or large model. Note that Whisper may prioritize one language if there's a dominant accent, even when multiple languages are spoken. You can also translate the transcription into any target language after processing.

22. What is Whisper, and how does it work in Subtitle Sphere?

Whisper is an AI-powered speech recognition system developed by OpenAI, used in Subtitle Sphere for transcription and subtitling. It operates offline, ensuring privacy by processing audio directly on your computer. Whisper offers five models—Tiny, Base, Small, Medium, and Large—which vary in speed and accuracy. For clear audio in languages like English and French, the smaller Tiny and Base models are sufficient and faster. For multilingual content or complex audio, the Medium and Large models offer better accuracy but take longer to process. Users should note that larger models reduce the risk of "hallucinations" (incorrect transcriptions), especially with less common languages or difficult audio. You can find more information about Whisper on OpenAI's GitHub page.

23. Why am I getting this error: "invalid literal for int() with base 10: ‘’"?

If you're seeing the error message "invalid literal for int() with base 10: ‘’", it usually means there is an extra blank line, either at the end of your SRT file or between subtitle segments. An SRT block consists of a subtitle ID, a timestamp line, and the subtitle text, and each block (including the last one in the file) should be followed by exactly one blank line. Extra blank lines, or lines containing only spaces, break the parser. To resolve the issue, make sure there is exactly one blank line between blocks and after the last one, and no more. If the issue persists, please don't hesitate to contact us.
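If you're comfortable with Python, a small stand-alone script like the one below (an illustration, not part of Subtitle Sphere) can clean the file automatically by collapsing runs of blank lines so each subtitle block is separated by exactly one:

```python
def normalize_srt_spacing(srt_text: str) -> str:
    """Collapse runs of blank lines so subtitle blocks are separated by exactly one,
    and strip any trailing blank lines at the end of the file."""
    cleaned, blank_pending = [], False
    for line in srt_text.splitlines():
        if line.strip() == "":
            blank_pending = bool(cleaned)  # remember a separator, but never lead with one
        else:
            if blank_pending:
                cleaned.append("")
                blank_pending = False
            cleaned.append(line)
    return "\n".join(cleaned) + "\n"
```

Read your .srt file, run its contents through this function, and save the result before loading it into the software.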

24. Why was the transcribing process interrupted and returned to the main menu when using the Whisper-Google Fusion option?

If the transcribing process was interrupted and the main menu appeared, it likely means there was either a disruption in your internet connection or you have reached Google’s Speech Recognition usage limit. Please check your internet connection and try again in a few minutes. Since you are using a third-party service (Google Speech Recognition), it is recommended to review their quota limits and ensure you stay within their usage guidelines.

25. What is the Whisper-Google Fusion transcribing option?

Whisper-Google Fusion is a proprietary transcription method developed by Tandis 24 Design Lab. Without going into too much detail, it combines the strengths of OpenAI’s Whisper transcription with Google Speech Recognition. This approach is particularly useful for less common languages, where smaller Whisper models may struggle with accuracy and larger models can be computationally demanding. The Whisper-Google Fusion model offers faster and more accurate transcription by balancing these two technologies. The slider bar lets you adjust the accuracy of the transcription; after a few attempts, you'll get the hang of it and be able to fine-tune it for better results. We apologize for not providing a more detailed explanation, as this is a proprietary method.

26. Why do I only get a long text with no timestamps when using the Google video/audio transcription option?

The Google video/audio transcription option utilized in Subtitle Sphere is based on the free version of Google Speech Recognition. While this tool excels at producing coherent and high-quality text transcriptions, it unfortunately does not provide timestamps. If your primary need is accurate text without timestamps, Google Speech Recognition is a great choice and can serve as a reliable benchmark for your content. However, for users requiring timestamped transcriptions, we recommend leveraging Subtitle Sphere's features with the Whisper or Whisper-Google Fusion models.

27. Why is there a multi-step download process?

Although our software is free, we want it to be accessed by those who genuinely need and appreciate it. We also want users to take the time to understand its features and capabilities before downloading. Unfortunately, when we previously offered a direct download link, we experienced hacking attempts, spam, scams, and overall disrespect toward our work.

Subtitle Sphere leverages powerful technologies from Google and OpenAI, offering 11 advanced features with no subscriptions, no usage limits, and no restrictions on file size or duration. Comparable software can be expensive, yet we provide ours as a free passion project. We are not backed by a large company or external funding—this is something we offer because we believe in its value.

In return, all we ask for is respect. Please follow the download process as outlined, and we appreciate your understanding and support.

28. How many voices are available for each text-to-speech feature?

Subtitle Sphere supports multiple text-to-speech (TTS) engines, each offering a range of voice options to suit different languages, styles, and creative needs.

If you use the Google Text-to-Speech feature, each language typically includes only one voice, either male or female, depending on availability. This limitation comes from our use of the free version of Google TTS, chosen to keep the feature accessible to all users.

Starting with Subtitle Sphere version 4.0.0, a voice enhancement feature was introduced. This allows users to adjust the pitch, volume, and speed of the AI-generated voice, and even create custom voice effects. A female voice, for example, can be modified into a male voice, robotic tone, chipmunk, or child-like voice—unlocking greater flexibility for narration and dubbing.

As of version 5.0.0, Subtitle Sphere added support for Microsoft Edge Text-to-Speech, which offers over 300 voices across 76 languages. Each language typically includes at least one male and one female voice, providing a wide range of natural-sounding options for different contexts and audiences.

With the release of version 6.0.0, the software now also supports Gemini TTS and OpenAI GPT TTS for users who provide their own API keys. These advanced services offer even greater language support, voice diversity, and the ability to express different emotions and speaking styles—such as happy, sad, angry, or professional tones—making them ideal for more expressive or narrative-driven projects.

With these combined options, Subtitle Sphere provides one of the most flexible and powerful TTS toolsets available for video creators, educators, and content developers.

29. Is there a way to change the speed of the audio?

Yes. As of Subtitle Sphere version 4.0.0, the new voice enhancement feature allows you to change the speed, pitch, and volume of the AI-generated voice directly within the application. This eliminates the need for external editing tools and gives you full control over how the voice sounds.

In addition to speed adjustments, you can also use this feature to creatively modify the voice—for instance, converting it into a different gender or creating unique effects like robotic, cartoonish, or child-like voices.

30. What does the “No text to send to TTS API” error mean when using the Generating SRT feature?

This error appears when there are too many blank lines between subtitle sections in your plain text input. When starting a new section, use at most two consecutive line breaks (that is, no more than one blank line between sections).
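If you prefer to clean the text automatically, a small stand-alone Python sketch (not part of Subtitle Sphere) can collapse runs of three or more line breaks down to two before you paste the text in:

```python
import re

def normalize_breaks(text: str) -> str:
    """Collapse runs of three or more line breaks down to two,
    so each subtitle section is separated by at most one blank line."""
    return re.sub(r"\n{3,}", "\n\n", text)
```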

31. Why am I getting the error “HTTPSConnectionPool (host='www.google.com', port=443): Read timed out. (read timeout=5)” while running the Generating SRT feature?

This error occurs due to a loss of internet connection during the process. Please check your internet connection and try again.

32. I am not getting any of the errors mentioned above, but I am still unable to generate SRT from plain text. What should I do?

Although Subtitle Sphere has no built-in limit, it relies on Google’s services to generate SRT. You may encounter a Google request limit if you send too many requests back-to-back or submit a long request. If this happens, check your terminal for messages such as "Google request failed" or "Exceeded your limit." No worries—you can retry after a few minutes. Additionally, if you are running the software on multiple computers connected to the same router, they share the same IP address. Sending simultaneous requests from multiple devices may cause you to hit the limit faster and get disconnected.
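The usual workaround for transient quota errors is a pause-and-retry loop. The sketch below is a generic illustration, not Subtitle Sphere's actual code, and the 60-second wait is our assumption rather than Google's documented cool-down:

```python
import time

def with_retries(fn, attempts: int = 3, wait_seconds: float = 60.0):
    """Call fn(); on failure, wait and try again, up to `attempts` times total."""
    last_error = None
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:  # e.g. a "Google request failed" error
            last_error = exc
            if attempt < attempts - 1:
                time.sleep(wait_seconds)
    raise last_error
```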

33. Why am I getting the error “langdetect.lang_detect_exception.LangDetectException: No features in text.” when using the SRT Translate feature?

This error likely occurs because one or more subtitle lines contain only punctuation or non-letter characters instead of meaningful text. Please ensure that each line contains proper text or is left completely empty.
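To locate the offending lines quickly, a stand-alone sketch like the following (the helper names are ours, not Subtitle Sphere's) flags non-blank subtitle lines that contain no letters at all:

```python
def has_real_text(line: str) -> bool:
    """True if the line contains at least one letter (language detection needs letters)."""
    return any(ch.isalpha() for ch in line)

def find_empty_feature_lines(srt_text: str) -> list[int]:
    """Return 1-based line numbers of non-blank lines with no letters.
    ID and timestamp lines are skipped, since they legitimately contain only digits."""
    bad = []
    for i, line in enumerate(srt_text.splitlines(), start=1):
        stripped = line.strip()
        if stripped and "-->" not in stripped and not stripped.isdigit() and not has_real_text(stripped):
            bad.append(i)
    return bad
```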

34. What does the error “UnicodeDecodeError: 'utf-8' codec can't decode byte 0x9f in position 34: invalid start byte” mean when generating SRT using the Generate SRT feature?

This error means the plain text input is not saved as UTF-8 text; the file was most likely created in a rich-text editor or saved with a different encoding (byte 0x9f, for example, is valid in Windows-1252 but is not a valid UTF-8 start byte). To prevent this issue, create and save your text with Notepad on Windows or TextEdit in plain text mode on macOS before processing it.
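If you'd rather repair the file programmatically, the stand-alone sketch below (`decode_text` is a hypothetical helper, not part of Subtitle Sphere) decodes the raw bytes with common fallbacks so you can re-save the result as UTF-8:

```python
def decode_text(raw: bytes) -> str:
    """Decode file bytes, falling back from UTF-8 to Windows-1252 to Latin-1.
    'utf-8-sig' also strips a leading byte-order mark if one is present."""
    for encoding in ("utf-8-sig", "cp1252"):
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    return raw.decode("latin-1")  # latin-1 accepts any byte, so this always succeeds
```

For example, read the file with `open(path, "rb")`, pass the bytes through `decode_text`, and write the string back out with `encoding="utf-8"`.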