Features - Subtitle Sphere

⚙️ Batch Processing

Batch Audio & Video Transcription

Process multiple audio and video files simultaneously with Subtitle Sphere's powerful batch transcription feature. Whether you have dozens of interview recordings, lecture videos, or podcast episodes, you can queue them all for transcription and let the software work through them automatically. Each file is processed with the same high-quality OpenAI Whisper technology, generating accurate transcripts with timestamps. Save time and streamline your workflow by transcribing entire folders of content in one go, with all results saved to your designated output directory.

Batch Translation

Translate multiple files at once using Subtitle Sphere's batch translation capabilities. Supported formats include SRT subtitle files, plain text documents, PDFs, and DOCX files. Choose from Google Translate (via Deep Translator), Argos Translator (offline), Google Gemini, or OpenAI GPT to process your entire collection of documents in a single operation. The system maintains file structure and formatting while delivering consistent, high-quality translations across all your content.

For Google Gemini and OpenAI GPT batch translations, users must provide their own API keys.

Batch Text-to-Speech with Kokoro TTS

Generate narration for multiple text files or SRT subtitle files using the offline Kokoro TTS engine. Perfect for creating audiobooks or narrated content in bulk, this feature processes entire folders of text documents, converting them into natural-sounding speech in multiple languages including English (US, GB), Italian, French, Japanese, Spanish, Hindi, Portuguese, and Chinese. For SRT files, the system synchronizes the audio perfectly with subtitle timestamps, making it ideal for creating voice-over tracks that match your video timing exactly.

Batch Text-to-Speech with Chatterbox TTS

Leverage the power of Chatterbox/Resemble AI for batch text-to-speech processing completely offline. Process multiple text files for audiobook creation or multiple SRT files with timestamp synchronization for video dubbing projects. The system maintains vocal consistency across all files while allowing you to adjust vocal exaggeration and pacing parameters. Whether you're creating a series of narrated documents or dubbing an entire video series, batch processing ensures efficiency without compromising on the natural, expressive quality that Chatterbox delivers.

Batch Text-to-Speech with F5 TTS

Leverage the power of F5TTS for batch text-to-speech processing completely offline. Process multiple text files for audiobook creation or multiple SRT files with timestamp synchronization for video dubbing projects. The system maintains vocal consistency across all files while allowing you to adjust vocal exaggeration and pacing parameters. Whether you're creating a series of narrated documents or dubbing an entire video series, batch processing ensures efficiency without compromising on the natural, expressive quality that F5TTS delivers.

Batch Voice Cloning

Clone multiple voice samples efficiently with Subtitle Sphere's batch voice cloning feature. Upload a collection of voice recordings, and the system will create personalized voice models for each sample, preserving unique speech patterns, accents, and vocal characteristics. This is particularly useful for content creators, localization teams, or accessibility projects that require multiple distinct voice profiles. All processing happens offline using the integrated Chatterbox/Resemble AI technology, ensuring your voice data remains private and secure.

Note: Appropriate consent and usage rights are required for all voice data. Users are responsible for ethical and lawful use.

Batch Audio/Voice Enhancement

Improve the quality of multiple audio files simultaneously with intelligent audio enhancement. This feature reduces background noise, normalizes volume levels, and enhances vocal clarity across your entire collection of recordings. Whether you're preparing podcast episodes, cleaning up interview recordings, or improving the quality of voice samples before cloning, batch audio enhancement ensures consistent, professional-grade results throughout all your files. The process preserves the natural characteristics of each voice while removing unwanted artifacts and improving overall intelligibility.

🎙️ Transcription & Subtitling Services

Video Transcription, Translation & Subtitling

Subtitle Sphere provides a comprehensive solution for transcribing, translating, and subtitling your videos. With support for various transcription formats, including original transcripts with timestamps, plain text, and a modified format that intelligently merges subtitle segments for better flow, Subtitle Sphere ensures precise transcriptions across multiple languages. The integration of OpenAI Whisper and Whisper-Google Fusion guarantees speed and accuracy in every stage, from transcription to translation.

Video Subtitling

Effortlessly create customized, high-quality subtitles for your videos with Subtitle Sphere. Adjust the appearance, timing, and position of subtitles to enhance viewer comprehension and match your video's style. Subtitle Sphere now offers enhanced subtitle flow by merging segments intelligently, taking into account punctuation and timing, ensuring a more natural viewing experience.

Transcribe Video & Generate SRT

Convert your video files into accurate SRT subtitle files using OpenAI Whisper and Whisper-Google Fusion technology. Whether you need the original timestamped transcript or a modified format for improved subtitle flow, Subtitle Sphere allows you to generate high-quality subtitles for better accessibility and viewer engagement.

Transcribe Audio & Generate SRT

Transcribe your audio files with precision and generate accurate SRT subtitles, perfect for podcasts, interviews, and speeches. Choose from various transcription formats, including the option to merge subtitle segments based on timing, punctuation, and customizable character limits.

Google Video Transcription & Translation

Utilize Google's advanced speech recognition technology to transcribe and translate your video content. Generate full-text transcripts and accurate translations in multiple languages, without timestamps or SRT files, making your videos accessible to a broader audience.

Google Audio Transcription & Translation

Convert your audio files into text and translate them into various languages using Google's speech recognition technology. This feature provides a complete script of your audio content, making it ideal for transcriptions and translations without the need for timestamps or SRT files.

Generate SRT from Plain Text File

Turn your written scripts or plain text files into fully functional SRT subtitles with ease. Subtitle Sphere streamlines the process of adding subtitles, allowing you to quickly convert text into a subtitle format.

Real-time Transcription, Translation & Summarization

Process live audio streams with simultaneous transcription, translation, and summarization capabilities. This powerful feature combines OpenAI Whisper's speech recognition with OpenAI GPT and Google Gemini's language processing to deliver real-time results. Perfect for live meetings, conferences, interviews, or streaming content. Users can receive instant transcripts, translations in multiple languages, and intelligent summaries of the spoken content as it happens. The system maintains accuracy even with background noise and supports various audio input sources.

Note: Users must provide their own OpenAI and Google Gemini API keys to access this service.

ePUB Translation & Compilation

Convert ePUB books into text, translate them using Argos (offline), Google (online), Gemini, or OpenAI (with your personal API key), and recompile into a proper ePUB format for ebook readers. Users must have copyright permission or use free/public domain books (e.g., Gutenberg). The resulting translated ePUB is not for sale, redistribution, or reverse engineering. Original authors and publishers retain all copyright.

Note: For OpenAI and Google Gemini translations, users must provide their own API keys.

Speaker Diarization

Identify and separate different speakers within your audio or video content with advanced Speaker Diarization technology. Subtitle Sphere combines OpenAI Whisper, WhisperX, and Pyannote to deliver precise speaker segmentation and labeling. This feature automatically detects when different speakers are talking, assigns them unique labels (e.g., Speaker 1, Speaker 2), and aligns their dialogue with timestamps for clear, organized transcripts. Ideal for meetings, interviews, podcasts, and panel discussions, this integration ensures both accuracy and natural flow in speaker differentiation.

Note: Users must provide their own Hugging Face token to download Pyannote models on their computers.

🌐 Translation Services

Translate SRT, TXT, PDF, and DOCX Files

Easily translate your SRT, TXT, PDF, and DOCX files into multiple languages using Google Translate (via Deep Translator), Argos Translator, Google Gemini, or OpenAI GPT. PDF and DOCX files are automatically converted to TXT format before translation to ensure smooth processing.

For Google Gemini and OpenAI GPT translations, users must provide their own API keys to access these services.

ePUB Translation with Format Preservation

Translate ePUB ebooks while maintaining the original formatting, structure, and styling of the digital book. Unlike simple text conversion, this advanced feature preserves chapters, formatting tags, images, metadata, and layout elements throughout the translation process. Choose from the Argos (offline), Google (online), Gemini, or OpenAI translation engines; Gemini and OpenAI require your personal API key. The resulting ePUB file maintains professional ebook standards with proper chapter navigation, formatted text, and embedded elements intact. Perfect for publishers, translators, and readers who want to enjoy books in different languages without losing the original reading experience.

Note: Users must have copyright permission or use free/public domain books. For OpenAI and Google Gemini translations, users must provide their own API keys. The translated ePUB is not for sale, redistribution, or reverse engineering.

🗣️ AI-Powered Text-to-Speech

AI-Powered Narration (Powered by Google Gemini TTS)

Leverage Google Gemini TTS to generate natural-sounding voice narration with access to a wide range of expressive voices featuring emotions and nuanced intonations. Our software supports all the languages that Google Gemini TTS currently offers, enabling you to create rich, dynamic audio content. Personalize your narration by choosing from multiple voices and emotional styles to best match your content's tone. You can create multilingual, multi-speaker audio files by either entering custom text directly, importing your plain text files, or even asking Google Gemini to write the script for you within the software.

Note: Users must provide their own Google Gemini API key to access this service.

AI-Powered Narration (Powered by OpenAI GPT TTS)

Leverage OpenAI GPT TTS to generate natural-sounding voice narration with access to a wide range of expressive voices featuring emotions and nuanced intonations. Our software supports all the languages that OpenAI GPT TTS currently offers, enabling you to create rich, dynamic audio content. Personalize your narration by choosing from multiple voices and emotional styles to best match your content's tone. You can create multilingual, multi-speaker audio files by either entering custom text directly, importing your plain text files, or even asking OpenAI GPT to write the script for you within the software.

Note: Users must provide their own OpenAI GPT API key to access this service.

Real-time Chatbot Style Text to Speech Translation/TTS

Experience instant, conversational text-to-speech translation with our real-time chatbot interface. Powered by OpenAI GPT, this feature enables seamless communication across languages with natural-sounding voice output. Simply type your message, and receive immediate translation with high-quality speech synthesis. Perfect for live conversations, customer support, or interactive language learning. The chatbot interface provides a user-friendly experience with customizable voice settings and multiple language support.

Note: Users must provide their own OpenAI GPT API key to access this service.

Offline Narration (Powered by Kokoro TTS, Chatterbox TTS & F5 TTS)

Enjoy high-quality voice narration with Kokoro TTS, Chatterbox TTS, and F5TTS, all fully offline solutions that require no internet connection. They support multiple languages, making them ideal for multilingual projects. You can generate audio from text, PDF, or EPUB files to create professional-sounding audiobooks and other narrated content with ease.

AI-Powered Narration (Powered by Microsoft Edge TTS)

Leverage Microsoft Edge TTS to generate natural-sounding voice narration in 76 languages with access to over 300 predefined voices. Further personalize your audio by adjusting pitch, speech rate, and volume to suit your content style. You can create multilingual, multi-speaker audio files by either entering custom text directly or importing your own SRT or plain text files. This powerful flexibility allows for dynamic, expressive narration tailored to your video's message.

Video AI-Powered Narration (Powered by Google Text-to-Speech)

Enhance your videos with professional-grade AI-generated voice narration. Select from a variety of languages and customize voice characteristics, such as speed, pitch, and volume, to tailor the narration to your content's tone. With voice syncing features, the AI narration can be perfectly aligned with your subtitles for a seamless experience.

Audio AI-Powered Narration (Powered by Google Text-to-Speech)

Generate high-quality voice narration for your audio content in multiple languages. Customize the speed, pitch, and volume of the AI-generated voice to create unique voice variations. The syncing option ensures that your audio narration matches the timing of your subtitles or video.

Voice Cloning & Text-to-Speech (TTS)

Subtitle Sphere offers advanced Voice Cloning and Text-to-Speech (TTS) capabilities powered by its fully offline, customized integration of Chatterbox/Resemble AI, F5TTS & QWEN3-TTS. This system enables users to both replicate real voices and synthesize lifelike speech from text while maintaining full control over data privacy and performance.

Voice Cloning allows the creation of personalized voice models that accurately reproduce specific speech patterns, accents, and vocal traits from audio samples. These cloned voices can be used to deliver natural, consistent narration, personalized content, or accessibility-friendly audio while preserving the unique qualities of the original speaker. The cloning engine supports multiple languages and fine-tuning parameters for optimal fidelity and realism.

Text-to-Speech (TTS) uses the same underlying voice technology to convert text into expressive, natural-sounding speech. Users can adjust vocal exaggeration for emotional emphasis and pacing for timing and delivery control. Additionally, the TTS module supports reference voice synthesis, allowing users to guide tone, rhythm, and style using a short sample recording—without requiring a full cloned model. This makes it ideal for dubbing, e-learning, localization, and voice branding applications.

Note: These features require appropriate consent and usage rights for any voice data used. Users are responsible for ensuring ethical and lawful use of cloned or synthesized voices.

📥 Content Import & Management

Import Videos from External URLs

You can now import videos directly from external URLs, such as YouTube, Dailymotion, or Vimeo. Simply provide the URL, and Subtitle Sphere will download the video for you. Please ensure the video is licensed appropriately, under Creative Commons, or complies with the source website's copyright policy.

Import Transcripts from External URLs

Subtitle Sphere allows you to import transcripts from YouTube in three different formats: plain text, SRT, and raw timestamps. This feature simplifies the process of working with external content, allowing you to quickly integrate and edit transcripts. Please ensure the video is licensed appropriately, under Creative Commons, or complies with the source website's copyright policy.

PDF to TXT and DOCX Converter

Effortlessly convert PDF files into editable TXT with high-accuracy text extraction. This feature preserves the original formatting where possible. Perfect for making PDFs accessible for further editing, translation, or processing through other Subtitle Sphere features.

SRT/JSON/VTT Converter

Effortlessly convert subtitle files between SRT, JSON, and VTT formats while preserving timing, structure, and metadata with high accuracy. This feature keeps cues synchronized and formatting intact wherever possible, making it perfect for cross‑platform subtitle use, editing workflows, or integration with other Subtitle Sphere tools. It delivers fast, reliable conversions without compromising quality.
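
To illustrate what an SRT-to-VTT conversion involves, here is a minimal Python sketch. It is not Subtitle Sphere's own converter; the function name `srt_to_vtt` is hypothetical, and the sketch assumes well-formed SRT input with `HH:MM:SS,mmm` timestamps. The main differences it handles are the `WEBVTT` header and the decimal separator in cue timings.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:02,500".
_TIMING = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text: str) -> str:
    """Convert SRT text to WebVTT, keeping cue order, timing, and text."""
    # Drop a leading byte-order mark and normalize line endings.
    body = srt_text.lstrip("\ufeff").replace("\r\n", "\n").strip()
    # WebVTT uses '.' rather than ',' before the milliseconds.
    body = _TIMING.sub(r"\1.\2 --> \3.\4", body)
    # Numeric SRT cue numbers are kept; WebVTT accepts them as cue identifiers.
    return "WEBVTT\n\n" + body + "\n"
```

Only timing lines are rewritten, so commas inside the subtitle text itself are left untouched.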

🎬 Audio & Video Editing

Audio & Video Segment Extraction

Precisely extract specific segments from your audio and video files with frame-accurate control. Select multiple segments from a single file using intuitive time markers, then choose to save each segment separately or combine selected portions into a new file. This powerful feature is perfect for creating highlight reels, removing unwanted sections, or compiling the best moments from longer recordings. The extraction process maintains the original quality of your media while giving you complete control over what to keep and what to discard.

Speaker-Aware Audio Segmentation

Automatically identify and separate different speakers within your audio files using advanced speaker diarization technology. Once speakers are detected and labeled, you can extract and save audio segments for each individual speaker separately. This is invaluable for interviews, panel discussions, podcasts, or meetings where you need to isolate specific participants' contributions. Each speaker's audio is saved as a separate file, making it easy to edit individual contributions, create speaker-specific content, or analyze conversation patterns. The system maintains timing accuracy and can handle overlapping speech with intelligent separation algorithms.

Extracting & Removing Audio from Video Files

Easily extract or remove audio tracks from your video files. Whether you need a clean audio-free version or want to isolate the sound for further editing, Subtitle Sphere provides the flexibility to meet your needs.

Audio & Video Merger

Combine audio with video using Subtitle Sphere's Audio & Video Merger. You can adjust the speed of either the audio or video to match the duration of the other, and choose to keep the original audio or mute it while adding the new audio.
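
The speed adjustment comes down to a simple ratio. The sketch below is an illustration only, with a hypothetical function name, and does not describe how Subtitle Sphere applies the change internally.

```python
def speed_factor(source_duration: float, target_duration: float) -> float:
    """Playback-speed multiplier that makes the source last target_duration seconds."""
    if source_duration <= 0 or target_duration <= 0:
        raise ValueError("durations must be positive")
    return source_duration / target_duration
```

For example, fitting 90 seconds of audio to a 60-second video means playing the audio at `speed_factor(90, 60)`, which is 1.5 times normal speed; a factor below 1 slows the source down instead.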

Vocal Remover

With the Vocal Remover feature, you can separate vocals from music in an audio file, making it easier to isolate the background music or create karaoke tracks.

Video Converter

The Video Converter in Subtitle Sphere fixes common video distortion issues—particularly helpful for Windows users. It resolves a long-standing problem where portrait videos (9:16), often recorded on iPhones or created with editing apps, are saved as landscape with hidden rotation metadata. This tool re-saves them in true portrait orientation, ensuring they display correctly across all platforms.

In addition, the converter offers flexible resolution control. You can downscale videos to save space (e.g., 480p) or upscale to higher resolutions like 1080p or even 4K for better quality—perfect for editing, sharing, or archiving.

Since many Windows tools ignore rotation metadata, this converter ensures your videos appear exactly as intended, eliminating the need for manual fixes or workarounds.

🤖 Text Processing & AI Features

Text Summarization

Generate concise, intelligent summaries of your text content using advanced AI models from Google Gemini and OpenAI GPT. Perfect for condensing lengthy documents, transcripts, or articles into key points and essential information. Users can choose between different summarization styles to meet their specific needs. This feature supports multiple languages and maintains the original context while delivering clear, coherent summaries.

Note: Users must provide their own Google Gemini or OpenAI GPT API keys to access this service.

⚡ Enhanced Features & Improvements

Enhanced Transcription Options

Subtitle Sphere offers advanced transcription features, including new formats for your transcripts, such as original with timestamps, plain text without timestamps, and a modified format that intelligently merges subtitle segments for a smoother flow. You can further customize subtitle line concatenation by adjusting the maximum characters per line, maximum duration, and the gap between lines. Additionally, Whisper Turbo and Whisper Large Turbo transcription models offer improved accuracy and faster results. Original transcripts and plain text formats are saved for your reference in a folder of your choosing.
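
As a rough illustration of how the three limits (maximum characters, maximum duration, and gap between lines) could interact when merging segments, consider the Python sketch below. The `Segment` type, the `merge_segments` function, the default values, and the sentence-ending punctuation rule are all assumptions made for the example, not Subtitle Sphere's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    text: str


def merge_segments(segments, max_chars=80, max_duration=6.0, max_gap=0.5):
    """Merge adjacent segments while the result stays within all three limits."""
    merged = []
    for seg in segments:
        if merged:
            prev = merged[-1]
            combined = f"{prev.text} {seg.text}"
            fits = (
                len(combined) <= max_chars
                and seg.end - prev.start <= max_duration
                and seg.start - prev.end <= max_gap
                # Assumed rule: do not merge across a sentence-ending mark.
                and not prev.text.rstrip().endswith((".", "!", "?"))
            )
            if fits:
                merged[-1] = Segment(prev.start, seg.end, combined)
                continue
        merged.append(Segment(seg.start, seg.end, seg.text))
    return merged
```

With these defaults, "Hello there," followed 0.2 seconds later by "how are you?" becomes a single subtitle, while the next segment starts a new one because the previous text ends a sentence.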

Line-by-Line or Smart Chunked API Processing

Subtitle Sphere uses an intelligent content submission system when working with speech recognition, translation, and text-to-speech APIs. Instead of transmitting entire files, the platform sends only the relevant text or audio content — either line-by-line or in optimized chunks — based on token count for text and duration thresholds for audio. This ensures fast, secure, and compliant API communication across all supported services.

No Raw File Transfers: Audio and text files are never directly uploaded to third-party services. Only the processed content — such as text snippets or audio slices — is submitted.
Token-Aware Text Batching: For tasks like translation or language modeling, Subtitle Sphere intelligently batches text into chunks that fit within specific token limits (e.g., for OpenAI or similar models). If the content is short enough, it's sent as a whole; otherwise, it's split to maximize efficiency while staying within limits.
Duration-Aware Audio Slicing: For audio transcription or synthesis, the system slices audio based on predefined duration thresholds (e.g., 10–30 seconds), ensuring smooth processing without overloading speech APIs.
Line-by-Line Flexibility: When precision is critical, Subtitle Sphere can process one line at a time — ideal for accurate error handling, stepwise translation, or controlled speech generation.
Reduces API Overload: Smart chunking prevents API rate-limit issues by keeping request sizes manageable and consistent across different service providers.
Improves Responsiveness: Smaller, faster-to-process inputs enable more immediate results, particularly useful for near real-time applications.
Strengthens Privacy & Security: By only sending extracted content — never the original files — the risk of exposing sensitive or complete data is greatly reduced.
Built-in Rate Compliance: Automatic delays (typically 1–3 seconds) between requests ensure smooth integration with third-party API usage policies.
Targeted Error Recovery: Failures only affect the specific chunk or line being processed, enabling quick retries and preventing total workflow disruptions.

This hybrid content-aware system enables Subtitle Sphere to deliver efficient, scalable, and privacy-conscious API integration — whether processing entire segments, single lines, or intelligently batched chunks.
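
The ideas above (token-aware batching, duration-based slicing, delays between requests, and per-chunk retries) can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the function names, the four-characters-per-token estimate, the default limits, and the `send` callback that stands in for a real API call. It is not Subtitle Sphere's actual code.

```python
import time


def estimate_tokens(text: str) -> int:
    # Rough assumption: about 4 characters per token; a real tokenizer would differ.
    return max(1, len(text) // 4)


def batch_lines(lines, max_tokens=500):
    """Group lines into chunks whose estimated token count stays within max_tokens.

    A single line that exceeds the limit on its own still gets its own chunk.
    """
    chunks, current, used = [], [], 0
    for line in lines:
        cost = estimate_tokens(line)
        if current and used + cost > max_tokens:
            chunks.append(current)
            current, used = [], 0
        current.append(line)
        used += cost
    if current:
        chunks.append(current)
    return chunks


def slice_bounds(total_seconds, slice_seconds=30.0):
    """Return (start, end) pairs covering the audio in slices of at most slice_seconds."""
    bounds, start = [], 0.0
    while start < total_seconds:
        end = min(start + slice_seconds, total_seconds)
        bounds.append((start, end))
        start = end
    return bounds


def process_chunks(chunks, send, delay=1.0, retries=2, sleep=time.sleep):
    """Send each chunk on its own, so a failure only retries that chunk."""
    results = []
    for index, chunk in enumerate(chunks):
        for attempt in range(retries + 1):
            try:
                results.append(send(chunk))
                break
            except Exception:
                if attempt == retries:
                    results.append(None)  # mark this chunk as failed and keep going
                else:
                    sleep(delay)
        if index < len(chunks) - 1:
            sleep(delay)  # pause between requests
    return results
```

Short input naturally ends up as a single chunk, which matches the "sent as a whole" case, and passing `max_tokens=1` approximates line-by-line processing.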

📞 User Support & Feedback

Feedback & Subscription

Users can now easily provide feedback and subscribe to our YouTube channel directly within the app. Stay updated on the latest features and tips by following our content and sharing your thoughts with us.

Help & Support

For added convenience, Subtitle Sphere now includes help buttons and informational sections throughout the platform, providing users with additional guidance and explanations for every feature.
