BABYLPHYSH TERMINAL
Professional Subtitle Production Management System
Copyright © 2025 Douglas Beechwood
System Overview
Status: Production Ready
Release Date: September 23, 2025
Architecture: Modular Python Package
BabylPhysh Terminal is a comprehensive subtitle production management system designed for professional documentary film workflows. It provides quality control, multi-language translation, batch processing, and intelligent file management capabilities.
Key Capabilities
- Complete subtitle quality control and validation
- Multi-language translation with 108+ language support
- Intelligent language detection and file naming
- Professional HTML/CSV reporting system
- Frame rate detection and alignment
- Batch processing and pipeline automation
- Production analytics and insights
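Frame rate alignment, listed above, can be pictured as snapping each cue time to the nearest frame boundary. The following is a minimal sketch under that assumption, not BabylPhysh's actual implementation; the function name `snap_to_frame` and the round-to-nearest policy are illustrative choices.

```python
from fractions import Fraction

def snap_to_frame(ms: int, fps: Fraction) -> int:
    """Snap a timestamp in milliseconds to the nearest frame boundary."""
    frame = round(Fraction(ms, 1000) * fps)   # nearest whole frame index
    return round(frame / fps * 1000)          # frame index back to milliseconds

# 23.976 fps is exactly 24000/1001, so Fraction avoids float drift.
NTSC_FILM = Fraction(24000, 1001)

if __name__ == "__main__":
    print(snap_to_frame(1010, Fraction(25)))
    print(snap_to_frame(1000, NTSC_FILM))
```

Exact fractions are used because rates such as 23.976 fps cannot be represented exactly as floats, and small errors accumulate over a feature-length timeline.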
Terminal Interface
Main Menu Display
╔══════════════════════════════════════════════════════════════════╗
║ 🎬 BABYLPHYSH TERMINAL v0.8a3-Enterprise ║
║ Professional Subtitle Production System ║
╚══════════════════════════════════════════════════════════════════╝
CORE PROCESSING MODES:
----------------------------------------------------------------------
1) QC-A      - Quality Check Analysis (No Changes)
2) QC-FX     - Quality Check & Fix
3) QC-DIFF   - Compare Original vs Processed
4) TRANS     - Translation
5) BT-QC     - Back-Translation Quality Check
6) CONFORM   - Language Conforming
7) SYNC      - Transcript Sync
8) SRT-NAME  - Batch Append Language Names
ENHANCED WORKFLOW MODES:
----------------------------------------------------------------------
9) BATCH-QC  - Batch Quality Control
P) PIPELINE  - Complete Production Workflow
A) ANALYSIS  - Enhanced Analysis Dashboard
----------------------------------------------------------------------
0) Exit
======================================================================
Enter number/letter or mode name (e.g., '1', 'P', or 'qc-a')
🎬 Select mode: _
Example User Session
$ python3 launch.py
Welcome to BabylPhysh Terminal Subtitle Toolkit!
Working directory: /Applications/BabylPhysh Terminal v0.8a3
[Menu displays as shown above]
🎬 Select mode: 4
=== TRANS: Translation ===
Available files:
1) [masters ] documentary_episode1.en.sdh.srt
2) [masters ] documentary_episode2.en.sdh.srt
3) [to_qc ] movie_subtitle.en.srt
Choose file: 1
Selected: documentary_episode1.en.sdh.srt (245 subtitles)
🌍 Target Languages:
1) Essential Pack (5 languages)
2) Extended Pack (10 languages)
3) Comprehensive Pack (20 languages)
4) Custom selection
Select package [1]: 1
Processing: Spanish (Latin America)... ✓
Processing: French... ✓
Processing: German... ✓
Processing: Portuguese (Brazil)... ✓
Processing: Italian... ✓
[SUCCESS] Translation Complete!
[OUTPUT] 5 language versions created in to_qc/
[NEXT] Use CONFORM mode to check for English remnants
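The session ends by pointing to CONFORM mode for English remnants. One simple way to picture such a check is to flag cues whose translated text is still identical to the English source. The sketch below assumes that source and translated cues are aligned by index; the function name is hypothetical and the real CONFORM logic may differ.

```python
def find_english_remnants(source_cues, translated_cues):
    """Return indices of cues whose translated text still equals the source text."""
    remnants = []
    for i, (src, dst) in enumerate(zip(source_cues, translated_cues)):
        same = src.strip().casefold() == dst.strip().casefold()
        has_words = any(ch.isalpha() for ch in src)  # skip numbers-only cues
        if same and has_words:
            remnants.append(i)
    return remnants

if __name__ == "__main__":
    english = ["Hello there.", "[music]", "12:30"]
    spanish = ["Hola.", "[music]", "12:30"]
    print(find_english_remnants(english, spanish))
```

A check this simple will also flag proper nouns and loanwords that are legitimately unchanged, so flagged cues are candidates for review rather than confirmed errors.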
Core Features - v0.8a3-Enterprise
QC-A (Quality Check Analysis)
- CPL/CPS/Duration checks
- Reading speed analysis
- HTML/CSV reports
- No file modifications
QC-FX (Quality Check & Fix)
- All QC-A analysis
- Automatic corrections
- Frame grid alignment
- Before/after reports
QC-DIFF (Compare Original vs Processed)
- Visual diff highlighting
- Timing analysis
- Change tracking
- Color-coded output
TRANS (Translation)
- Single to multiple languages
- Language packages
- Progress tracking
- API integration
BT-QC (Back-Translation Quality Check)
- Back-translation analysis
- Similarity scoring
- Quality assessment
- Statistical reporting
CONFORM (Language Conforming)
- English remnant detection
- Mixed script fixing
- Sound effect translation
- Consistency checking
SYNC (Transcript Sync)
- Text-to-timing alignment
- Fuzzy matching
- Auto distribution
- Confidence scoring
SRT-NAME (Batch Append Language Names)
- Auto language detection
- Filename + content analysis
- Batch renaming
- 108+ language support
BATCH-QC (Batch Quality Control)
- Multi-file selection
- Batch analysis/fix
- Progress tracking
- Summary reports
PIPELINE (Complete Production Workflow)
- SDH to multi-language
- Stage automation
- Master reports
- Production ready
ANALYSIS (Enhanced Analysis Dashboard)
- File statistics
- Quality scoring
- Workflow insights
- Comparative analysis
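The CPL (characters per line), CPS (characters per second), and duration checks named in the feature list can be sketched as below. This is an illustrative outline rather than the project's QC code: the function names are hypothetical, and the default limits (42 CPL, 17 CPS, 1 to 7 seconds) are placeholder assumptions that a real QC profile would override.

```python
import re

TIME_RE = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")

def to_ms(stamp: str) -> int:
    """Convert an SRT timestamp such as 00:01:02,500 to milliseconds."""
    match = TIME_RE.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"bad timestamp: {stamp!r}")
    h, m, s, ms = map(int, match.groups())
    return ((h * 60 + m) * 60 + s) * 1000 + ms

def cue_metrics(start: str, end: str, text: str) -> dict:
    """Compute longest line length, reading speed, and duration for one cue."""
    lines = text.splitlines()
    duration = (to_ms(end) - to_ms(start)) / 1000
    chars = sum(len(line) for line in lines)
    return {
        "cpl": max((len(line) for line in lines), default=0),
        "cps": chars / duration if duration > 0 else float("inf"),
        "duration": duration,
    }

def check_cue(start, end, text, max_cpl=42, max_cps=17.0, min_dur=1.0, max_dur=7.0):
    """Return the names of the checks this cue fails (placeholder limits)."""
    m = cue_metrics(start, end, text)
    issues = []
    if m["cpl"] > max_cpl:
        issues.append("CPL")
    if m["cps"] > max_cps:
        issues.append("CPS")
    if not min_dur <= m["duration"] <= max_dur:
        issues.append("DURATION")
    return issues

if __name__ == "__main__":
    print(cue_metrics("00:00:01,000", "00:00:03,000", "Hello\nworld!"))
    print(check_cue("00:00:01,000", "00:00:01,500", "x" * 50))
```

Because the checks only read cue data and return a list of findings, the same functions fit an analysis-only mode that never modifies files.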
Installation & Setup
Quick Start
API Configuration
Edit data/api_config.csv to set your API key.
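The exact layout of data/api_config.csv is not shown in this document. Purely as an illustration, the sketch below assumes a hypothetical two-column layout with `provider` and `api_key` headers and shows how such a file could be read with Python's standard `csv` module; check the real file for its actual columns before relying on this.

```python
import csv
import io

def load_api_keys(csv_text: str) -> dict:
    """Map provider name to API key, assuming hypothetical provider/api_key columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    keys = {}
    for row in reader:
        provider = (row.get("provider") or "").strip().lower()
        api_key = (row.get("api_key") or "").strip()
        if provider and api_key:          # skip rows with no key filled in
            keys[provider] = api_key
    return keys

if __name__ == "__main__":
    sample = "provider,api_key\nOpenAI,sk-example\nother,\n"
    print(load_api_keys(sample))
```

Skipping rows with an empty key lets a caller report "no key configured" cleanly instead of sending an empty credential to the API.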
Complete Version History
Major Enhancements
- Fixed all syntax errors and import dependencies
- Enhanced SRT-NAME with advanced language detection
- Improved workflow management and file selection
- Professional package structure with proper initialization
- Comprehensive error handling and logging
- Production-ready deployment system
Critical Fixes
- config.py syntax error (line 269) - RESOLVED
- srt_name.py import issues - RESOLVED
- __init__.py formatting errors - RESOLVED
- workflow module dependencies - RESOLVED
New Features
- Self-contained file selection in SRT-NAME
- Filename + content language detection
- Confidence scoring for detection accuracy
- Batch language naming with selective processing
- Professional rename preview and confirmation
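The filename half of "filename + content language detection" can be illustrated with names like `documentary_episode1.en.sdh.srt` from the example session. The sketch below is an assumption about how such parsing might work, not the SRT-NAME implementation: the function name, the flag list, and the tag-shape pattern are all hypothetical, and the pattern checks only the general shape of a language tag, not whether it is registered.

```python
import re
from pathlib import Path

# Shape check only: 2-3 letter primary subtag plus optional subtags (e.g. pt-BR).
LANG_TAG = re.compile(r"^[A-Za-z]{2,3}(-[A-Za-z0-9]{2,8})*$")
# Non-language tokens that commonly sit between the stem and the extension.
KNOWN_FLAGS = {"sdh", "cc", "forced"}

def language_from_filename(name: str):
    """Return the first language-like token in a subtitle filename, or None."""
    tokens = Path(name).name.split(".")[1:-1]   # drop stem and extension
    for token in tokens:
        if token.lower() in KNOWN_FLAGS:
            continue
        if LANG_TAG.match(token):
            return token
    return None

if __name__ == "__main__":
    for fname in ("documentary_episode1.en.sdh.srt", "show.pt-BR.srt", "notes.srt"):
        print(fname, "->", language_from_filename(fname))
```

A filename match alone is weak evidence, which is presumably why it is paired with content analysis and a confidence score before any rename is confirmed.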
Focus: Core Stability & Reporting
- Enhanced HTML/CSV reporting system
- Improved QC analysis algorithms
- Frame rate detection capabilities
- Basic batch processing implementation
- Professional report templates
Focus: Foundation & Basic Functionality
- Initial release of v0.8 series
- Core QC functionality established
- Basic translation support
- SRT parsing and writing engines
- Command-line interface framework
Focus: Prototype & Early Development
- Proof of concept implementation
- Basic mode structure design
- Initial CLI interface
- Early translation experiments
- Core architecture planning
Forward Looking: v0.9 "Intelligence Enhanced"
Development Timeline: 8 weeks
Status: Planning Phase
Strategic Vision
Transform BabylPhysh from a fixed-API tool into a flexible AI-powered subtitle production platform where users select optimal AI models for different tasks. The flagship feature addresses critical ASR transcription errors through an innovative Hallucination Removal Engine.
Priority Features
CRITICAL: Hallucination Removal Engine
Problem: Whisper can invent dialogue when the audio is unclear, when the speech is in a language other than the expected one, or when only background noise is present
Solution:
- Manual marker detection [Hallucination begin/end]
- User-defined replacement text
- Batch correction across languages
- Pattern analysis and detection
- Timing preservation
Use Case: A documentary producer identifies 8 minutes of Tibetan speech that Whisper hallucinated as English dialogue, marks the affected sections, and replaces them with "[Tibetan speech]" across all 12 language versions in a single batch.
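Since this feature is still in planning, the following is only a sketch of how marker-based replacement could work. It assumes the markers appear literally as `[Hallucination begin]` and `[Hallucination end]` inside cue text, that cues are `(start, end, text)` tuples, and that all language versions share the same cue indices; the function names are hypothetical.

```python
BEGIN = "[Hallucination begin]"
END = "[Hallucination end]"

def marked_indices(cues):
    """Return the set of cue indices from each begin marker through its end marker."""
    inside = False
    marked = set()
    for i, (_start, _end, text) in enumerate(cues):
        if BEGIN in text:
            inside = True
        if inside:
            marked.add(i)
        if END in text:
            inside = False
    return marked

def apply_replacement(cues, indices, replacement):
    """Swap the text of the marked cues, leaving start and end times untouched."""
    return [
        (start, end, replacement if i in indices else text)
        for i, (start, end, text) in enumerate(cues)
    ]

if __name__ == "__main__":
    master = [
        ("00:00:01,000", "00:00:02,000", "Intro."),
        ("00:00:03,000", "00:00:04,000", "[Hallucination begin] Thank you for watching."),
        ("00:00:05,000", "00:00:06,000", "See you. [Hallucination end]"),
        ("00:00:07,000", "00:00:08,000", "Back to the interview."),
    ]
    spans = marked_indices(master)
    for cue in apply_replacement(master, spans, "[Tibetan speech]"):
        print(cue)
```

Separating detection from replacement is what makes the batch case straightforward: the indices found in the marked master file can be applied to each translated version in turn.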
Multi-AI Model Selection
Choose Your Intelligence:
- GPT-3.5 Turbo (fast, economical)
- GPT-4 (high quality)
- GPT-4 Turbo (balanced)
- GPT-4o (latest model)
- Real-time cost estimation
- Quality vs cost optimization
Multi-Provider Support
Flexible API Integration:
- OpenAI (current, enhanced)
- Claude/Anthropic
- Google Gemini
- Azure OpenAI
- Automatic failover
- Cost comparison
Translation Memory
Consistency & Efficiency:
- Remember previous translations
- Terminology database
- Context preservation
- Auto-apply approved terms
- Export/import memories
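A translation memory of the kind outlined above can be pictured as a lookup table keyed by source text and target language, with a serializable form for export and import. The class below is a minimal sketch under that assumption; the class name, method names, and JSON layout are hypothetical, since the feature is still planned.

```python
import json

class TranslationMemory:
    """In-memory store of approved translations, keyed by (source text, language)."""

    def __init__(self):
        self._entries = {}

    def remember(self, source: str, lang: str, target: str) -> None:
        self._entries[(source.strip(), lang)] = target

    def lookup(self, source: str, lang: str):
        """Return the stored translation, or None when nothing is stored."""
        return self._entries.get((source.strip(), lang))

    def export_json(self) -> str:
        rows = [
            {"source": s, "lang": l, "target": t}
            for (s, l), t in sorted(self._entries.items())
        ]
        return json.dumps(rows, ensure_ascii=False, indent=2)

    @classmethod
    def import_json(cls, payload: str) -> "TranslationMemory":
        memory = cls()
        for row in json.loads(payload):
            memory.remember(row["source"], row["lang"], row["target"])
        return memory

if __name__ == "__main__":
    tm = TranslationMemory()
    tm.remember("[door slams]", "es-419", "[portazo]")
    restored = TranslationMemory.import_json(tm.export_json())
    print(restored.lookup("[door slams]", "es-419"))
```

Checking the memory before calling a translation API is what would deliver both consistency and the cost savings, because repeated lines such as sound effects are translated once and reused.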
AI-Powered QC Enhancements
Intelligent Analysis:
- Smart profile suggestions
- Context-aware error detection
- Cultural/linguistic checking
- Hallucination pattern hints
- Confidence scoring integration
Smart Batch Processing
Automated Optimization:
- Automatic model selection
- Load balancing across providers
- Cost optimization strategies
- Quality vs speed balancing
- Batch hallucination correction
Development Timeline
Phase | Duration | Focus | Deliverables |
---|---|---|---|
Phase 1 | Weeks 1-2 | Foundation | Hallucination removal engine, Model selection framework, Enhanced configuration |
Phase 2 | Weeks 3-4 | Provider Expansion | Claude/Gemini integration, Provider failover, Cost estimation engine |
Phase 3 | Weeks 5-6 | Intelligence Features | Translation memory, AI-powered QC, Context-aware processing |
Phase 4 | Weeks 7-8 | Advanced Features | Smart batch optimization, Enhanced analytics, Production reporting |
Expected Benefits
- 30% cost reduction through smart model selection
- Real-time cost estimation
- Budget-aware processing
- 20% translation quality increase
- 95% hallucination removal accuracy
- AI-powered error detection
- 40% workflow time reduction
- Translation memory efficiency
- Automated batch processing
Technical Specifications
System Requirements
Component | Minimum | Recommended |
---|---|---|
Python Version | 3.8 | 3.10+ |
RAM | 2GB | 4GB+ |
Disk Space | 500MB | 1GB+ |
Internet | Required for API | Broadband |
Supported File Formats
- Input: SRT (SubRip Text) files
- Output: SRT, HTML reports, CSV data
- Encoding: UTF-8, Latin-1 (auto-detected)
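Auto-detection between UTF-8 and Latin-1 is commonly done by attempting a strict UTF-8 decode and falling back to Latin-1 on failure. The sketch below shows that approach as an assumption about how detection might work here; the function name is hypothetical.

```python
def decode_subtitle_bytes(raw: bytes):
    """Decode subtitle bytes, returning (text, detected_encoding)."""
    try:
        # utf-8-sig also strips a leading byte order mark when present.
        return raw.decode("utf-8-sig"), "utf-8"
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this fallback cannot fail.
        return raw.decode("latin-1"), "latin-1"

if __name__ == "__main__":
    print(decode_subtitle_bytes("café".encode("utf-8")))
    print(decode_subtitle_bytes("café".encode("latin-1")))
```

The order matters: Latin-1 accepts any byte sequence, so trying it first would silently mis-decode every UTF-8 file.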
API Integration
Provider | Status (v0.8a3) | Status (v0.9) |
---|---|---|
OpenAI | Active | Enhanced |
Claude/Anthropic | — | Planned |
Google Gemini | — | Planned |
Azure OpenAI | — | Planned |
Supported Languages
BabylPhysh supports 108+ languages, identified by BCP 47 language codes, including:
- English (US, UK)
- Spanish (Latin America, Spain)
- French
- German
- Portuguese (Brazil, Portugal)
- Japanese
- Korean
- Chinese (Simplified, Traditional)
- Hindi
- Thai, Vietnamese
- Italian, Dutch, Swedish
- Polish, Czech, Hungarian
- Norwegian, Danish, Finnish
- Romanian, Bulgarian, Greek
Quick User Guide
Basic Workflow
- Place the English SDH file in the masters/ folder
- Run Mode 4 (TRANS) to generate multiple languages
- Run Mode 6 (CONFORM) to fix partial translations
- Run Mode 9 (BATCH-QC) for quality control
- Use Mode 8 (SRT-NAME) to organize files
- Generate final reports and deliver
Common Use Cases
1. Quick Quality Check
2. Multi-Language Production
3. Language Detection & Organization
Troubleshooting
- Translation fails: Check API key in data/api_config.csv
- Import errors: Ensure Python 3.8+ is in use and all dependencies are installed
- File not found: Verify file paths and folder structure
- Encoding issues: BabylPhysh auto-detects UTF-8 and Latin-1
Contributing & Support
How to Contribute
- Report bugs via GitHub Issues
- Suggest features in Discussions
- Submit pull requests with improvements
- Help with documentation
- Share your production workflows
Support Resources
Resource | Link |
---|---|
GitHub Repository | github.com/growgoo/Babyl |
Issue Tracker | github.com/growgoo/Babyl/issues |
Documentation | docs/ folder in repository |
Current Branch | Babylphysh-08a3-Enterprise |