Speech-to-Text Setup
Configure speech recognition services to enable voice input in Talk Buddy. This guide covers both online and local STT (Speech-to-Text) service options.
Understanding STT Services
What is Speech-to-Text?
STT services convert your spoken words into text that Talk Buddy can process:
- Voice input: Capture what you say during practice conversations
- Real-time processing: Convert speech to text quickly for smooth interaction
- Accuracy: Understand your words correctly for meaningful AI responses
- Multiple languages: Support for various languages and accents (service-dependent)
Service Options
Online Services (Default)
Pre-configured services: Ready to use immediately
- Pros: No setup required, high accuracy, multiple language support
- Cons: Requires internet, potential privacy concerns, usage limits
- Best for: Quick start, testing, occasional practice
Local Services (Recommended for Privacy)
Self-hosted services: Run on your own computer
- Pros: Complete privacy, offline capability, no usage limits
- Cons: Requires setup, uses system resources, may have lower accuracy
- Best for: Regular practice, privacy-conscious users, offline environments
Quick Start (Online Services)
Default Configuration
Talk Buddy comes pre-configured with working STT services:
Check Current Status
- Look at status footer: STT indicator should be green (●)
- If green: You’re ready to practice with voice input
- If red/gray: Follow troubleshooting steps below
Test STT Service
- Go to Settings: Click “Settings” in sidebar
- Find STT section: Look for Speech-to-Text configuration
- Click “Test STT”: Verify service is working
- Speak into microphone: Say a test phrase
- Check results: Verify your speech was recognized correctly
Troubleshooting Online Services
Connection Issues
- Check internet: Verify stable connection
- Firewall settings: Ensure Talk Buddy can access external services
- Service status: Online services may occasionally be unavailable
Microphone Problems
- Permissions: Grant microphone access to Talk Buddy
- Hardware test: Verify microphone works in other applications
- System settings: Check microphone volume and input device
Local STT Setup (Speaches)
Why Use Local Services?
Privacy benefits:
- No data sent to external servers
- Complete offline functionality
- No usage limits or quotas
Performance benefits:
- No network latency: response time depends only on your hardware
- Consistent availability
- Customizable models for your use case
Installing Speaches
System Requirements
- Operating System: Windows 10+, macOS 10.14+, Linux (Ubuntu 18.04+)
- RAM: 4GB minimum, 8GB recommended
- Storage: 2-5GB for speech models
- CPU: Modern processor (last 5 years recommended)
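The storage figure above, and the Python version needed for the pip-based install, can be checked before you start. The following is a minimal pre-flight sketch using only the Python standard library. The function name `check_requirements` and the 5 GB threshold (the upper end of the storage estimate) are illustrative assumptions, not part of Speaches or Talk Buddy.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)            # needed only for the Python installation option
MIN_FREE_BYTES = 5 * 1024**3   # assumed: upper end of the 2-5GB model estimate


def check_requirements(path=".", free_bytes=None, python_version=None):
    """Return a list of human-readable problems; an empty list means OK."""
    if free_bytes is None:
        free_bytes = shutil.disk_usage(path).free
    if python_version is None:
        python_version = sys.version_info[:2]

    problems = []
    if free_bytes < MIN_FREE_BYTES:
        free_gb = free_bytes / 1024**3
        problems.append(f"Only {free_gb:.1f} GB free; models may need up to 5 GB")
    if tuple(python_version) < MIN_PYTHON:
        problems.append("Python 3.8 or newer is required for the pip install")
    return problems
```

Calling `check_requirements()` with no arguments inspects the current directory's disk and the running interpreter. The optional parameters exist so the logic can be exercised with known values.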
Installation Steps
Option 1: Docker Installation (Recommended)
# Pull the Speaches Docker image
docker pull ghcr.io/tts-ai/speaches:latest
# Run the Speaches container in the background, publishing port 8000
docker run -d \
  --name speaches \
  -p 8000:8000 \
  ghcr.io/tts-ai/speaches:latest
Option 2: Python Installation
# Check that Python 3.8+ is installed (install it first if this fails)
python --version
# Install Speaches via pip
pip install speaches
# Start Speaches server
speaches serve --host 0.0.0.0 --port 8000
Option 3: Binary Installation
- Download: Get binary from Speaches releases
- Extract: Unzip to preferred location
- Run: Execute the binary to start server
- Configure: Set to run on port 8000
Configuring Talk Buddy for Local STT
Update Service URL
- Open Talk Buddy Settings
- Find STT Service URL field
- Change to local address: http://localhost:8000
- Save settings
Test Local Connection
- Click “Test STT” in settings
- Verify connection: Should show successful connection
- Test speech recognition: Speak a test phrase
- Check status footer: STT indicator should be green
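A freshly started server can take a while to load its model, so the first test may fail even though nothing is wrong. The sketch below polls the server until it answers. It assumes the `/health` route used in the diagnostics section of this guide returns HTTP 200 when the service is ready. The function names and retry values are illustrative.

```python
import http.client
import time
import urllib.error
import urllib.request


def is_healthy(base_url, timeout=2.0):
    """Return True if GET <base_url>/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return False


def wait_until_ready(base_url, attempts=10, delay=1.0,
                     probe=is_healthy, sleep=time.sleep):
    """Probe the service up to `attempts` times, pausing `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe(base_url):
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

For example, `wait_until_ready("http://localhost:8000")` returns `True` as soon as the local service responds, or `False` after roughly ten seconds of retries.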
Speaches Configuration Options
Model Selection
Speaches supports multiple STT models:
Fast Models (Lower accuracy, faster processing)
- Good for: Real-time conversation, older hardware
- Model examples: Systran/faster-whisper-small, openai/whisper-tiny
Accurate Models (Higher accuracy, slower processing)
- Good for: High-quality transcription, powerful hardware
- Model examples: Systran/faster-whisper-large-v2, openai/whisper-large
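The trade-off between the two model groups can be written down as a simple rule. This is a sketch only: the model names are the examples listed above, while the rule itself (prefer fast models for real-time use or below the 8GB recommended RAM) is an assumption made for illustration, not a Speaches feature.

```python
# Model names come from the examples in this guide.
FAST_MODELS = ["openai/whisper-tiny", "Systran/faster-whisper-small"]
ACCURATE_MODELS = ["Systran/faster-whisper-large-v2", "openai/whisper-large"]


def suggest_models(ram_gb, realtime):
    """Suggest candidate models for a machine and usage pattern.

    Assumed rule: real-time conversation or less than 8 GB of RAM
    favours the fast group; otherwise try the accurate group.
    """
    if realtime or ram_gb < 8:
        return FAST_MODELS
    return ACCURATE_MODELS
```

Treat the result as a starting point and confirm it by testing recognition quality and response time on your own hardware.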
Language Configuration
Configure for your language:
# Example: Configure for Spanish
speaches serve --language es --port 8000
# Example: Configure for French
speaches serve --language fr --port 8000
Advanced Configuration
Create a configuration file named speaches.yaml:
server:
  host: "0.0.0.0"
  port: 8000
stt:
  model: "Systran/faster-whisper-medium"
  language: "en"
  device: "auto"  # auto, cpu, cuda
tts:
  enabled: true
  model: "speaches-ai/piper-en_US-amy-low"
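A typo in the configuration file is easier to catch before the server starts. The sketch below validates a configuration that has already been parsed into a Python dictionary (parsing YAML itself needs a third-party library such as PyYAML, which is not shown). The function name `validate_config` and the specific checks are assumptions for illustration. Speaches may validate its configuration differently.

```python
VALID_DEVICES = {"auto", "cpu", "cuda"}  # values listed in the config comment


def validate_config(config):
    """Return a list of problems found in a parsed speaches.yaml mapping."""
    problems = []

    port = config.get("server", {}).get("port")
    if type(port) is not int or not 1 <= port <= 65535:
        problems.append(f"server.port must be an integer from 1 to 65535, got {port!r}")

    stt = config.get("stt", {})
    if not stt.get("model"):
        problems.append("stt.model is missing")
    device = stt.get("device", "auto")
    if device not in VALID_DEVICES:
        problems.append(f"stt.device must be one of {sorted(VALID_DEVICES)}, got {device!r}")

    return problems


example = {
    "server": {"host": "0.0.0.0", "port": 8000},
    "stt": {"model": "Systran/faster-whisper-medium", "language": "en", "device": "auto"},
    "tts": {"enabled": True, "model": "speaches-ai/piper-en_US-amy-low"},
}
print(validate_config(example))  # prints []
```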
Advanced STT Configuration
Multiple Service Setup
Backup Services
Configure multiple STT services for redundancy:
- Primary service: Local Speaches for regular use
- Backup service: Online service for when local is unavailable
- Testing: Verify both services work independently
Service Switching
In Talk Buddy settings:
- Change service URL: Switch between local and online
- Test new service: Verify functionality before practice
- Save configurations: Keep both URLs documented for easy switching
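The primary and backup arrangement amounts to trying services in order of preference and using the first one that responds. A minimal sketch of that idea follows. `ONLINE_URL` is a hypothetical placeholder, since this guide does not list the online service address, and `probe` stands for any check you trust, such as a health request.

```python
LOCAL_URL = "http://localhost:8000"
ONLINE_URL = "https://stt.example.com"  # hypothetical placeholder, not a real service


def pick_service(candidates, probe):
    """Return the first URL for which probe(url) is true, or None if all fail."""
    for url in candidates:
        if probe(url):
            return url
    return None
```

With `pick_service([LOCAL_URL, ONLINE_URL], probe)`, the local service is used whenever it is reachable, and the online service is chosen only as a fallback. The returned URL is the value to enter in the STT Service URL field.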
Hardware Optimization
For better local STT performance:
- Use SSD storage: Faster model loading
- Increase RAM: Better model caching
- Use GPU acceleration: If supported by your STT service
- Close other applications: Free resources for speech processing
Model Selection
Choose appropriate models:
- Fast models: Real-time conversation priority
- Accurate models: High-quality transcription priority
- Language-specific: Better accuracy for non-English languages
- Domain-specific: Specialized vocabulary (medical, technical)
Security and Privacy
Local Service Security
Secure your local installation:
- Firewall configuration: Only allow local connections
- Network isolation: Keep STT service on local network only
- Regular updates: Maintain current software versions
- Access control: Restrict who can access the service
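One quick way to confirm the "local connections only" rule is to check that the service address is a loopback address. The sketch below does this with the standard library. Note that `0.0.0.0` is not loopback: a server bound to it listens on every network interface, so binding to `127.0.0.1` is the usual way to keep a service private to one machine. The function name is illustrative.

```python
import ipaddress
from urllib.parse import urlsplit


def is_local_only(url):
    """Return True if the URL's host is localhost or a loopback IP address."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Any other hostname would need DNS resolution; treat it as non-local.
        return False
```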
Data Privacy
Understand data handling:
- Local processing: No data leaves your computer
- No logging: Configure services to not store audio/transcriptions
- Temporary processing: Audio processed in memory only
- User control: You control all data and processing
Troubleshooting STT Issues
Common Problems
Microphone Not Working
Symptoms: No speech detected, silent input
Solutions:
- Check system permissions: Grant microphone access to Talk Buddy
- Test hardware: Verify microphone works in other applications
- Check input device: Ensure correct microphone selected in system settings
- Adjust sensitivity: Increase microphone volume if too quiet
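To tell a microphone problem apart from an STT problem, record a few seconds of speech with any recorder and check whether the file contains signal at all. The sketch below measures the peak level of a 16-bit PCM WAV file. The 0.01 silence threshold is an assumed value, and the function names are illustrative.

```python
import array
import sys
import wave

SILENCE_THRESHOLD = 0.01  # assumed cutoff, as a fraction of full scale


def peak_level(wav_source):
    """Return the peak sample level (0.0 to 1.0) of a 16-bit PCM WAV file."""
    with wave.open(wav_source, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM audio")
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h", frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    if not samples:
        return 0.0
    return max(abs(sample) for sample in samples) / 32768.0


def is_silent(wav_source):
    """Return True if the recording never rises above the silence threshold."""
    return peak_level(wav_source) < SILENCE_THRESHOLD
```

If `is_silent("test-audio.wav")` returns `True` for a recording in which you spoke, the problem lies with the microphone or its permissions rather than with the STT service.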
Poor Recognition Accuracy
Symptoms: Speech transcribed incorrectly, frequent mistakes
Solutions:
- Improve audio quality: Use better microphone, reduce background noise
- Speak clearly: Slower, more deliberate speech
- Check language settings: Ensure STT service configured for your language
- Try different model: Some models work better for specific accents/voices
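When comparing models or microphones, a number is more useful than an impression. Word error rate (WER) is a standard measure: the word-level edit distance between what you said and what was transcribed, divided by the number of words you said. The sketch below is a plain implementation that lowercases and splits on whitespace. Punctuation handling is deliberately left out.

```python
def word_error_rate(reference, hypothesis):
    """Return the word-level edit distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference text must contain at least one word")

    # Levenshtein distance over words, keeping one row of the table at a time.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


print(word_error_rate("the quick brown fox", "the quick brown box"))  # prints 0.25
```

Read the same test sentence with each model and keep the configuration with the lowest rate.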
Service Connection Errors
Symptoms: Red STT indicator, connection timeouts
Solutions:
- Verify service running: Check if Speaches or online service is available
- Test network connectivity: Ensure internet access for online services
- Check firewall: Confirm Talk Buddy can access STT service
- Restart services: Stop and start STT service, restart Talk Buddy
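The way a connection fails often tells you which of these steps to try first. The sketch below attempts a plain TCP connection and classifies the outcome. The labels and their interpretation in the comments are rules of thumb, not guarantees.

```python
import socket


def diagnose_port(host, port, timeout=2.0, connect=socket.create_connection):
    """Try a TCP connection and return 'open', 'refused', 'timeout', or 'unreachable'."""
    try:
        with connect((host, port), timeout=timeout):
            return "open"          # something is listening on the port
    except ConnectionRefusedError:
        return "refused"           # nothing listening: the service is probably not running
    except socket.timeout:
        return "timeout"           # often a firewall silently dropping packets
    except OSError:
        return "unreachable"       # DNS failure, no route, or another network error
```

For a local setup, `diagnose_port("localhost", 8000)` returning `"refused"` points to restarting Speaches, while `"timeout"` points to firewall settings.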
Slow Response Times
Symptoms: Long delays between speech and recognition
Solutions:
- Use faster models: Switch to smaller, quicker STT models
- Optimize hardware: Close other applications, upgrade hardware
- Check network: Ensure stable, fast internet for online services
- Local processing: Switch to a local STT service to remove network round trips
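Before changing models or hardware, it helps to measure how long recognition takes so that each change can be compared against a baseline. The sketch below times repeated calls to any function you supply. `transcribe` is a hypothetical callable that sends one test clip to your STT service.

```python
import statistics
import time


def measure_latency(transcribe, runs=5, clock=time.perf_counter):
    """Call `transcribe` several times and summarise how long each call took."""
    durations = []
    for _ in range(runs):
        start = clock()
        transcribe()
        durations.append(clock() - start)
    return {"median": statistics.median(durations), "worst": max(durations)}
```

The median shows typical responsiveness, and the worst case often reveals model loading on the first request.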
Advanced Troubleshooting
Log Analysis
Check service logs for errors:
# View Speaches logs
docker logs speaches
# Check system microphone logs (macOS)
log show --predicate 'subsystem == "com.apple.coreaudio"' --last 5m
# Windows microphone troubleshooting
# Use Windows Audio troubleshooter in Settings
Network Diagnostics
Test service connectivity:
# Test local Speaches service
curl http://localhost:8000/health
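# Test the transcription endpoint with a sample WAV file
# (the route may differ between Speaches versions; check its API documentation)
curl -X POST http://localhost:8000/stt \
  -H "Content-Type: audio/wav" \
  --data-binary @test-audio.wav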
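The same request can be made from Python when curl is not available. This sketch mirrors the curl command above using only the standard library. The `/stt` path and the raw `audio/wav` body are copied from that command and may not match every Speaches version, and the response format is not specified in this guide, so the body is returned as plain text.

```python
import urllib.request


def build_transcription_request(base_url, wav_bytes, path="/stt"):
    """Build a POST request that uploads raw WAV bytes, like the curl example."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=wav_bytes,
        headers={"Content-Type": "audio/wav"},
        method="POST",
    )


def transcribe_file(base_url, wav_path, timeout=30.0):
    """Send a WAV file to the service and return the response body as text."""
    with open(wav_path, "rb") as audio:
        request = build_transcription_request(base_url, audio.read())
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, `transcribe_file("http://localhost:8000", "test-audio.wav")` sends the same file used in the curl test.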
Monitor resource usage:
- CPU usage: STT processing should use <50% CPU
- Memory usage: Models require 1-4GB RAM typically
- Network usage: Online services use bandwidth during processing
- Disk I/O: Local models may cause disk activity during loading
Service Comparison
Online vs Local STT
| Aspect | Online Services | Local Services |
| --- | --- | --- |
| Setup | Ready immediately | Requires installation |
| Privacy | Data sent externally | Complete privacy |
| Accuracy | Often very high | Varies by model |
| Speed | Network dependent | Hardware dependent |
| Cost | May have usage limits | Free after setup |
| Offline | Requires internet | Works offline |
| Languages | Many supported | Depends on models |
Recommended Configurations
For Students
- Start with: Default online services
- Upgrade to: Local services if practicing frequently
- Best for: Learning and experimenting
For Teachers
- Recommended: Local services for privacy and reliability
- Classroom use: Local services avoid internet dependency
- Best for: Consistent, private classroom experience
For Professionals
- Recommended: Local services for confidential practice
- Corporate use: Local services meet security requirements
- Best for: Professional development with privacy
Quick Setup Checklist
Online STT (5 minutes)
Local STT (30 minutes)
Troubleshooting (15 minutes)
With proper STT setup, you’ll have accurate voice recognition for natural conversation practice. Choose the option that best fits your privacy needs and technical comfort level! 🎤