
Speech-to-Text Setup

Configure speech recognition services to enable voice input in Talk Buddy. This guide covers both online and local STT (Speech-to-Text) service options.

Understanding STT Services

What is Speech-to-Text?

STT services convert your spoken words into text that Talk Buddy can process:

Voice input: Capture what you say during practice conversations
Real-time processing: Convert speech to text quickly for smooth interaction
Accuracy: Understand your words correctly for meaningful AI responses
Multiple languages: Support for various languages and accents (service-dependent)

Service Options

Online Services (Default)

Pre-configured services: Ready to use immediately

Pros: No setup required, high accuracy, multiple language support
Cons: Requires internet, potential privacy concerns, usage limits
Best for: Quick start, testing, occasional practice

Local Services (Recommended for Privacy)

Self-hosted services: Run on your own computer

Pros: Complete privacy, offline capability, no usage limits
Cons: Requires setup, uses system resources, may have lower accuracy
Best for: Regular practice, privacy-conscious users, offline environments

Quick Start (Online Services)

Default Configuration

Talk Buddy comes pre-configured with working STT services:

Check Current Status

Look at status footer: STT indicator should be green (●)
If green: You’re ready to practice with voice input
If red/gray: Follow troubleshooting steps below

Test STT Service

Go to Settings: Click “Settings” in sidebar
Find STT section: Look for Speech-to-Text configuration
Click “Test STT”: Verify service is working
Speak into microphone: Say a test phrase
Check results: Verify your speech was recognized correctly

Troubleshooting Online Services

Connection Issues

Check internet: Verify stable connection
Firewall settings: Ensure Talk Buddy can access external services
Service status: Online services may occasionally be unavailable

Microphone Problems

Permissions: Grant microphone access to Talk Buddy
Hardware test: Verify microphone works in other applications
System settings: Check microphone volume and input device

Local STT Setup (Speaches)

Why Use Local Services?

Privacy benefits:

No data sent to external servers
Complete offline functionality
No usage limits or quotas

Performance benefits:

Potentially faster response times (no network latency, though speed depends on your hardware)
Consistent availability
Customizable models for your use case

Installing Speaches

System Requirements

Operating System: Windows 10+, macOS 10.14+, Linux (Ubuntu 18.04+)
RAM: 4GB minimum, 8GB recommended
Storage: 2-5GB for speech models
CPU: Modern processor (last 5 years recommended)

Installation Steps

Option 1: Docker Installation (Recommended)

# Pull the Speaches Docker image (CPU build; use the latest-cuda tag for NVIDIA GPUs)
docker pull ghcr.io/speaches-ai/speaches:latest-cpu

# Run Speaches container
docker run -d \
  --name speaches \
  -p 8000:8000 \
  ghcr.io/speaches-ai/speaches:latest-cpu
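The server may need a moment after the container starts before it accepts requests, so it can help to wait for it before pointing Talk Buddy at the new URL. The following is a minimal sketch, assuming the server answers on the /health endpoint used in the Network Diagnostics section of this guide. The function name wait_for_stt and the default of 30 attempts are illustrative choices, not part of Speaches or Talk Buddy.

```shell
# Probe the health endpoint up to N times, one second apart.
# wait_for_stt is a hypothetical helper; /health is assumed to return
# a successful HTTP status once the server is ready.
wait_for_stt() {
  url="${1:-http://localhost:8000}"
  attempts="${2:-30}"
  while [ "$attempts" -gt 0 ]; do
    # -f makes curl fail on HTTP errors, -s hides progress output,
    # --max-time keeps a single probe from hanging.
    if curl -fs --max-time 2 -o /dev/null "$url/health"; then
      echo "STT service is ready at $url"
      return 0
    fi
    sleep 1
    attempts=$((attempts - 1))
  done
  echo "STT service did not respond at $url" >&2
  return 1
}
```

After defining the function, `wait_for_stt http://localhost:8000 60` makes up to 60 probes before giving up with a non-zero exit status.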

Option 2: Python Installation

# Check that Python 3.8+ is installed
python --version

# Install Speaches via pip
pip install speaches

# Start Speaches server
speaches serve --host 0.0.0.0 --port 8000

Option 3: Binary Installation

Download: Get binary from Speaches releases
Extract: Unzip to preferred location
Run: Execute the binary to start server
Configure: Set to run on port 8000

Configuring Talk Buddy for Local STT

Update Service URL

Open Talk Buddy Settings
Find STT Service URL field
Change to local address: http://localhost:8000
Save settings

Test Local Connection

Click “Test STT” in settings
Verify connection: Should show successful connection
Test speech recognition: Speak a test phrase
Check status footer: STT indicator should be green

Speaches Configuration Options

Model Selection

Speaches supports multiple STT models:

Fast Models (Lower accuracy, faster processing)

Good for: Real-time conversation, older hardware
Model examples: Systran/faster-whisper-small, openai/whisper-tiny

Accurate Models (Higher accuracy, slower processing)

Good for: High-quality transcription, powerful hardware
Model examples: Systran/faster-whisper-large-v2, openai/whisper-large
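The trade-off between fast and accurate models can be sketched as a simple rule of thumb based on available memory. The RAM thresholds below are illustrative assumptions rather than Speaches recommendations, and suggest_model is a hypothetical helper; the model names are the ones that appear in this guide.

```shell
# Suggest a model from this guide based on available RAM in whole GB.
# The thresholds (8 and 16) are assumptions chosen for illustration only.
suggest_model() {
  ram_gb="$1"
  if [ "$ram_gb" -lt 8 ]; then
    echo "Systran/faster-whisper-small"
  elif [ "$ram_gb" -lt 16 ]; then
    echo "Systran/faster-whisper-medium"
  else
    echo "Systran/faster-whisper-large-v2"
  fi
}
```

For example, `suggest_model 8` prints Systran/faster-whisper-medium. Treat the result as a starting point and move to a smaller model if responses feel slow.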

Language Configuration

Configure for your language:

# Example: Configure for Spanish
speaches serve --language es --port 8000

# Example: Configure for French
speaches serve --language fr --port 8000

Advanced Configuration

Create a configuration file named speaches.yaml:

server:
  host: "0.0.0.0"
  port: 8000
  
stt:
  model: "Systran/faster-whisper-medium"
  language: "en"
  device: "auto"  # auto, cpu, cuda
  
tts:
  enabled: true
  model: "speaches-ai/piper-en_US-amy-low"

Advanced STT Configuration

Multiple Service Setup

Backup Services

Configure multiple STT services for redundancy:

Primary service: Local Speaches for regular use
Backup service: Online service for when local is unavailable
Testing: Verify both services work independently

Service Switching

In Talk Buddy settings:

Change service URL: Switch between local and online
Test new service: Verify functionality before practice
Save configurations: Keep both URLs documented for easy switching
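Choosing between a primary and a backup service can be sketched as "use the first URL that answers". This is only a sketch: it assumes each service exposes a /health endpoint, which holds for the local setup in this guide but may not hold for an online service, and both function names are hypothetical. Talk Buddy itself is still switched by hand in Settings.

```shell
# Succeed when the given STT base URL answers its health check.
# Assumes a /health endpoint; an online service may need a different probe.
stt_is_up() {
  curl -fs --max-time 2 -o /dev/null "$1/health"
}

# Print the first reachable URL from the arguments, in priority order.
pick_stt_url() {
  for candidate in "$@"; do
    if stt_is_up "$candidate"; then
      echo "$candidate"
      return 0
    fi
  done
  echo "No STT service reachable" >&2
  return 1
}
```

A call such as `pick_stt_url http://localhost:8000 https://stt.example.com` (the second URL is a placeholder) prints the address to paste into the STT Service URL field.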

Performance Optimization

Hardware Optimization

For better local STT performance:

Use SSD storage: Faster model loading
Increase RAM: Better model caching
Use GPU acceleration: If supported by your STT service
Close other applications: Free resources for speech processing

Model Selection

Choose appropriate models:

Fast models: Real-time conversation priority
Accurate models: High-quality transcription priority
Language-specific: Better accuracy for non-English languages
Domain-specific: Specialized vocabulary (medical, technical)

Security and Privacy

Local Service Security

Secure your local installation:

Firewall configuration: Only allow local connections (for example, bind the server to 127.0.0.1 instead of 0.0.0.0)
Network isolation: Keep STT service on local network only
Regular updates: Maintain current software versions
Access control: Restrict who can access the service
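With Docker, publishing a port as 8000:8000 makes it reachable on every network interface of the machine. One way to restrict access to the local computer is to prefix the mapping with the loopback address. The sketch below assumes the container listens on port 8000 internally, as in the installation step, and that SPEACHES_IMAGE holds the image name you pulled; the function name is hypothetical.

```shell
# Start Speaches published on the loopback interface only.
# SPEACHES_IMAGE must be set to the image name from the installation step.
start_speaches_local_only() {
  port="${1:-8000}"
  # The 127.0.0.1 prefix stops Docker from publishing the port on all interfaces.
  docker run -d --name speaches \
    -p "127.0.0.1:${port}:8000" \
    "${SPEACHES_IMAGE:?set SPEACHES_IMAGE first}"
}
```

If a container named speaches already exists, remove it first with `docker rm -f speaches`, since Docker refuses to reuse a container name.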

Data Privacy

Understand data handling:

Local processing: No data leaves your computer
No logging: Configure services to not store audio/transcriptions
Temporary processing: Audio processed in memory only
User control: You control all data and processing

Troubleshooting STT Issues

Common Problems

Microphone Not Working

Symptoms: No speech detected, silent input
Solutions:

Check system permissions: Grant microphone access to Talk Buddy
Test hardware: Verify microphone works in other applications
Check input device: Ensure correct microphone selected in system settings
Adjust sensitivity: Increase microphone volume if too quiet

Poor Recognition Accuracy

Symptoms: Speech transcribed incorrectly, frequent mistakes
Solutions:

Improve audio quality: Use better microphone, reduce background noise
Speak clearly: Slower, more deliberate speech
Check language settings: Ensure STT service configured for your language
Try different model: Some models work better for specific accents/voices

Service Connection Errors

Symptoms: Red STT indicator, connection timeouts
Solutions:

Verify service running: Check if Speaches or online service is available
Test network connectivity: Ensure internet access for online services
Check firewall: Confirm Talk Buddy can access STT service
Restart services: Stop and start STT service, restart Talk Buddy
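For a Docker-based local setup, the checks above can be run in order to narrow down where the connection fails. This is a rough sketch, assuming a container named speaches and a /health endpoint on the service URL; diagnose_stt is a hypothetical helper, not a Talk Buddy or Speaches command.

```shell
# Report the most likely cause of a red STT indicator for a Docker-based setup.
# Assumes a container named "speaches" and a /health endpoint on the base URL.
diagnose_stt() {
  url="${1:-http://localhost:8000}"
  # docker ps lists only running containers; -q prints just the container ID.
  if [ -z "$(docker ps -q --filter "name=speaches")" ]; then
    echo "container not running: try 'docker start speaches'"
    return 1
  fi
  if ! curl -fs --max-time 2 -o /dev/null "$url/health"; then
    echo "container running but $url/health is not answering: check 'docker logs speaches'"
    return 1
  fi
  echo "service healthy: check the STT Service URL in Talk Buddy settings"
}
```

If the last message appears while the indicator is still red, the problem is more likely in Talk Buddy's configured URL or a firewall rule than in the service itself.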

Slow Response Times

Symptoms: Long delays between speech and recognition
Solutions:

Use faster models: Switch to smaller, quicker STT models
Optimize hardware: Close other applications, upgrade hardware
Check network: Ensure stable, fast internet for online services
Local processing: Switch to local STT service for better performance

Advanced Troubleshooting

Log Analysis

Check service logs for errors:

# View Speaches logs
docker logs speaches

# Check system microphone logs (macOS)
log show --predicate 'subsystem == "com.apple.coreaudio"' --last 5m

# Windows microphone troubleshooting
# Use Windows Audio troubleshooter in Settings

Network Diagnostics

Test service connectivity:

# Test local Speaches service
curl http://localhost:8000/health

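# Test transcription with a short audio file
# (Speaches exposes an OpenAI-compatible API; set model to one you have installed)
curl http://localhost:8000/v1/audio/transcriptions \
  -F "file=@test-audio.wav" \
  -F "model=Systran/faster-whisper-small"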

Performance Monitoring

Monitor resource usage:

CPU usage: STT processing should use <50% CPU
Memory usage: Models require 1-4GB RAM typically
Network usage: Online services use bandwidth during processing
Disk I/O: Local models may cause disk activity during loading
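When Speaches runs in Docker, the CPU guideline above can be checked from the command line with docker stats. The sketch below assumes a container named speaches; both function names are hypothetical, and the default threshold of 50 simply mirrors the guideline.

```shell
# Print the current CPU percentage of the "speaches" container, e.g. 12.34.
speaches_cpu_percent() {
  docker stats --no-stream --format '{{.CPUPerc}}' speaches | tr -d '%'
}

# Succeed when CPU usage is above the threshold (default 50).
cpu_above_threshold() {
  threshold="${1:-50}"
  # awk does the decimal comparison; rc stays 1 if no reading arrives.
  speaches_cpu_percent |
    awk -v t="$threshold" 'BEGIN { rc = 1 } { rc = !($1 > t) } END { exit rc }'
}
```

Docker reports CPU usage relative to a single core, so readings above 100 are possible on multi-core machines. Take a few samples during an actual conversation rather than relying on one idle reading.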

Service Comparison

Online vs Local STT

Aspect	Online Services	Local Services
Setup	Ready immediately	Requires installation
Privacy	Data sent externally	Complete privacy
Accuracy	Often very high	Varies by model
Speed	Network dependent	Hardware dependent
Cost	May have usage limits	Free after setup
Offline	Requires internet	Works offline
Languages	Many supported	Depends on models

Recommended Configurations

For Students

Start with: Default online services
Upgrade to: Local services if practicing frequently
Best for: Learning and experimenting

For Teachers

Recommended: Local services for privacy and reliability
Classroom use: Local services avoid internet dependency
Best for: Consistent, private classroom experience

For Professionals

Recommended: Local services for confidential practice
Corporate use: Local services meet security requirements
Best for: Professional development with privacy

Quick Setup Checklist

Online STT (5 minutes)

Open Talk Buddy
Check STT status indicator (should be green)
Go to Settings and test STT service
Grant microphone permissions if prompted
Test with sample speech

Local STT (30 minutes)

Install Speaches (Docker, Python, or binary)
Start Speaches service on port 8000
Configure Talk Buddy to use http://localhost:8000
Test connection in Settings
Verify speech recognition works
Configure for automatic startup (optional)
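For the optional automatic startup step, one approach with the Docker installation is to give the existing container a restart policy. This sketch assumes the container is named speaches and that the Docker service itself starts when you log in or boot; the function name is hypothetical.

```shell
# Ask Docker to restart the existing "speaches" container automatically
# (after a reboot or crash) unless it was stopped by hand.
enable_speaches_autostart() {
  docker update --restart unless-stopped speaches
}
```

Running `docker update --restart no speaches` reverts to the default behavior.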

Troubleshooting (15 minutes)

Check microphone permissions and hardware
Verify service connectivity (green status indicator)
Test with simple, clear speech
Check network/firewall settings if needed
Review logs for error messages

With proper STT setup, you’ll have accurate voice recognition for natural conversation practice. Choose the option that best fits your privacy needs and technical comfort level! 🎤

Related Guides:

TTS Setup - Configure text-to-speech output
AI Model Integration - Set up conversation AI
Connection Issues - Fix connectivity problems