Kitta AI API documentation and playground

Text to Speech (HTTP)

Convert text to speech using the HTTP API.

Text to Speech API

Endpoint

POST /api/open/tts

Request Headers

// JSON Format
Content-Type: application/json
Authorization: Bearer YOUR_API_TOKEN  // API Key

// MessagePack Format
Content-Type: application/msgpack
Authorization: Bearer YOUR_API_TOKEN  // API Key

Request Parameters

{
  "reference_id": string,  // Required, voice model ID
  "text": string,          // Required, text to convert
  "speed": number,         // Optional, speech speed, range: 0.5 to 2.0, default: 1
  "volume": number,        // Optional, volume, range: -20 to 20, default: 0
  "version": string,       // Optional, TTS version. Available: "v1", "v2", "s1" (legacy), "v3-turbo", "v3-hd" (v3), default: "v1"
  "format": string,        // Optional, audio format. Available: "mp3", "wav", "pcm", default: "mp3"
  "emotion": string,       // Optional, emotion control (v3 only). Available: "happy", "sad", "angry", "fearful", "disgusted", "surprised", "calm", "auto", default: "auto"
  "language": string,      // Optional, language enhancement (v3 only). Available: "auto", "zh", "en", default: "auto"
  "cache": boolean         // Optional, false returns the audio stream, true returns an audio URL, default: false
}
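
The parameter ranges and allowed values above can be checked on the client before a request is sent. The following is a minimal sketch, not an official client: `build_tts_payload` and its keyword names are hypothetical, and rejecting `emotion` or `language` for an explicitly chosen legacy version is an assumption drawn from the "v3 only" notes (the server may simply ignore them).

```python
import json

# Allowed values copied from the request parameter list.
VERSIONS = {"v1", "v2", "s1", "v3-turbo", "v3-hd"}
V3_VERSIONS = {"v3-turbo", "v3-hd"}
FORMATS = {"mp3", "wav", "pcm"}
EMOTIONS = {"happy", "sad", "angry", "fearful", "disgusted", "surprised", "calm", "auto"}
LANGUAGES = {"auto", "zh", "en"}


def build_tts_payload(reference_id, text, *, speed=1.0, volume=0, version=None,
                      audio_format="mp3", emotion=None, language=None, cache=False):
    """Hypothetical helper: validate arguments and return the request body as a dict."""
    if not reference_id or not text:
        raise ValueError("reference_id and text are required")
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    if not -20 <= volume <= 20:
        raise ValueError("volume must be between -20 and 20")
    if version is not None and version not in VERSIONS:
        raise ValueError(f"unknown version: {version}")
    if audio_format not in FORMATS:
        raise ValueError(f"unknown format: {audio_format}")
    if emotion is not None and emotion not in EMOTIONS:
        raise ValueError(f"unknown emotion: {emotion}")
    if language is not None and language not in LANGUAGES:
        raise ValueError(f"unknown language: {language}")
    # Assumption: v3-only fields are rejected when a legacy version is named explicitly.
    if version is not None and version not in V3_VERSIONS and (emotion or language):
        raise ValueError("emotion and language require a v3 version")

    payload = {
        "reference_id": reference_id,
        "text": text,
        "speed": speed,
        "volume": volume,
        "format": audio_format,
        "cache": cache,
    }
    # Optional fields are omitted when unset so the server applies its defaults.
    for key, value in (("version", version), ("emotion", emotion), ("language", language)):
        if value is not None:
            payload[key] = value
    return payload


if __name__ == "__main__":
    body = build_tts_payload("your_model_id", "Text content to convert",
                             version="v3-hd", emotion="calm", language="zh")
    print(json.dumps(body, ensure_ascii=False, indent=2))
```

Leaving `version` unset keeps the field out of the body, which fits the note that the version is selected from the model configuration.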

Version Notes:

  • Legacy versions: v1, v2, s1 (basic text-to-speech functionality)
  • V3 versions: v3-turbo, v3-hd (advanced features, including emotion control and language enhancement)
  • The system automatically selects the version that matches the model configuration, so "version" does not need to be specified manually

Response Data

// Success Response (cache=false) - 200
Content-Type: audio/mpeg
<Binary audio data>

// Success Response (cache=true) - 200
Content-Type: application/json
{
  "success": boolean,        // Whether successful
  "audio_url": string,       // Audio file URL
  "format": string,          // Audio format
  "characters_used": number, // Characters used
  "quota_remaining": number  // Remaining API credits
}

// Error Response
{
  "error": string     // Error message
}
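
Because the success response is either raw audio or a JSON object depending on `cache`, a client has to branch on the status code and `Content-Type`. The sketch below illustrates one way to do that. `parse_tts_response` is a hypothetical helper, and treating `"success": false` in a 200 response as a failure is an assumption.

```python
import json


def parse_tts_response(status, content_type, body):
    """Hypothetical helper: classify a TTS response.

    Returns ("audio", bytes) for a streamed result or ("url", dict) for a
    cached result, and raises RuntimeError for anything else.
    """
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    is_json = media_type == "application/json"

    data = None
    if is_json:
        try:
            data = json.loads(body)
        except ValueError:
            data = None
        if not isinstance(data, dict):
            data = None

    if status != 200:
        message = data.get("error", "unknown error") if data else "unknown error"
        raise RuntimeError(f"TTS request failed ({status}): {message}")

    if is_json:
        # Assumption: a 200 response with success=false or no audio_url is an error.
        if data is None or not data.get("success") or "audio_url" not in data:
            raise RuntimeError("TTS response did not contain an audio URL")
        return "url", data

    # cache=false: the body is the audio stream itself.
    return "audio", body
```

With `cache=false` the returned bytes can be written straight to a file. With `cache=true` the dict also carries `characters_used` and `quota_remaining`.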

CURL Example

# JSON Format - Legacy version (using s1, recommended)
curl -X POST https://fishaudio.net/api/open/tts \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -d '{
    "reference_id": "your_model_id",
    "text": "Text content to convert",
    "speed": 1.0,
    "volume": 0,
    "version": "s1",
    "format": "mp3",
    "cache": false
  }' \
  --output output.mp3

# JSON Format - V3 model (using HD version, supports emotion control and language enhancement)
curl -X POST https://fishaudio.net/api/open/tts \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -d '{
    "reference_id": "your_model_id",
    "text": "Text content to convert",
    "speed": 1.0,
    "volume": 0,
    "version": "v3-hd",
    "emotion": "calm",
    "language": "zh",
    "format": "mp3",
    "cache": false
  }' \
  --output output.mp3
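
The same JSON request can be issued from Python with only the standard library. This is a rough equivalent of the cURL calls, offered as a sketch: `make_tts_request` and `synthesize_to_file` are hypothetical names, and the URL is the one used in the cURL examples.

```python
import json
import urllib.request

API_URL = "https://fishaudio.net/api/open/tts"  # taken from the cURL examples


def make_tts_request(api_token, payload):
    """Build (but do not send) a JSON POST request for the TTS endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",
        },
    )


def synthesize_to_file(api_token, payload, path, timeout=60):
    """Send the request and save the audio stream; expects "cache": false."""
    request = make_tts_request(api_token, payload)
    # urlopen raises urllib.error.HTTPError for non-2xx status codes.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        audio = response.read()
    with open(path, "wb") as out:
        out.write(audio)


if __name__ == "__main__":
    synthesize_to_file(
        "YOUR_API_TOKEN",
        {
            "reference_id": "your_model_id",
            "text": "Text content to convert",
            "version": "s1",
            "format": "mp3",
            "cache": False,
        },
        "output.mp3",
    )
```

Building the request separately from sending it keeps the headers and body easy to inspect without making a network call.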

Status Code Description

200 OK                 - Request successful
400 Bad Request        - Invalid request parameters
401 Unauthorized       - Invalid API Token
403 Forbidden          - Access forbidden
404 Not Found          - Resource not found
413 Payload Too Large  - Upload file too large
429 Too Many Requests  - Rate limit exceeded/Insufficient credits
500 Server Error       - Internal server error

Error Response Format:
{
  "error": string,      // Error message
  "details": string,    // Detailed error message (optional)
  "code": string        // Error code (optional)
}
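
The status codes and error format above can be folded into a single readable message plus a retry hint. The sketch below is one possible approach: `describe_error` is a hypothetical helper, and the choice to treat 429 and 500 as retryable is an assumption. Since 429 also covers insufficient credits, a retry will not always help.

```python
import json

# Assumption: only these statuses are worth retrying after a delay.
RETRYABLE_STATUSES = {429, 500}


def describe_error(status, body):
    """Hypothetical helper: return (message, retryable) for an error response."""
    try:
        data = json.loads(body)
    except ValueError:
        data = {}
    if not isinstance(data, dict):
        data = {}

    # "error" is always expected; "details" and "code" are optional.
    parts = [str(data.get("error", f"HTTP {status}"))]
    if data.get("details"):
        parts.append(str(data["details"]))
    if data.get("code"):
        parts.append(f"code={data['code']}")
    return " | ".join(parts), status in RETRYABLE_STATUSES
```

When the body is not valid JSON, the helper falls back to a message built from the status code alone.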