Is the S2 model open-source?

Yes, the full S2-Pro weights, fine-tuning code, and streaming inference engine were released open-source in March 2026.

Do unused credits roll over?

No. Credits reset at the start of each billing month, and unused credits do not carry forward.

Is there a team or enterprise option?

Pro plans include built-in team sharing for up to 3 members. Larger enterprise teams need to contact sales for custom concurrency and support.

Fish Audio

Freemium · United States · Loss-Making

Ultra-realistic AI TTS and instant voice cloning

TTS, Voice Clone, STT, Audio Gen


Pricing

Free: $0

Plus: $20/month, or $5.50/month billed annually ($66/year)

Pro: $150/month, or $37.50/month billed annually ($450/year)
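The annual figures above can be checked with a little arithmetic. The following is a minimal sketch, assuming the listed prices are current and that "billed annually" means a single yearly charge. The `PLANS` table and the `annual_savings` helper are illustrative names, not part of any Fish Audio tooling.

```python
# Prices copied from the listing above: (monthly price, annual price) in USD.
# PLANS and annual_savings are hypothetical names used only for this sketch.
PLANS = {
    "Plus": (20.00, 66.00),
    "Pro": (150.00, 450.00),
}


def annual_savings(monthly_price: float, annual_price: float) -> tuple[float, float]:
    """Return (effective monthly price, fractional discount) for annual billing."""
    full_year_at_monthly_rate = monthly_price * 12
    effective_monthly = annual_price / 12
    discount = 1 - annual_price / full_year_at_monthly_rate
    return effective_monthly, discount


for name, (monthly, annual) in PLANS.items():
    effective, discount = annual_savings(monthly, annual)
    print(f"{name}: ${effective:.2f}/month effective, {discount:.1%} below monthly billing")
```

On the listed prices, annual billing works out to 72.5% below month-to-month for Plus and 75% below for Pro.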


Technical Specs

Inputs: Text Prompt, Audio File

Outputs: Generated Audio, Cloned Voice, Transcription

AI Type: Text-to-Speech

Model Architecture: Transformer

Daily Prompts: N/A

Context Length: N/A

Output Quality

Accuracy: 92%

Content: 95%

Reasoning: 88%

Company Profile

Company: Fish Audio

Founded: 2025

Headquarters: Mountain View, CA, USA

Employees: N/A

Total Raised / Total Funding: N/A

Revenue: N/A

Valuation: N/A

ARR: N/A

CEO: Rissa Cao

Overview

Estimated Paid Users: N/A (current estimate)

Total Earnings Till Date: N/A (+9.52% from last month)

Market Share: N/A (current share)

Average Session: N/A (per active user)

Hallucination Rate: no value listed (model quality signal)

Growth Rate: +12.07% (monthly active users)

Burn Rate: N/A (total expenses / years active)

Paid User Gain: +44.44% (monthly paid user trend)


Platforms

Web App: Available

Profit Analysis

Total Loss: -$13M

Total Profit: $26M

Performance Metrics

Accuracy: 92%

Context: 95%

Reasoning: 88%

Safety: 85%

Benchmarks

No benchmark scores available.

Fish Audio Models

S2-Pro

Type: Text-to-Speech

Description: Next-generation expressive TTS with fine-grained, word-level control via natural-language tags. Uses a Dual-AR architecture on a Qwen3 backbone and is released as open source.

Architecture: Transformer

S1

Type: Text-to-Speech

Description: High-quality 4B-parameter emotional TTS with parenthesis syntax for control

Architecture: Transformer

S1-mini

Type: Text-to-Speech

Description: Distilled 0.5B open-source model delivering core emotional and tone capabilities

Architecture: Transformer

Funding Rounds & Investors

Total Funding

N/A

Rounds

No funding rounds available.

Founders/Team

Rissa Cao

Co-Founder & CEO

Shijia Liao

Co-Founder & Chief Scientist

Direct competitors

No direct competitors available.

Change Log / Major Updates

2025 · Nov 15

Platform officially rebranded to Fish Audio with enhanced web app and Story Studio features.

2026 · Mar 9

S2 released: next-generation TTS with inline natural-language emotion tags, multi-speaker support, 80+ languages, and a full open-source release built on a Qwen3 backbone.

2026 · Mar 18

Built-in team sharing for Pro users and expanded credit pools for collaborative workflows.

Compliance, Integrations & Support

Industry: Not specified

Compliances: Not specified

Integrations: Python SDK, Node.js SDK, REST API, Hugging Face, GitHub

Support: Email, help center

Target audience: Content Creators, Video Editors, Podcasters, Game Developers, Educators, Marketers, Enterprises, Audiobook Narrators, YouTubers, App Builders

Supported languages: English, Chinese, Japanese, Korean, French, German, Arabic, Spanish

Fish Audio Acquisitions

No acquisition records available.

Reviews & Rating

0 reviews



More About Fish Audio

Fish Audio launched in early 2025 as the commercial evolution of open-source speech projects, quickly rising to challenge established players by offering studio-quality text-to-speech and voice cloning at a fraction of the cost.

Powered by its in-house S1 and S2 models, the platform stands out for word-level emotional control using plain-language tags, sub-150 ms latency, and native support for 30+ languages, capabilities that have driven adoption among creators and enterprises alike.

Instant cloning from just 10 seconds of audio
Inline emotion and prosody directives without fixed token sets
Open-source model weights and inference engine for self-hosting
Pay-as-you-go API alongside generous credit-based web plans

Fish Audio FAQs

What is the difference between the Free and paid plans?

The Free tier offers 7 minutes of high-quality generation per month for personal use only. Paid plans (Plus and Pro) unlock commercial rights, far higher monthly minutes, priority processing, private voices, and full API access.

How much audio does one credit generate?

Roughly 600-625 credits equal one minute of S1 or S2 generation. Credits reset monthly and do not roll over.
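The credit-to-minutes rate above lends itself to a quick estimate. This is a minimal sketch, assuming the "roughly 600-625 credits per minute" figure holds as a range for both S1 and S2. The function names and the 15,000-credit balance are hypothetical and chosen only for illustration.

```python
# Assumption: the FAQ's "roughly 600-625 credits per minute" is treated as a
# range, so every estimate is a (low, high) pair rather than an exact figure.
CREDITS_PER_MINUTE_LOW = 600
CREDITS_PER_MINUTE_HIGH = 625


def estimate_minutes(credits: float) -> tuple[float, float]:
    """Return (pessimistic, optimistic) minutes of audio for a credit balance."""
    if credits < 0:
        raise ValueError("credits must be non-negative")
    return credits / CREDITS_PER_MINUTE_HIGH, credits / CREDITS_PER_MINUTE_LOW


def estimate_credits(minutes: float) -> tuple[float, float]:
    """Return (low, high) credits needed to generate the given minutes of audio."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes * CREDITS_PER_MINUTE_LOW, minutes * CREDITS_PER_MINUTE_HIGH


# Hypothetical balance, used only to illustrate the conversion.
low, high = estimate_minutes(15_000)
print(f"15,000 credits is roughly {low:.1f} to {high:.1f} minutes of audio")
```

Because credits reset monthly and do not roll over, an estimate like this is mainly useful for checking whether a planned batch of narration fits within a single billing month.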

Can I use Fish Audio for commercial projects?

Yes, but only on Plus or Pro plans. The Free tier is restricted to personal, non-commercial use.

Does Fish Audio support voice cloning?

Yes. Instant cloning works from as little as 10 seconds of audio, with enhanced fidelity on paid plans.

What languages are supported?

30+ languages including English, Chinese, Japanese, Korean, French, German, Arabic, and Spanish, with S2 expanding coverage further.