[AI] Trying Out the ChatTTS Text-to-Speech Model


Arthur古德曼 2024-05-31 15:42:41 Blog category: AI


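© Copyright belongs to the author. This is an original work by 51CTO blog author Arthur古德曼. Please contact the author for permission to repost; otherwise legal liability will be pursued.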

How to Try ChatTTS with Python

2024/05/31

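What is it? Key features

ChatTTS is a text-to-speech model designed specifically for conversational scenarios. Its features:

Conversational TTS: ChatTTS is optimized for dialogue-based tasks. It produces natural, fluent speech and supports multiple speakers.

Fine-grained control: the model can predict and control fine-grained prosodic features, including laughter, pauses, and interjections.

Better prosody: ChatTTS surpasses most open-source TTS models in prosody. Pretrained models are also provided to support further research.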

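Trying it out with Python:

1. Install Python (required; straightforward, so skipped here)
2. Create a folder for the experiment (recommended; straightforward, so skipped here)
   For example, create D:\my_chatttts
3. Enter the folder, create a venv, and activate it (recommended; straightforward, so skipped here)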

PS D:\my_chatttts> python -m venv venv
PS D:\my_chatttts>
PS D:\my_chatttts> .\venv\Scripts\activate
(venv) PS D:\my_chatttts>
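
Besides the `(venv)` prefix in the prompt, the activation can be confirmed from inside Python. The following is a minimal sketch using only the standard library; the helper name `in_virtualenv` is made up for illustration.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the venv directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("virtual environment active:", in_virtualenv())
```

If this reports `False`, the packages installed in the next step would most likely land in the global Python installation rather than in the experiment folder.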

4. Install chattts-fork with pip (required)

(venv) PS D:\my_chatttts> pip install chattts-fork
Looking in indexes: http://mirrors.aliyun.com/pypi/simple
Collecting chattts-fork
  Downloading http://mirrors.aliyun.com/pypi/packages/0d/39/d05e7d034b9aa2039cb89daf8fb8b4f15e6e1484542d8f5f6dd51b7dfa55/chattts_fork-0.0.3-py3-none-any.whl (23 kB)
Collecting omegaconf~=2.3.0
  Downloading http://mirrors.aliyun.com/pypi/packages/e3/94/1843518e420fa3ed6919835845df698c7e27e183cb997394e4a670973a65/omegaconf-2.3.0-py3-none-any.whl (79 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 79.5/79.5 kB 887.9 kB/s eta 0:00:00
Collecting torch~=2.0
  Downloading http://mirrors.aliyun.com/pypi/packages/2a/b7/a3cf5fd40334b9785cc83ee0c96b50603026eb3aa70210a33729018e7029/torch-2.3.0-cp311-cp311-win_amd64.whl (159.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 159.8/159.8 MB 831.8 kB/s eta 0:00:00
Collecting tqdm
  Downloading http://mirrors.aliyun.com/pypi/packages/18/eb/fdb7eb9e48b7b02554e1664afd3bd3f117f6b6d6c5881438a0b055554f9b/tqdm-4.66.4-py3-none-any.whl (78 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 78.3/78.3 kB 1.1 MB/s eta 0:00:00
Collecting einops
  Downloading http://mirrors.aliyun.com/pypi/packages/44/5a/f0b9ad6c0a9017e62d4735daaeb11ba3b6c009d69a26141b258cd37b5588/einops-0.8.0-py3-none-any.whl (43 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 43.2/43.2 kB 701.8 kB/s eta 0:00:00
Collecting vector-quantize-pytorch
  Downloading http://mirrors.aliyun.com/pypi/packages/e9/af/29bc7483238a001d31d99290ab5e6becf3488eb77e1f8323f0c51ecfbe7c/vector_quantize_pytorch-1.14.24-py3-none-any.whl (36 kB)
Collecting transformers~=4.41.1
  Downloading http://mirrors.aliyun.com/pypi/packages/79/e1/dcba5ba74392015ceeababf3455138f5875202e66e3316d7ca223bdb7b1c/transformers-4.41.1-py3-none-any.whl (9.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 9.1/9.1 MB 1.1 MB/s eta 0:00:00
Collecting vocos
  Downloading http://mirrors.aliyun.com/pypi/packages/0a/45/82fe9b5696eb5dd4f84632f75b549b48bed0c33a5920b6309fbafd7e3477/vocos-0.1.0-py3-none-any.whl (24 kB)
Collecting antlr4-python3-runtime==4.9.*
  Downloading http://mirrors.aliyun.com/pypi/packages/3e/38/7859ff46355f76f8d19459005ca000b6e7012f2f1ca597746cbcd1fbfe5e/antlr4-python3-runtime-4.9.3.tar.gz (117 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 117.0/117.0 kB 854.4 kB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Collecting PyYAML>=5.1.0
  Downloading http://mirrors.aliyun.com/pypi/packages/b3/34/65bb4b2d7908044963ebf614fe0fdb080773fc7030d7e39c8d3eddcd4257/PyYAML-6.0.1-cp311-cp311-win_amd64.whl (144 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 144.7/144.7 kB 1.2 MB/s eta 0:00:00
Collecting filelock
  Downloading http://mirrors.aliyun.com/pypi/packages/41/24/0b023b6537dfc9bae2c779353998e3e99ac7dfff4222fc6126650e93c3f3/filelock-3.14.0-py3-none-any.whl (12 kB)
Collecting typing-extensions>=4.8.0
  Downloading http://mirrors.aliyun.com/pypi/packages/e1/4d/d612de852a0bc64a64418e1cef25fe1914c5b1611e34cc271ed7e36174c8/typing_extensions-4.12.0-py3-none-any.whl (37 kB)
Collecting sympy
  Downloading http://mirrors.aliyun.com/pypi/packages/61/53/e18c8c97d0b2724d85c9830477e3ebea3acf1dcdc6deb344d5d9c93a9946/sympy-1.12.1-py3-none-any.whl (5.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.7/5.7 MB 1.1 MB/s eta 0:00:00
Collecting networkx
  Downloading http://mirrors.aliyun.com/pypi/packages/38/e9/5f72929373e1a0e8d142a130f3f97e6ff920070f87f91c4e13e40e0fba5a/networkx-3.3-py3-none-any.whl (1.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.7/1.7 MB 1.2 MB/s eta 0:00:00
Collecting jinja2
  Downloading http://mirrors.aliyun.com/pypi/packages/31/80/3a54838c3fb461f6fec263ebf3a3a41771bd05190238de3486aae8540c36/jinja2-3.1.4-py3-none-any.whl (133 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 133.3/133.3 kB 984.2 kB/s eta 0:00:00
Collecting fsspec
  Downloading http://mirrors.aliyun.com/pypi/packages/ba/a3/16e9fe32187e9c8bc7f9b7bcd9728529faa725231a0c96f2f98714ff2fc5/fsspec-2024.5.0-py3-none-any.whl (316 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 316.1/316.1 kB 1.2 MB/s eta 0:00:00
Collecting mkl<=2021.4.0,>=2021.1.1
  Downloading http://mirrors.aliyun.com/pypi/packages/fe/1c/5f6dbf18e8b73e0a5472466f0ea8d48ce9efae39bd2ff38cebf8dce61259/mkl-2021.4.0-py2.py3-none-win_amd64.whl (228.5 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 228.5/228.5 MB 642.1 kB/s eta 0:00:00
Collecting huggingface-hub<1.0,>=0.23.0
  Downloading http://mirrors.aliyun.com/pypi/packages/78/71/6ce4136149cb42b98599d49c39b3a39dd6858b5f9307490998c40e26a51e/huggingface_hub-0.23.2-py3-none-any.whl (401 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 401.7/401.7 kB 1.1 MB/s eta 0:00:00
Collecting numpy>=1.17
  Downloading http://mirrors.aliyun.com/pypi/packages/3f/6b/5610004206cf7f8e7ad91c5a85a8c71b2f2f8051a0c0c4d5916b76d6cbb2/numpy-1.26.4-cp311-cp311-win_amd64.whl (15.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 15.8/15.8 MB 1.0 MB/s eta 0:00:00
Collecting packaging>=20.0
  Downloading http://mirrors.aliyun.com/pypi/packages/49/df/1fceb2f8900f8639e278b056416d49134fb8d84c5942ffaa01ad34782422/packaging-24.0-py3-none-any.whl (53 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 53.5/53.5 kB 920.2 kB/s eta 0:00:00
Collecting regex!=2019.12.17
  Downloading http://mirrors.aliyun.com/pypi/packages/ef/9b/0aa55fc101c803869c13b389b718b15810592d2df35b1af15ff5b6f48e16/regex-2024.5.15-cp311-cp311-win_amd64.whl (268 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 269.0/269.0 kB 1.1 MB/s eta 0:00:00
Collecting requests
  Downloading http://mirrors.aliyun.com/pypi/packages/f9/9b/335f9764261e915ed497fcdeb11df5dfd6f7bf257d4a6a2a686d80da4d54/requests-2.32.3-py3-none-any.whl (64 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 64.9/64.9 kB 868.1 kB/s eta 0:00:00
Collecting tokenizers<0.20,>=0.19
  Downloading http://mirrors.aliyun.com/pypi/packages/65/8e/6d7d72b28f22c422cff8beae10ac3c2e4376b9be721ef8167b7eecd1da62/tokenizers-0.19.1-cp311-none-win_amd64.whl (2.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.2/2.2 MB 1.1 MB/s eta 0:00:00
Collecting safetensors>=0.4.1
  Downloading http://mirrors.aliyun.com/pypi/packages/cb/f6/19f268662be898ff2a23ac06f8dd0d2956b2ecd204c96e1ee07ba292c119/safetensors-0.4.3-cp311-none-win_amd64.whl (287 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 287.3/287.3 kB 986.0 kB/s eta 0:00:00
Collecting colorama
  Downloading http://mirrors.aliyun.com/pypi/packages/d1/d6/3965ed04c63042e047cb6a3e6ed1a63a35087b6a609aa3a15ed8ac56c221/colorama-0.4.6-py2.py3-none-any.whl (25 kB)
Collecting einx>=0.2.2
  Downloading http://mirrors.aliyun.com/pypi/packages/08/b7/69d8d5a187fa8d86dec7357d63fbd36eaf9cf3f5e62adc169148d569384b/einx-0.2.2-py3-none-any.whl (101 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 101.6/101.6 kB 971.6 kB/s eta 0:00:00
Collecting torchaudio
  Downloading http://mirrors.aliyun.com/pypi/packages/5d/35/8100a33b616292662de330b2cca2c121d798aece4dad59571156b8cffd33/torchaudio-2.3.0-cp311-cp311-win_amd64.whl (2.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.4/2.4 MB 1.0 MB/s eta 0:00:00
Collecting scipy
  Downloading http://mirrors.aliyun.com/pypi/packages/4a/48/4513a1a5623a23e95f94abd675ed91cfb19989c58e9f6f7d03990f6caf3d/scipy-1.13.1-cp311-cp311-win_amd64.whl (46.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 46.2/46.2 MB 1.0 MB/s eta 0:00:00
Collecting encodec==0.1.1
  Downloading http://mirrors.aliyun.com/pypi/packages/62/59/e47bbd0542d0e6f4ce9983d5eb458a01d4b42c81e5c410cb9e159b1061ae/encodec-0.1.1.tar.gz (3.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.7/3.7 MB 1.1 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Collecting frozendict
  Downloading http://mirrors.aliyun.com/pypi/packages/6a/71/3656c00606e75e81f11721e6a1c973c3e03da8c7d8b665d20f78245384c6/frozendict-2.4.4-py311-none-any.whl (16 kB)
Collecting intel-openmp==2021.*
  Downloading http://mirrors.aliyun.com/pypi/packages/6f/21/b590c0cc3888b24f2ac9898c41d852d7454a1695fbad34bee85dba6dc408/intel_openmp-2021.4.0-py2.py3-none-win_amd64.whl (3.5 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.5/3.5 MB 1.1 MB/s eta 0:00:00
Collecting tbb==2021.*
  Downloading http://mirrors.aliyun.com/pypi/packages/7b/2d/1e1c70fae8ace27e6200fb71c2372a9aeac2baba474b1609d7d466e969b4/tbb-2021.12.0-py3-none-win_amd64.whl (286 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 286.4/286.4 kB 1.1 MB/s eta 0:00:00
Collecting MarkupSafe>=2.0
  Downloading http://mirrors.aliyun.com/pypi/packages/b7/a2/c78a06a9ec6d04b3445a949615c4c7ed86a0b2eb68e44e7541b9d57067cc/MarkupSafe-2.1.5-cp311-cp311-win_amd64.whl (17 kB)
Collecting charset-normalizer<4,>=2
  Downloading http://mirrors.aliyun.com/pypi/packages/57/ec/80c8d48ac8b1741d5b963797b7c0c869335619e13d4744ca2f67fc11c6fc/charset_normalizer-3.3.2-cp311-cp311-win_amd64.whl (99 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 99.9/99.9 kB 1.1 MB/s eta 0:00:00
Collecting idna<4,>=2.5
  Downloading http://mirrors.aliyun.com/pypi/packages/e5/3e/741d8c82801c347547f8a2a06aa57dbb1992be9e948df2ea0eda2c8b79e8/idna-3.7-py3-none-any.whl (66 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 66.8/66.8 kB 1.2 MB/s eta 0:00:00
Collecting urllib3<3,>=1.21.1
  Downloading http://mirrors.aliyun.com/pypi/packages/a2/73/a68704750a7679d0b6d3ad7aa8d4da8e14e151ae82e6fee774e6e0d05ec8/urllib3-2.2.1-py3-none-any.whl (121 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 121.1/121.1 kB 886.6 kB/s eta 0:00:00
Collecting certifi>=2017.4.17
  Downloading http://mirrors.aliyun.com/pypi/packages/ba/06/a07f096c664aeb9f01624f858c3add0a4e913d6c96257acb4fce61e7de14/certifi-2024.2.2-py3-none-any.whl (163 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 163.8/163.8 kB 1.1 MB/s eta 0:00:00
Collecting mpmath<1.4.0,>=1.1.0
  Downloading http://mirrors.aliyun.com/pypi/packages/43/e3/7d92a15f894aa0c9c4b49b8ee9ac9850d6e63b03c9c32c0367a13ae62209/mpmath-1.3.0-py3-none-any.whl (536 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 536.2/536.2 kB 1.0 MB/s eta 0:00:00
Installing collected packages: tbb, mpmath, intel-openmp, antlr4-python3-runtime, urllib3, typing-extensions, sympy, safetensors, regex, PyYAML, packaging, numpy, networkx, mkl, MarkupSafe, idna, fsspec, frozendict, filelock, einops, colorama, charset-normalizer, certifi, tqdm, scipy, requests, omegaconf, jinja2, einx, torch, huggingface-hub, vector-quantize-pytorch, torchaudio, tokenizers, transformers, encodec, vocos, chattts-fork
  DEPRECATION: antlr4-python3-runtime is being installed using the legacy 'setup.py install' method, because it does not have a 'pyproject.toml' and the 'wheel' package is not installed. pip 23.1 will enforce this behaviour change. A possible replacement is to enable the '--use-pep517' option. Discussion can be found at https://github.com/pypa/pip/issues/8559
  Running setup.py install for antlr4-python3-runtime ... done
  DEPRECATION: encodec is being installed using the legacy 'setup.py install' method, because it does not have a 'pyproject.toml' and the 'wheel' package is not installed. pip 23.1 will enforce this behaviour change. A possible replacement is to enable the '--use-pep517' option. Discussion can be found at https://github.com/pypa/pip/issues/8559
  Running setup.py install for encodec ... done
Successfully installed MarkupSafe-2.1.5 PyYAML-6.0.1 antlr4-python3-runtime-4.9.3 certifi-2024.2.2 charset-normalizer-3.3.2 chattts-fork-0.0.3 colorama-0.4.6 einops-0.8.0 einx-0.2.2 encodec-0.1.1 filelock-3.14.0 frozendict-2.4.4 fsspec-2024.5.0 huggingface-hub-0.23.2 idna-3.7 intel-openmp-2021.4.0 jinja2-3.1.4 mkl-2021.4.0 mpmath-1.3.0 networkx-3.3 numpy-1.26.4 omegaconf-2.3.0 packaging-24.0 regex-2024.5.15 requests-2.32.3 safetensors-0.4.3 scipy-1.13.1 sympy-1.12.1 tbb-2021.12.0 tokenizers-0.19.1 torch-2.3.0 torchaudio-2.3.0 tqdm-4.66.4 transformers-4.41.1 typing-extensions-4.12.0 urllib3-2.2.1 vector-quantize-pytorch-1.14.24 vocos-0.1.0

[notice] A new release of pip available: 22.3 -> 24.0
[notice] To update, run: python.exe -m pip install --upgrade pip
(venv) PS D:\my_chatttts>
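
To double-check what ended up in the environment, the installed versions can be queried from Python. This is a sketch using the standard-library `importlib.metadata`; the helper name `installed_version` is hypothetical, and the package names are simply the ones that appear in the pip log above.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Package names are taken from the pip log above.
    for name in ("chattts-fork", "torch", "torchaudio", "transformers"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```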

5. Usage

(venv) PS D:\my_chatttts> chattts '你好，你是谁啊？'
INFO:ChatTTS.core:Download from HF: https://huggingface.co/2Noise/ChatTTS
config/decoder.yaml: 100%|████████████████████████████████████████████████████████████████████| 117/117 [00:00<?, ?B/s]
config/gpt.yaml: 100%|████████████████████████████████████████████████████████████████████████| 346/346 [00:00<?, ?B/s]
spk_stat.pt: 100%|████████████████████████████████████████████████████████████████| 4.26k/4.26k [00:00<00:00, 2.14MB/s]
config/path.yaml: 100%|███████████████████████████████████████████████████████████████████████| 309/309 [00:00<?, ?B/s]
config/dvae.yaml: 100%|███████████████████████████████████████████████████████████████████████| 143/143 [00:00<?, ?B/s]
config/vocos.yaml: 100%|██████████████████████████████████████████████████████████████████████| 460/460 [00:00<?, ?B/s]
tokenizer.pt: 100%|██████████████████████████████████████████████████████████████████| 337k/337k [00:02<00:00, 163kB/s]
DVAE.pt: 100%|█████████████████████████████████████████████████████████████████████| 27.7M/27.7M [00:35<00:00, 781kB/s]
Vocos.pt: 100%|████████████████████████████████████████████████████████████████████| 54.4M/54.4M [01:36<00:00, 562kB/s]
Decoder.pt: 100%|███████████████████████████████████████████████████████████████████| 104M/104M [01:43<00:00, 1.00MB/s]
Fetching 11 files:  18%|███████████▍                                                   | 2/11 [10:24<46:48, 312.05s/it]
The model maybe broke will load again███████████████████████████████████████████████| 104M/104M [01:43<00:00, 1.23MB/s]
INFO:ChatTTS.core:Download from HF: https://huggingface.co/2Noise/ChatTTS██████████| 54.4M/54.4M [01:36<00:00, 779kB/s]
GPT.pt: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 901M/901M [00:53<00:00, 3.12MB/s]
Fetching 11 files: 100%|███████████████████████████████████████████████████████████████████████████████████████████| 11/11 [00:55<00:00,  5.06s/it]
WARNING:ChatTTS.utils.gpu_utils:No GPU found, use CPU instead
INFO:ChatTTS.core:use cpu
INFO:ChatTTS.core:vocos loaded.
INFO:ChatTTS.core:dvae loaded.
INFO:ChatTTS.core:gpt loaded.
INFO:ChatTTS.core:decoder loaded.
INFO:ChatTTS.core:tokenizer loaded.
INFO:ChatTTS.core:All initialized.
GPT.pt:  81%|█████████████████████████████████████████████████████████▊             | 734M/901M [11:33<02:37, 1.06MB/s]
INFO:ChatTTS.core:All initialized.
  3%|███▍                                                                                                         | 12/384 [00:03<01:43,  3.58it/s]
  4%|████▍                                                                                                       | 83/2048 [00:08<03:09, 10.37it/s]
Generate Done for file tts.wav
(venv) PS D:\my_chatttts>

6. The audio file

The generated file is in the root of the experiment folder created earlier and is named tts.wav
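
A quick way to inspect the result without an audio player is to read the file header. The sketch below uses the standard-library `wave` module and a made-up helper name, `wav_info`. It assumes the file is PCM-encoded; `wave` does not read every WAV encoding, and the encoding written by the `chattts` command has not been verified here, so the script reports an error instead of crashing if the file cannot be read.

```python
import wave


def wav_info(path: str) -> dict:
    """Read basic properties of a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_width_bytes": wf.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate,  # number of frames / frames per second
        }


if __name__ == "__main__":
    try:
        print(wav_info("tts.wav"))
    except (FileNotFoundError, wave.Error) as exc:
        print("could not read tts.wav:", exc)
```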

Examples

Basic usage example:

import ChatTTS
from IPython.display import Audio

chat = ChatTTS.Chat()
chat.load_models()

# Define the text to be spoken
texts = ["<PUT YOUR TEXT HERE>",]

wavs = chat.infer(texts, use_decoder=True)
Audio(wavs[0], rate=24_000, autoplay=True)
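
`Audio(...)` only plays the sound inside a notebook. Outside a notebook, the waveform could be written to disk instead. The following is a sketch, not part of ChatTTS: `save_wav` is a hypothetical helper built on the standard library, and it assumes the samples are floats in the range [-1.0, 1.0] at the 24 kHz rate used above.

```python
import struct
import wave


def save_wav(samples, path, rate=24_000):
    """Write a flat sequence of floats in [-1.0, 1.0] as 16-bit mono PCM."""
    # Clip each sample, then scale it to the signed 16-bit range.
    pcm = [int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples]
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wf.setframerate(rate)
        wf.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
```

Assuming `wavs[0]` is a NumPy array, as the `Audio(wavs[0], ...)` call suggests, a call might look like `save_wav(wavs[0].reshape(-1).tolist(), "output.wav")`.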

Advanced usage example:

###################################
# Sample a speaker from Gaussian.
# (`chat` is the loaded ChatTTS.Chat() instance from the basic example.)
import torch
std, mean = torch.load('ChatTTS/asset/spk_stat.pt').chunk(2)
rand_spk = torch.randn(768) * std + mean

params_infer_code = {
  'spk_emb': rand_spk, # add sampled speaker 
  'temperature': .3, # using custom temperature
  'top_P': 0.7, # top P decode
  'top_K': 20, # top K decode
}

###################################
# For sentence level manual control.

# use oral_(0-9), laugh_(0-2), break_(0-7) 
# to generate special token in text to synthesize.
params_refine_text = {
  'prompt': '[oral_2][laugh_0][break_6]'
} 

wav = chat.infer("<PUT YOUR TEXT HERE>", params_refine_text=params_refine_text, params_infer_code=params_infer_code)

###################################
# For word level manual control.
text = 'What is [uv_break]your favorite english food?[laugh][lbreak]'
wav = chat.infer(text, skip_refine_text=True, params_infer_code=params_infer_code)
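
The sentence-level prompt above is a plain string of bracketed tokens, so it can be assembled and range-checked before it is passed to `chat.infer`. The helper below is a hypothetical sketch, not a ChatTTS function. The allowed ranges are copied from the comment in the example (`oral_(0-9)`, `laugh_(0-2)`, `break_(0-7)`); whether the model accepts a prompt with some tokens left out is an assumption that has not been checked here.

```python
def build_refine_prompt(oral=None, laugh=None, pause=None):
    """Compose a prompt string such as '[oral_2][laugh_0][break_6]'."""
    # Allowed ranges follow the comment in the example above:
    # oral_(0-9), laugh_(0-2), break_(0-7).
    limits = {"oral": (oral, 9), "laugh": (laugh, 2), "break": (pause, 7)}
    parts = []
    for name, (level, maximum) in limits.items():
        if level is None:
            continue  # leave this control out of the prompt
        if not 0 <= level <= maximum:
            raise ValueError(f"{name} must be in 0..{maximum}, got {level}")
        parts.append(f"[{name}_{level}]")
    return "".join(parts)


# Reproduces the prompt used in the advanced example.
params_refine_text = {"prompt": build_refine_prompt(oral=2, laugh=0, pause=6)}
```

Validating the levels up front turns a typo such as `oral=10` into an immediate `ValueError` rather than a token the model was never shown.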