Link to the files used in this article

Link: https://pan.baidu.com/s/1RWNVHuXMQleOrEi5vig_bQ
Extraction code: p57s

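Speech Recognition

Speech recognition takes an audio signal (for example a WAV waveform) and identifies what was said in it.

Using the Fourier transform, a sound in the time domain can be decomposed into a superposition of sine functions of different frequencies. The characteristic distribution of the spectral lines lets us establish a correspondence between audio content and text, which serves as the basis for model training.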
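To make the decomposition concrete, here is a minimal sketch using NumPy's FFT on a synthetic signal. The sample rate and the two tone frequencies (440 Hz and 1000 Hz) are assumptions chosen for illustration and do not come from the article's data.

```python
import numpy as np

sample_rate = 8000                          # samples per second (assumed)
t = np.arange(sample_rate) / sample_rate    # time stamps for one second

# Synthetic signal: a 440 Hz tone plus a weaker 1000 Hz tone
signal = 1.0 * np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)

# Real-input FFT: complex spectrum for the non-negative frequencies
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(signal.size, d=1 / sample_rate)

# Scale magnitudes so a pure sine of amplitude A shows up as roughly A
amplitudes = np.abs(spectrum) * 2 / signal.size

# Frequencies of the two strongest components, in ascending order
top2 = np.sort(freqs[np.argsort(amplitudes)[-2:]])
print(top2)
```

The two strongest spectral lines sit at the frequencies of the two tones that were mixed in, which is the kind of frequency-domain pattern the features below are built on.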

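Mel-frequency cepstral coefficients (MFCC) summarise, typically in 13 coefficients per frame, how the energy of a sound is distributed across frequency bands that are closely related to its content. We can therefore use the MFCC matrix as the feature for speech recognition. Pattern recognition is then based on hidden Markov models (HMM): find the sound model that best matches a test sample, and thereby identify the spoken content.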
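The "mel" in MFCC refers to a perceptual frequency scale. As a hedged illustration, the sketch below uses one commonly cited conversion formula between hertz and mel; the helper names `hz_to_mel` and `mel_to_hz` and the 0 to 8000 Hz range are assumptions made for this example, and a given library may use a slightly different variant.

```python
import math

def hz_to_mel(hz):
    """Convert a frequency in Hz to mel (a commonly used formula)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10 ** (mel / 2595.0) - 1.0)

# 13 points evenly spaced on the mel scale between 0 Hz and 8000 Hz (assumed range)
low, high = hz_to_mel(0.0), hz_to_mel(8000.0)
mel_points = [low + i * (high - low) / 12 for i in range(13)]
hz_points = [mel_to_hz(m) for m in mel_points]

for hz in hz_points:
    print(round(hz, 1))
```

Points that are evenly spaced in mel are packed closely at low frequencies and spread out at high frequencies, which matches how human hearing resolves pitch.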

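  1. Prepare several sound samples as training data, and label each audio file with its class.
  2. Read each audio file and compute its MFCC matrix.
  3. Train using the MFCC matrices as training samples.
  4. Evaluate the test samples (based on hidden Markov models).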
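The four steps above follow a "one model per class, pick the best score" pattern. The sketch below shows that pattern on synthetic data so it runs without audio files. It deliberately swaps the hidden Markov model for a much simpler diagonal Gaussian, and `fake_features` and `DiagGaussian` are hypothetical stand-ins, not part of any library.

```python
import numpy as np

rng = np.random.default_rng(0)

def fake_features(center, n_frames=200, n_coeffs=13):
    """Stand-in for an MFCC matrix: frames x coefficients around a class mean."""
    return rng.normal(loc=center, scale=1.0, size=(n_frames, n_coeffs))

class DiagGaussian:
    """Simplified stand-in for an HMM: one Gaussian with diagonal covariance."""

    def fit(self, x):
        self.mean = x.mean(axis=0)
        self.var = x.var(axis=0) + 1e-6   # small floor avoids division by zero
        return self

    def score(self, x):
        # Total log-likelihood of all frames under this model
        ll = -0.5 * (np.log(2 * np.pi * self.var) + (x - self.mean) ** 2 / self.var)
        return ll.sum()

# Steps 1-3: labelled training features, one model per class
train = {'apple': fake_features(0.0), 'banana': fake_features(3.0)}
models = {label: DiagGaussian().fit(x) for label, x in train.items()}

# Step 4: score a test sample with every model and keep the best label
test_x = fake_features(3.0, n_frames=50)
scores = {label: m.score(test_x) for label, m in models.items()}
predicted = max(scores, key=scores.get)
print(predicted)
```

The real example later in the article has the same shape; only the feature extraction (MFCC from WAV files) and the model (a Gaussian HMM) differ.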

MFCC-related API:

import scipy.io.wavfile as wf
import python_speech_features as sf

sample_rate, sigs = wf.read('../xx.wav')
mfcc = sf.mfcc(sigs, sample_rate)
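The matrix returned by `sf.mfcc` has one row per analysis frame and one column per coefficient. The sketch below estimates that shape from the signal length. It assumes the library's default settings as I understand them (25 ms window, 10 ms step, 13 coefficients) and its usual framing rule; `expected_mfcc_shape` is a hypothetical helper, so treat the result as an estimate and confirm it with `print(mfcc.shape)`.

```python
import math

def expected_mfcc_shape(n_samples, sample_rate,
                        winlen=0.025, winstep=0.01, numcep=13):
    """Estimate (frames, coefficients) for a signal of n_samples samples.

    Assumes a 25 ms window, a 10 ms step and 13 coefficients by default.
    """
    frame_len = int(math.floor(winlen * sample_rate + 0.5))    # samples per frame
    frame_step = int(math.floor(winstep * sample_rate + 0.5))  # hop between frames
    if n_samples <= frame_len:
        n_frames = 1
    else:
        n_frames = 1 + math.ceil((n_samples - frame_len) / frame_step)
    return n_frames, numcep

# One second of audio sampled at 16 kHz (assumed values)
print(expected_mfcc_shape(16000, 16000))
```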

Example: MFCC extraction

"""
MFCC extraction
"""
import scipy.io.wavfile as wf
import python_speech_features as sf
import matplotlib.pyplot as mp

sample_rate, sigs=wf.read(
	'../ml_data/filter.wav')
mfcc = sf.mfcc(sigs, sample_rate)
print(mfcc.shape)

mp.matshow(mfcc.T, cmap='gist_rainbow')
mp.title('MFCC')
mp.ylabel('Features', fontsize=14)
mp.xlabel('Samples', fontsize=14)
mp.tick_params(labelsize=10)
mp.show()

Hidden Markov model API:

import hmmlearn.hmm as hl
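# Build the hidden Markov model
# n_components: number of hidden states in the model
# covariance_type: 'diag' gives each state a diagonal covariance matrix
# n_iter: upper limit on the number of training iterations
model = hl.GaussianHMM(
    n_components=4,
    covariance_type='diag',
    n_iter=1000)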
model.fit(mfccs)
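# Score an audio sample's MFCC matrix with the trained model.
# score() returns a log-likelihood: the better the match, the higher the score.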
score = model.score(test_mfcc)
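To show what such a score represents, here is a minimal sketch of the forward algorithm, which computes the log-likelihood of an observation sequence under an HMM. It is a simplification and not hmmlearn's implementation: it uses discrete symbols instead of Gaussian emissions, and all probabilities below are made-up values for illustration.

```python
import numpy as np

def log_likelihood(obs, start, trans, emit):
    """Forward algorithm in log space for a discrete-emission HMM.

    obs: sequence of symbol indices; start: initial state probabilities;
    trans[i, j]: P(state j | state i); emit[j, k]: P(symbol k | state j).
    """
    log_alpha = np.log(start) + np.log(emit[:, obs[0]])
    for o in obs[1:]:
        m = log_alpha.max()   # subtract the max for numerical stability
        log_alpha = m + np.log(np.exp(log_alpha - m) @ trans) + np.log(emit[:, o])
    m = log_alpha.max()
    return m + np.log(np.exp(log_alpha - m).sum())

start = np.array([0.5, 0.5])
trans = np.array([[0.9, 0.1],
                  [0.1, 0.9]])
emit_a = np.array([[0.9, 0.1],    # model A: both states prefer symbol 0
                   [0.7, 0.3]])
emit_b = np.array([[0.1, 0.9],    # model B: both states prefer symbol 1
                   [0.3, 0.7]])

obs = [0, 0, 1, 0, 0]             # mostly symbol 0
score_a = log_likelihood(obs, start, trans, emit_a)
score_b = log_likelihood(obs, start, trans, emit_b)
print(score_a, score_b)
```

Model A gets the higher log-likelihood because the sequence consists mostly of the symbol it prefers. Choosing the model with the highest score is exactly the decision rule used in the example that follows.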

Example:

"""
Speech recognition
"""
import os 
import numpy as np
import scipy.io.wavfile as wf
import python_speech_features as sf
import hmmlearn.hmm as hl

def search_files(directory):
	directory = os.path.normpath(directory)

	# Maps each label to its file paths:
	# {'apple': [path, path, path], 'banana': [path, ...]}
	objects = {}
	# current directory, its subdirectories, the files it contains
	for curdir,subdirs,files in \
					os.walk(directory):
		for file in files:
			if file.endswith('.wav'):
				label = curdir.split(os.path.sep)[-1]
				if label not in objects:
					objects[label] = []
				path = os.path.join(curdir, file)
				objects[label].append(path)
	return objects


train_samples = \
	search_files('../ml_data/speeches/training')

# Prepare the training set: stack the MFCC matrices of all audio
# files in a class on top of each other, then train an HMM on them.
train_x, train_y = [], []
for label, filenames in train_samples.items():
	mfccs = np.array([])
	for filename in filenames:
		sample_rate, sigs = wf.read(filename)
		mfcc = sf.mfcc(sigs, sample_rate)
		if len(mfccs) == 0:
			mfccs = mfcc
		else:
			mfccs = np.append(mfccs, mfcc, axis=0)
	train_x.append(mfccs)
	train_y.append(label)

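# Train one hidden Markov model per class and keep all of them.
# With the 7 classes in this dataset the loop runs 7 times.
# Note: fit() treats the stacked matrix as one long sequence; hmmlearn's
# fit() also accepts a `lengths` argument to mark where each recording ends.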
models = {}
for mfccs, label in zip(train_x, train_y):
	model = hl.GaussianHMM(n_components=4, 
		covariance_type='diag', n_iter=1000)
	models[label] = model.fit(mfccs)

# Read the files in the test set, score each one with every model, and
# take the label of the highest-scoring model as the predicted class.
test_samples = \
	search_files('../ml_data/speeches/testing')

# Prepare the test set: extract the MFCC matrix of each file separately,
# so that every test file gets its own prediction.
test_x, test_y = [], []
for label, filenames in test_samples.items():
	for filename in filenames:
		sample_rate, sigs = wf.read(filename)
		test_x.append(sf.mfcc(sigs, sample_rate))
		test_y.append(label)

# Score every test sample with each of the 7 trained models.
pred_test_y = []
# Iterate over the test samples (7 in this dataset), scoring one per iteration
for mfccs in test_x:
	best_score, best_label = None, None
	for label, model in models.items():
		score = model.score(mfccs)
		if (best_score is None) or \
					(best_score < score):
			best_score, best_label=score,label
	pred_test_y.append(best_label)

print(test_y)
print(pred_test_y)
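Comparing the two printed lists by eye works for a handful of samples. As a small sketch of a more systematic check, the helper below computes the fraction of correct predictions. The `accuracy` function and the label lists are hypothetical illustrations, not results from the article's dataset.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of positions where the predicted label equals the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError('label lists must have the same length')
    if not true_labels:
        raise ValueError('label lists must not be empty')
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)

# Hypothetical labels, only to demonstrate the call
example_true = ['apple', 'banana', 'kiwi', 'lime']
example_pred = ['apple', 'banana', 'kiwi', 'kiwi']

print(accuracy(example_true, example_pred))

# List the misclassified samples as (true, predicted) pairs
errors = [(t, p) for t, p in zip(example_true, example_pred) if t != p]
print(errors)
```

In the script above, the same helper could be called as `accuracy(test_y, pred_test_y)`.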