Overview
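The ear's sensitivity to sound varies with frequency, and the relationship between the two is not a simple linear proportion but is approximately logarithmic. To better approximate how the ear picks up sound, the frequency axis is usually warped nonlinearly onto the Mel scale (or the Bark scale, or the ERB scale), that scale is divided into M equal-width segments, and the segment boundaries are mapped back to the frequency domain. A bank of band-pass filters is then designed over these frequency bands to extract features. Features extracted this way are useful for tasks such as speech recognition and voice wake-up. The rest of this article introduces three commonly used scales: the Mel scale, the Bark scale, and the ERB scale.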
Mel scale
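The Mel scale describes the nonlinear way in which the ear perceives frequency. A frequency f (in Hz) is converted to a Mel value m by
$$ m = 2595 \log_{10}\left(1 + \frac{f}{700}\right) $$
and a Mel value is converted back to a frequency by
$$ f = 700 \left(10^{m / 2595} - 1\right) $$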
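The two conversions can be checked numerically. The following is a minimal sketch, assuming the same constants (2595 and 700) that the code in this section uses; the test frequencies in `test_hz` are arbitrary values chosen for illustration.

```python
import numpy as np

def hz2mel(hz):
    # Forward transform: frequency in Hz -> Mel value
    return 2595 * np.log10(1 + np.asarray(hz, dtype=float) / 700.0)

def mel2hz(mel):
    # Inverse transform: Mel value -> frequency in Hz
    return 700 * (10 ** (np.asarray(mel, dtype=float) / 2595.0) - 1)

# Arbitrary test frequencies (an assumption made for this example)
test_hz = np.array([0.0, 100.0, 1000.0, 4000.0, 8000.0])
test_mel = hz2mel(test_hz)
round_trip = mel2hz(test_mel)

for f, m, b in zip(test_hz, test_mel, round_trip):
    print(f"{f:8.1f} Hz -> {m:8.2f} mel -> {b:8.1f} Hz")
```

With these constants 1000 Hz maps to roughly 1000 mel, and equal steps in Hz produce smaller and smaller steps in Mel as the frequency rises, which is the compression the scale is meant to capture.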
A bank of Mel filters is then designed from these transforms to extract features. The procedure is as follows:
- Choose the number of Mel filters M.
    n_filter = 40
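- Convert the lowest frequency and the cutoff frequency to Mel values, giving low_mel and high_mel.
    import numpy as np

    def hz2mel(hz):
        return 2595 * np.log10(1 + hz / 700.)

    def mel2hz(mel):
        return 700 * (10 ** (mel / 2595.0) - 1)

    f_low = 0
    f_high = 20000   # must not exceed half the sampling rate
    low_mel = hz2mel(f_low)
    high_mel = hz2mel(f_high)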
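- Place M + 2 equally spaced points between low_mel and high_mel, so that each of the M filters gets a left edge, a center, and a right edge.
    mel_points = np.linspace(low_mel, high_mel, n_filter + 2)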
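- Map the equally spaced Mel values back to frequency, and from there to FFT bin indices (nfft is the FFT size and sf is the sampling rate).
    binf = np.floor((nfft + 1) * mel2hz(mel_points) / sf)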
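- Build the triangular filters.
    fbank = np.zeros([n_filter, nfft // 2 + 1])
    for indexj in range(0, n_filter):
        left = binf[indexj]
        center = binf[indexj + 1]
        right = binf[indexj + 2]
        # rising edge: 0 at left, approaching 1 at center
        for indexi in range(int(left), int(center)):
            fbank[indexj, indexi] = (indexi - left) / (center - left)
        # falling edge: 1 at center, approaching 0 at right
        for indexi in range(int(center), int(right)):
            fbank[indexj, indexi] = (right - indexi) / (right - center)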
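The last step can also be written with array slices instead of inner loops. The sketch below is one possible formulation, not the article's own code: the function name `triangular_filterbank` and the small `demo_edges` list are hypothetical, introduced only for illustration. It takes the M + 2 bin indices produced by the previous step and returns the M triangular filters.

```python
import numpy as np

def triangular_filterbank(bin_edges, n_bins):
    """Build triangular filters from n_filter + 2 non-decreasing FFT bin indices.

    Hypothetical helper: bin_edges plays the role of `binf`,
    n_bins the role of nfft // 2 + 1.
    """
    bin_edges = np.asarray(bin_edges, dtype=int)
    n_filter = len(bin_edges) - 2
    fbank = np.zeros((n_filter, n_bins))
    for j in range(n_filter):
        left, center, right = bin_edges[j], bin_edges[j + 1], bin_edges[j + 2]
        if center > left:
            # rising edge over bins left .. center - 1
            fbank[j, left:center] = (np.arange(left, center) - left) / (center - left)
        if right > center:
            # falling edge over bins center .. right - 1
            fbank[j, center:right] = (right - np.arange(center, right)) / (right - center)
    return fbank

# Tiny made-up example: two filters over 20 bins
demo_edges = [2, 6, 10, 18]
demo_fbank = triangular_filterbank(demo_edges, n_bins=20)
print(demo_fbank.round(2))
```

The explicit `center > left` and `right > center` checks skip an edge when two neighboring bin indices coincide, which can happen at low frequencies when many filters share a short FFT.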
The complete code for plotting the Mel filters is as follows:
import numpy as np
import matplotlib.pyplot as plt

def hz2mel(hz):
    return 2595 * np.log10(1 + hz / 700.)

def mel2hz(mel):
    return 700 * (10 ** (mel / 2595.0) - 1)

sf = 8000            # sampling rate in Hz
f_low = 20
f_high = sf // 2     # Nyquist frequency
low_mel = hz2mel(f_low)
high_mel = hz2mel(f_high)
n_filter = 40
nfft = 512
# n_filter + 2 equally spaced Mel points, mapped back to FFT bin indices
mel_points = np.linspace(low_mel, high_mel, n_filter + 2)
binf = np.floor((nfft + 1) * mel2hz(mel_points) / sf)
fbank = np.zeros([n_filter, nfft // 2 + 1])
# frequency in Hz of each FFT bin, used as the x axis of the plot
df = sf / nfft
w2 = int(nfft / 2 + 1)
freq = []
for n in range(0, w2):
    freqs = int(n * df)
    freq.append(freqs)
for indexj in range(0, n_filter):
    left = binf[indexj]
    center = binf[indexj + 1]
    right = binf[indexj + 2]
    # rising edge of the triangle
    for indexi in range(int(left), int(center)):
        fbank[indexj, indexi] = (indexi - left) / (center - left)
    # falling edge of the triangle
    for indexi in range(int(center), int(right)):
        fbank[indexj, indexi] = (right - indexi) / (right - center)
    plt.plot(freq, fbank[indexj, :])
plt.xlabel('frequency')
plt.ylabel('amplitude')
plt.show()
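Once the filter bank exists, extracting features amounts to weighting a frame's power spectrum with each filter. The sketch below illustrates this under several assumptions that are not part of the original listing: the helper name `mel_filterbank`, the 1 kHz sine used as a test frame, the Hamming window, and the small constant added before the logarithm are all choices made for this example.

```python
import numpy as np

def hz2mel(hz):
    return 2595 * np.log10(1 + hz / 700.0)

def mel2hz(mel):
    return 700 * (10 ** (mel / 2595.0) - 1)

def mel_filterbank(sf, nfft, n_filter, f_low, f_high):
    # Same construction as the listing above, wrapped in a hypothetical helper
    mel_points = np.linspace(hz2mel(f_low), hz2mel(f_high), n_filter + 2)
    binf = np.floor((nfft + 1) * mel2hz(mel_points) / sf).astype(int)
    fbank = np.zeros((n_filter, nfft // 2 + 1))
    for j in range(n_filter):
        left, center, right = binf[j], binf[j + 1], binf[j + 2]
        for i in range(left, center):
            fbank[j, i] = (i - left) / (center - left)
        for i in range(center, right):
            fbank[j, i] = (right - i) / (right - center)
    return fbank

sf, nfft = 8000, 512
fbank = mel_filterbank(sf, nfft, n_filter=40, f_low=20, f_high=sf // 2)

# Made-up test frame: one windowed frame of a 1 kHz sine
t = np.arange(nfft) / sf
frame = np.sin(2 * np.pi * 1000.0 * t) * np.hamming(nfft)

power = np.abs(np.fft.rfft(frame, nfft)) ** 2   # nfft // 2 + 1 bins
mel_energy = fbank @ power                      # one energy per filter
log_mel = np.log(mel_energy + 1e-10)            # offset avoids log(0)

strongest = int(np.argmax(mel_energy))
print("strongest filter index:", strongest)
```

For this test frame the energy concentrates in the filter whose passband covers the FFT bin at 1 kHz, which is a quick sanity check that the filters sit where they should.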
Bark scale
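The Bark scale is another nonlinear description of how the ear perceives frequency. A frequency f (in Hz) is converted to a Bark value z in two steps, a base mapping followed by a correction at both ends of the scale:
$$ z' = \frac{26.81 f}{1960 + f} - 0.53, \qquad z = \begin{cases} z' + 0.15\,(2 - z'), & z' < 2 \\ z' + 0.22\,(z' - 20.1), & z' > 20.1 \\ z', & \text{otherwise} \end{cases} $$
The inverse transform first undoes the correction and then inverts the base mapping:
$$ z' = \begin{cases} (z - 0.3) / 0.85, & z < 2 \\ (z + 4.422) / 1.22, & z > 20.1 \\ z, & \text{otherwise} \end{cases} \qquad f = 1960\, \frac{z' + 0.53}{26.28 - z'} $$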
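The Bark conversion contains two conditional corrections, so a version written with plain `if` statements only accepts one value at a time. The sketch below is an assumed vectorized variant using `np.where`, with the same constants as the code in this section; the test frequencies are arbitrary.

```python
import numpy as np

def hz2bark(hz):
    hz = np.asarray(hz, dtype=float)
    bark = 26.81 * hz / (1960 + hz) - 0.53
    # low-end correction, applied only where bark < 2
    bark = np.where(bark < 2, bark + 0.15 * (2 - bark), bark)
    # high-end correction, applied only where bark > 20.1
    bark = np.where(bark > 20.1, bark + 0.22 * (bark - 20.1), bark)
    return bark

def bark2hz(bark):
    bark = np.asarray(bark, dtype=float)
    # undo the two corrections, then invert the base mapping
    bark = np.where(bark < 2, (bark - 0.3) / 0.85, bark)
    bark = np.where(bark > 20.1, (bark + 4.422) / 1.22, bark)
    return 1960 * (bark + 0.53) / (26.28 - bark)

# Arbitrary test frequencies covering all three branches
test_hz = np.array([20.0, 100.0, 1000.0, 4000.0, 12000.0, 16000.0])
test_bark = hz2bark(test_hz)
round_trip = bark2hz(test_bark)
print(np.round(test_bark, 2))
print(np.round(round_trip, 1))
```

Because both functions accept arrays, the bin indices can then be computed in one expression, as in the Mel case, instead of looping over the Bark points one by one.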
Following the same steps as above, a bank of triangular filters can be designed on the Bark scale. The code is as follows:
import numpy as np
import matplotlib.pyplot as plt

def hz2bark(hz):
    bark = 26.81 * hz / (1960 + hz) - 0.53
    if bark < 2:
        bark = bark + 0.15 * (2 - bark)
    if bark > 20.1:
        bark = bark + 0.22 * (bark - 20.1)
    return bark

def bark2hz(bark):
    if bark < 2:
        bark = (bark - 0.3) / 0.85
    if bark > 20.1:
        bark = (bark + 4.422) / 1.22
    hz = 1960 * ((bark + 0.53) / (26.28 - bark))
    return hz

sf = 8000            # sampling rate in Hz
f_low = 20
f_high = sf // 2     # Nyquist frequency
low_bark = hz2bark(f_low)
high_bark = hz2bark(f_high)
n_filter = 24
nfft = 512
bark_points = np.linspace(low_bark, high_bark, n_filter + 2)
# bark2hz works on scalars, so convert the points one at a time
binf = np.zeros(len(bark_points))
for index in range(0, len(bark_points)):
    binf[index] = np.floor((nfft + 1) * bark2hz(bark_points[index]) / sf)
fbank = np.zeros([n_filter, nfft // 2 + 1])
# frequency in Hz of each FFT bin, used as the x axis of the plot
df = sf / nfft
w2 = int(nfft / 2 + 1)
freq = []
for n in range(0, w2):
    freqs = int(n * df)
    freq.append(freqs)
for indexj in range(0, n_filter):
    left = binf[indexj]
    center = binf[indexj + 1]
    right = binf[indexj + 2]
    # rising edge of the triangle
    for indexi in range(int(left), int(center)):
        fbank[indexj, indexi] = (indexi - left) / (center - left)
    # falling edge of the triangle
    for indexi in range(int(center), int(right)):
        fbank[indexj, indexi] = (right - indexi) / (right - center)
    plt.plot(freq, fbank[indexj, :])
plt.xlabel('frequency')
plt.ylabel('amplitude')
plt.show()
ERB scale
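Another relation that describes the nonlinear perception of frequency is the equivalent rectangular bandwidth (ERB). A frequency f (in Hz) is converted to an ERB-scale value E by
$$ E = A \log_{10}\left(1 + 0.00437 f\right), \qquad A = \frac{1000 \ln 10}{24.7 \times 4.37} $$
The inverse transform is
$$ f = \frac{10^{E / A} - 1}{0.00437} $$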
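The constant A in the ERB conversion can look arbitrary. The sketch below evaluates it (about 21.3) and checks one interpretation numerically: assuming the bandwidth expression 24.7 (4.37 f / 1000 + 1) Hz, which is where the constants 24.7 and 4.37 are taken to come from here, the slope of the ERB scale at frequency f equals one over that bandwidth. The helper `erb_bandwidth` and the test frequencies are additions made for this example.

```python
import numpy as np

A = 1000 * np.log(10.0) / (24.7 * 4.37)

def hz2erb(hz):
    return A * np.log10(1 + np.asarray(hz, dtype=float) * 0.00437)

def erb2hz(erb):
    return (10 ** (np.asarray(erb, dtype=float) / A) - 1) / 0.00437

def erb_bandwidth(hz):
    # Assumed bandwidth expression behind the constants 24.7 and 4.37
    return 24.7 * (4.37 * np.asarray(hz, dtype=float) / 1000.0 + 1)

test_hz = np.array([50.0, 500.0, 1000.0, 4000.0])   # arbitrary test values
round_trip = erb2hz(hz2erb(test_hz))

# Central-difference slope of the ERB scale, in ERB units per Hz
h = 0.01
slope = (hz2erb(test_hz + h) - hz2erb(test_hz - h)) / (2 * h)

print("A =", round(A, 3))
print("slope * bandwidth =", np.round(slope * erb_bandwidth(test_hz), 6))
```

A product close to 1 at every test frequency means that one step on the ERB scale spans one equivalent rectangular bandwidth, so equally spaced ERB points yield filters whose width follows that bandwidth.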
Triangular filters can likewise be designed on the ERB scale. The code is as follows:
import numpy as np
import matplotlib.pyplot as plt

def hz2erb(hz):
    A = 1000 * np.log(10.) / (24.7 * 4.37)
    erb = A * np.log10(1 + hz * 0.00437)
    return erb

def erb2hz(erb):
    A = 1000 * np.log(10.) / (24.7 * 4.37)
    hz = (10 ** (erb / A) - 1) / 0.00437
    return hz

sf = 8000            # sampling rate in Hz
f_low = 20
f_high = sf // 2     # Nyquist frequency
low_erb = hz2erb(f_low)
high_erb = hz2erb(f_high)
n_filter = 24
nfft = 512
# n_filter + 2 equally spaced ERB points, mapped back to FFT bin indices
erb_points = np.linspace(low_erb, high_erb, n_filter + 2)
binf = np.floor((nfft + 1) * erb2hz(erb_points) / sf)
fbank = np.zeros([n_filter, nfft // 2 + 1])
# frequency in Hz of each FFT bin, used as the x axis of the plot
df = sf / nfft
w2 = int(nfft / 2 + 1)
freq = []
for n in range(0, w2):
    freqs = int(n * df)
    freq.append(freqs)
for indexj in range(0, n_filter):
    left = binf[indexj]
    center = binf[indexj + 1]
    right = binf[indexj + 2]
    # rising edge of the triangle
    for indexi in range(int(left), int(center)):
        fbank[indexj, indexi] = (indexi - left) / (center - left)
    # falling edge of the triangle
    for indexi in range(int(center), int(right)):
        fbank[indexj, indexi] = (right - indexi) / (right - center)
    plt.plot(freq, fbank[indexj, :])
plt.xlabel('frequency')
plt.ylabel('amplitude')
plt.show()
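The filter-bank designs in this article share every step except the pair of conversion functions. The sketch below makes that explicit with a hypothetical helper, `band_edges_hz`, that accepts any forward and inverse transform and returns the band edges in Hz; it is shown for the Mel and ERB pairs, and the parameter values simply mirror the listings above.

```python
import numpy as np

def hz2mel(hz):
    return 2595 * np.log10(1 + hz / 700.0)

def mel2hz(mel):
    return 700 * (10 ** (mel / 2595.0) - 1)

A_ERB = 1000 * np.log(10.0) / (24.7 * 4.37)

def hz2erb(hz):
    return A_ERB * np.log10(1 + hz * 0.00437)

def erb2hz(erb):
    return (10 ** (erb / A_ERB) - 1) / 0.00437

def band_edges_hz(to_scale, from_scale, f_low, f_high, n_filter):
    # Hypothetical helper: equally spaced points on the perceptual scale,
    # mapped back to frequency in Hz
    points = np.linspace(to_scale(f_low), to_scale(f_high), n_filter + 2)
    return from_scale(points)

sf = 8000
scales = {"mel": (hz2mel, mel2hz), "erb": (hz2erb, erb2hz)}
edges_by_scale = {}
for name, (forward, inverse) in scales.items():
    edges = band_edges_hz(forward, inverse, 20.0, sf / 2, n_filter=24)
    edges_by_scale[name] = edges
    widths = np.diff(edges)
    print(name, "narrowest band:", round(float(widths[0]), 1), "Hz,",
          "widest band:", round(float(widths[-1]), 1), "Hz")
```

On both scales the bands widen steadily toward high frequencies; what differs between scales is how quickly they widen, which is what the plots of the filter banks show.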
https://mp.weixin.qq.com/s/pGwO_27x8ddQF55wTSQlmA