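Overview

The ear's sensitivity to sound varies with frequency, and the relationship between perception and frequency is not a simple linear one but approximately logarithmic. To better approximate how the ear perceives sound, frequency is usually mapped nonlinearly onto the Mel scale (or the Bark or ERB scale), the resulting range is divided into M equally spaced segments, and the segment boundaries are mapped back to the frequency domain. A bank of band-pass filters is then designed over these frequency bands to extract features. Features extracted this way are useful for tasks such as speech recognition and wake-word detection. The rest of this article introduces three commonly used scales: the Mel scale, the Bark scale, and the ERB scale.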

 

Mel scale

 

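The Mel scale describes the nonlinear way the ear perceives frequency. A frequency f in Hz is converted to a Mel value m by

$$ m = 2595 \log_{10}\left(1 + \frac{f}{700}\right) $$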

A Mel value m is converted back to a frequency f in Hz by

$$ f = 700 \left(10^{m / 2595} - 1\right) $$
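The two Mel mappings can be checked numerically with a short round trip. This is a minimal sketch; the test frequencies are arbitrary illustrative values, not taken from the article. It also shows the usual calibration point of this formula: 1000 Hz lands very close to 1000 Mel.

```python
import numpy as np


def hz2mel(hz):
    """Convert frequency in Hz to Mel."""
    return 2595 * np.log10(1 + hz / 700.0)


def mel2hz(mel):
    """Convert a Mel value back to frequency in Hz."""
    return 700 * (10 ** (mel / 2595.0) - 1)


# Arbitrary test frequencies in Hz (illustrative assumption).
freqs = np.array([100.0, 500.0, 1000.0, 4000.0])
mels = hz2mel(freqs)
recovered = mel2hz(mels)

print(mels)       # Mel values grow more slowly than Hz at high frequencies
print(recovered)  # should match freqs up to floating-point error
```

Because both functions use only NumPy element-wise operations, they accept scalars and arrays alike.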

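A bank of Mel filters is then designed from this mapping to extract features. The procedure is as follows:

  • Choose the number of Mel filters, M

n_filter = 40

  • Convert the lowest frequency and the cutoff frequency to Mel values, giving low_Mel and high_Mel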

 

def hz2mel(hz):
    return 2595 * np.log10(1 + hz / 700.)

def mel2hz(mel):
    return 700 * (10 ** (mel / 2595.0) - 1)

f_low = 0
f_high = 20000
low_mel = hz2mel(f_low)
high_mel = hz2mel(f_high)
 
  • Place M + 2 equally spaced points between low_Mel and high_Mel (M triangular filters need M + 2 edge points)

 

mel_points = np.linspace(low_mel, high_mel, n_filter + 2)
  • Convert the equally spaced Mel values back to frequency and map each frequency to an FFT bin index

 

binf = np.floor((nfft+1)*mel2hz(mel_points)/sf)
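  • Build the triangular filters. With f(0), ..., f(M+1) denoting the entries of binf and k the FFT bin index, the m-th filter (m = 1, ..., M) is

$$
H_m(k) =
\begin{cases}
\dfrac{k - f(m-1)}{f(m) - f(m-1)}, & f(m-1) \le k < f(m) \\
\dfrac{f(m+1) - k}{f(m+1) - f(m)}, & f(m) \le k < f(m+1) \\
0, & \text{otherwise}
\end{cases}
$$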

 

for indexj in range(0, n_filter):
    left = binf[indexj]
    center = binf[indexj + 1]
    right = binf[indexj + 2]
    # Rising edge: 0 at left, approaching 1 toward center
    for indexi in range(int(left), int(center)):
        fbank[indexj, indexi] = (indexi - left) / (center - left)
    # Falling edge: 1 at center, approaching 0 toward right
    for indexi in range(int(center), int(right)):
        fbank[indexj, indexi] = (right - indexi) / (right - center)

The complete code for plotting the Mel filter bank is as follows

 

import numpy as np
import matplotlib.pyplot as plt

def hz2mel(hz):
    return 2595 * np.log10(1 + hz / 700.)

def mel2hz(mel):
    return 700 * (10 ** (mel / 2595.0) - 1)

sf = 8000            # sampling rate in Hz
f_low = 20
f_high = sf // 2     # Nyquist frequency
low_mel = hz2mel(f_low)
high_mel = hz2mel(f_high)
n_filter = 40
nfft = 512

# n_filter + 2 edge points, equally spaced on the Mel scale
mel_points = np.linspace(low_mel, high_mel, n_filter + 2)
# FFT bin index of each edge point
binf = np.floor((nfft + 1) * mel2hz(mel_points) / sf)
fbank = np.zeros([n_filter, nfft // 2 + 1])

# Frequency axis for plotting
df = sf / nfft
w2 = int(nfft / 2 + 1)
freq = []
for n in range(0, w2):
    freqs = int(n * df)
    freq.append(freqs)

for indexj in range(0, n_filter):
    left = binf[indexj]
    center = binf[indexj + 1]
    right = binf[indexj + 2]
    for indexi in range(int(binf[indexj]), int(binf[indexj + 1])):
        fbank[indexj, indexi] = (indexi - left) / (center - left)
    for indexi in range(int(binf[indexj + 1]), int(binf[indexj + 2])):
        fbank[indexj, indexi] = (right - indexi) / (right - center)
    plt.plot(freq, fbank[indexj, :])

plt.xlabel('frequency')
plt.ylabel('amplitude')
plt.show()
 

[Figure: the resulting Mel filter bank, amplitude versus frequency]
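The overview mentions that the filter bank is used to extract features, so here is a rough sketch of that last step: multiplying the filter bank by the power spectrum of one frame to get log filter-bank energies. The helper name mel_filterbank, the 1 kHz test tone, the Hamming window, the power normalization, and the small constant inside the logarithm are all assumptions made for illustration; the article itself only builds and plots the filters.

```python
import numpy as np


def hz2mel(hz):
    return 2595 * np.log10(1 + hz / 700.0)


def mel2hz(mel):
    return 700 * (10 ** (mel / 2595.0) - 1)


def mel_filterbank(sf=8000, nfft=512, n_filter=40, f_low=20):
    """Build a triangular Mel filter bank (hypothetical helper, same steps as the text)."""
    mel_points = np.linspace(hz2mel(f_low), hz2mel(sf / 2), n_filter + 2)
    binf = np.floor((nfft + 1) * mel2hz(mel_points) / sf).astype(int)
    fbank = np.zeros((n_filter, nfft // 2 + 1))
    for j in range(n_filter):
        left, center, right = binf[j], binf[j + 1], binf[j + 2]
        for i in range(left, center):      # rising edge
            fbank[j, i] = (i - left) / (center - left)
        for i in range(center, right):     # falling edge
            fbank[j, i] = (right - i) / (right - center)
    return fbank


sf, nfft = 8000, 512

# Assumed test frame: one windowed frame of a 1 kHz tone.
t = np.arange(nfft) / sf
frame = np.sin(2 * np.pi * 1000 * t) * np.hamming(nfft)

# Power spectrum of the frame (nfft // 2 + 1 bins).
power = np.abs(np.fft.rfft(frame, nfft)) ** 2 / nfft

fbank = mel_filterbank(sf, nfft)
energies = fbank @ power                  # one energy per filter
log_energies = np.log(energies + 1e-10)   # small floor avoids log(0)

peak_filter = int(np.argmax(log_energies))
print(log_energies.shape, peak_filter)
```

The filter with the largest energy should be one whose passband covers the 1 kHz bin (bin 64 for these settings), which is a quick sanity check that the bank is laid out as intended.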

 

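Bark scale

The Bark scale is another scale that describes the ear's nonlinear perception of frequency. In the form used by the code below, a frequency f in Hz is converted to a Bark value z by

$$ z = \frac{26.81 f}{1960 + f} - 0.53 $$

followed by a correction at both ends of the scale: if z < 2, replace z with z + 0.15 (2 - z); if z > 20.1, replace z with z + 0.22 (z - 20.1).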

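The inverse transform first undoes the end corrections: if z < 2, replace z with (z - 0.3) / 0.85; if z > 20.1, replace z with (z + 4.422) / 1.22. Then

$$ f = 1960 \, \frac{z + 0.53}{26.28 - z} $$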
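The Bark mappings are piecewise, so a plain `if` only works on scalars. As an alternative sketch (not the article's own code), the same conversions can be written with np.where so that they accept whole arrays; the test frequencies below are arbitrary illustrative values.

```python
import numpy as np


def hz2bark(hz):
    """Vectorized Hz -> Bark, with the low and high end corrections."""
    hz = np.asarray(hz, dtype=float)
    bark = 26.81 * hz / (1960 + hz) - 0.53
    bark = np.where(bark < 2, bark + 0.15 * (2 - bark), bark)
    bark = np.where(bark > 20.1, bark + 0.22 * (bark - 20.1), bark)
    return bark


def bark2hz(bark):
    """Vectorized Bark -> Hz: undo the end corrections, then invert the main formula."""
    bark = np.asarray(bark, dtype=float)
    bark = np.where(bark < 2, (bark - 0.3) / 0.85, bark)
    bark = np.where(bark > 20.1, (bark + 4.422) / 1.22, bark)
    return 1960 * (bark + 0.53) / (26.28 - bark)


# Arbitrary test frequencies in Hz, covering both corrected regions.
freqs = np.array([20.0, 100.0, 1000.0, 4000.0, 10000.0, 15000.0])
barks = hz2bark(freqs)
recovered = bark2hz(barks)

print(barks)
print(recovered)  # should match freqs up to floating-point error
```

With array-friendly functions like these, the bin edges can be computed in one call instead of looping over each Bark point.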

Following the same steps as above, a bank of Bark-based triangular filters can be designed. The code is as follows

 

import numpy as np
import matplotlib.pyplot as plt

def hz2bark(hz):
    bark = 26.81 * hz / (1960 + hz) - 0.53
    # End corrections below 2 Bark and above 20.1 Bark
    if bark < 2:
        bark = bark + 0.15 * (2 - bark)
    if bark > 20.1:
        bark = bark + 0.22 * (bark - 20.1)
    return bark

def bark2hz(bark):
    # Undo the end corrections before inverting the main formula
    if bark < 2:
        bark = (bark - 0.3) / 0.85
    if bark > 20.1:
        bark = (bark + 4.422) / 1.22
    hz = 1960 * ((bark + 0.53) / (26.28 - bark))
    return hz

sf = 8000            # sampling rate in Hz
f_low = 20
f_high = sf // 2     # Nyquist frequency
low_bark = hz2bark(f_low)
high_bark = hz2bark(f_high)
n_filter = 24
nfft = 512

bark_points = np.linspace(low_bark, high_bark, n_filter + 2)
# bark2hz works on scalars only, so convert the edge points one by one
binf = np.zeros(len(bark_points))
for index in range(0, len(bark_points)):
    binf[index] = np.floor((nfft + 1) * bark2hz(bark_points[index]) / sf)
fbank = np.zeros([n_filter, nfft // 2 + 1])

# Frequency axis for plotting
df = sf / nfft
w2 = int(nfft / 2 + 1)
freq = []
for n in range(0, w2):
    freqs = int(n * df)
    freq.append(freqs)

for indexj in range(0, n_filter):
    left = binf[indexj]
    center = binf[indexj + 1]
    right = binf[indexj + 2]
    for indexi in range(int(binf[indexj]), int(binf[indexj + 1])):
        fbank[indexj, indexi] = (indexi - left) / (center - left)
    for indexi in range(int(binf[indexj + 1]), int(binf[indexj + 2])):
        fbank[indexj, indexi] = (right - indexi) / (right - center)
    plt.plot(freq, fbank[indexj, :])

plt.xlabel('frequency')
plt.ylabel('amplitude')
plt.show()
 

[Figure: the resulting Bark filter bank, amplitude versus frequency]

 

ERB scale

 

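Another relation that describes the nonlinear perception of frequency is the equivalent rectangular bandwidth (ERB). A frequency f in Hz is converted to an ERB-scale value E by

$$ E = A \log_{10}\left(1 + 0.00437 f\right), \qquad A = \frac{1000 \ln 10}{24.7 \times 4.37} $$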

The inverse transform, with the same constant A = 1000 ln(10) / (24.7 × 4.37), is

$$ f = \frac{10^{E / A} - 1}{0.00437} $$
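The constant A = 1000 ln(10) / (24.7 × 4.37) used in the ERB conversion may look arbitrary. A short derivation shows where it likely comes from, assuming the widely used Glasberg and Moore bandwidth approximation ERB(f) = 24.7 (4.37 f / 1000 + 1) Hz. That bandwidth formula is an assumption here; the article does not state it, although the constants 24.7 and 4.37 match. The ERB-scale value counts how many equivalent rectangular bandwidths fit below the frequency f:

```latex
\begin{aligned}
E(f) &= \int_0^f \frac{\mathrm{d}f'}{24.7\,(0.00437 f' + 1)} \\
     &= \frac{1}{24.7 \times 0.00437}\,\ln\left(1 + 0.00437 f\right) \\
     &= \frac{1000 \ln 10}{24.7 \times 4.37}\,\log_{10}\left(1 + 0.00437 f\right)
      \approx 21.3\,\log_{10}\left(1 + 0.00437 f\right)
\end{aligned}
```

The last step only changes the base of the logarithm, using ln x = ln(10) log10(x).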

Triangular filters can be designed on the ERB scale in the same way. The code is as follows

 

import numpy as np
import matplotlib.pyplot as plt

def hz2erb(hz):
    A = 1000 * np.log(10.) / (24.7 * 4.37)
    erb = A * np.log10(1 + hz * 0.00437)
    return erb

def erb2hz(erb):
    A = 1000 * np.log(10.) / (24.7 * 4.37)
    hz = (10 ** (erb / A) - 1) / 0.00437
    return hz

sf = 8000            # sampling rate in Hz
f_low = 20
f_high = sf // 2     # Nyquist frequency
low_erb = hz2erb(f_low)
high_erb = hz2erb(f_high)
n_filter = 24
nfft = 512

# n_filter + 2 edge points, equally spaced on the ERB scale
erb_points = np.linspace(low_erb, high_erb, n_filter + 2)
# FFT bin index of each edge point
binf = np.floor((nfft + 1) * erb2hz(erb_points) / sf)
fbank = np.zeros([n_filter, nfft // 2 + 1])

# Frequency axis for plotting
df = sf / nfft
w2 = int(nfft / 2 + 1)
freq = []
for n in range(0, w2):
    freqs = int(n * df)
    freq.append(freqs)

for indexj in range(0, n_filter):
    left = binf[indexj]
    center = binf[indexj + 1]
    right = binf[indexj + 2]
    for indexi in range(int(binf[indexj]), int(binf[indexj + 1])):
        fbank[indexj, indexi] = (indexi - left) / (center - left)
    for indexi in range(int(binf[indexj + 1]), int(binf[indexj + 2])):
        fbank[indexj, indexi] = (right - indexi) / (right - center)
    plt.plot(freq, fbank[indexj, :])

plt.xlabel('frequency')
plt.ylabel('amplitude')
plt.show()
 

[Figure: the resulting ERB filter bank, amplitude versus frequency]
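The three listings in this article differ only in the pair of conversion functions. One possible refactoring, sketched here with a hypothetical helper named triangular_filterbank (not part of the original code), passes the forward and inverse mappings as arguments so that the same construction serves all three scales. The Bark functions are rewritten with np.where so they accept arrays.

```python
import numpy as np

A_ERB = 1000 * np.log(10.0) / (24.7 * 4.37)


def hz2mel(hz):
    return 2595 * np.log10(1 + np.asarray(hz, dtype=float) / 700.0)


def mel2hz(mel):
    return 700 * (10 ** (np.asarray(mel, dtype=float) / 2595.0) - 1)


def hz2bark(hz):
    hz = np.asarray(hz, dtype=float)
    bark = 26.81 * hz / (1960 + hz) - 0.53
    bark = np.where(bark < 2, bark + 0.15 * (2 - bark), bark)
    return np.where(bark > 20.1, bark + 0.22 * (bark - 20.1), bark)


def bark2hz(bark):
    bark = np.asarray(bark, dtype=float)
    bark = np.where(bark < 2, (bark - 0.3) / 0.85, bark)
    bark = np.where(bark > 20.1, (bark + 4.422) / 1.22, bark)
    return 1960 * (bark + 0.53) / (26.28 - bark)


def hz2erb(hz):
    return A_ERB * np.log10(1 + np.asarray(hz, dtype=float) * 0.00437)


def erb2hz(erb):
    return (10 ** (np.asarray(erb, dtype=float) / A_ERB) - 1) / 0.00437


def triangular_filterbank(to_scale, to_hz, sf=8000, nfft=512, n_filter=24, f_low=20):
    """Hypothetical helper: triangular filters equally spaced on a given scale."""
    low = float(to_scale(f_low))
    high = float(to_scale(sf / 2))
    points = np.linspace(low, high, n_filter + 2)
    binf = np.floor((nfft + 1) * to_hz(points) / sf).astype(int)
    fbank = np.zeros((n_filter, nfft // 2 + 1))
    for j in range(n_filter):
        left, center, right = binf[j], binf[j + 1], binf[j + 2]
        for i in range(left, center):      # rising edge
            fbank[j, i] = (i - left) / (center - left)
        for i in range(center, right):     # falling edge
            fbank[j, i] = (right - i) / (right - center)
    return fbank


banks = {
    "mel": triangular_filterbank(hz2mel, mel2hz),
    "bark": triangular_filterbank(hz2bark, bark2hz),
    "erb": triangular_filterbank(hz2erb, erb2hz),
}
for name, fb in banks.items():
    print(name, fb.shape)
```

Each entry of banks can be plotted row by row against the frequency axis, exactly as in the listings above, to compare how the three scales distribute the filters.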

https://mp.weixin.qq.com/s/pGwO_27x8ddQF55wTSQlmA