Many articles about MFCC discuss the Spectral Envelope, and I have long had some doubts about how to understand it.

Question

Why is the following assumed?


Spectrum = Spectral Envelope * Spectral Details



All of the subsequent processing rests on this formula, which is exactly why I am curious about it.
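One common way to motivate the assumption is the source-filter model of speech, sketched below as background rather than as something the experiment here establishes. The symbols are my own assumptions: e[n] is the excitation, h[n] the vocal-tract response, and X, E, H their DFTs. The product form holds exactly for circular convolution and only approximately for a windowed frame.

```latex
\begin{aligned}
x[n] &= e[n] * h[n] && \text{(convolution in time)} \\
X(k) &= E(k)\,H(k) && \text{(product in frequency)} \\
|X(k)| &= |H(k)|\,|E(k)| && \text{(Spectrum = Spectral Envelope} \times \text{Spectral Details)} \\
\log |X(k)| &= \log |H(k)| + \log |E(k)| && \text{(the logarithm turns the product into a sum)}
\end{aligned}
```

Under this reading, |H(k)| varies slowly with k and plays the role of the envelope, while |E(k)| varies quickly and plays the role of the details.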

So I ran a few simple experiments.

Experiment

Step 1. Find an audio clip with a single sound source


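Step 2. Apply the DFT to this audio to obtain frequency-domain data

Step 3. Apply the DFT again to the frequency-domain data

If the Spectral Envelope and the Spectral Details really exist, they must show up in this result.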

The code that prints the result looks roughly like this:

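    #include <math.h>
    #include <stdio.h>

    /* m1 holds the N complex bins from Step 3 as {real, imag} pairs. */
    void print_strong_bins(double (*m1)[2], int N) {
        for (int i = 0; i < N; ++i) {
            double realVal = m1[i][0] / N;  /* normalize by the transform length */
            double imagVal = m1[i][1] / N;
            double powVal = 2 * (realVal * realVal + imagVal * imagVal);
            double absVal = sqrt(powVal / 2) * 2;  /* equals 2 * |bin| */
            /* Print only the bins whose amplitude exceeds 1.25 */
            if (absVal > 1.25) {
                fprintf(stdout, "%10i (%10.4lf %10.4lf) %10.4lf %10.4lf\n", i,
                        realVal, imagVal, absVal, powVal);
            }
        }
    }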

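The sample audio has little energy, so both the powVal and absVal values are small; 1.25 is used here as the filtering threshold.

The Step 3 output is printed below (one frame taken as an example):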

Index      (Real       Imag)        Abs       Power
1605 ( 0.5469 0.5966) 1.6187 1.3101
1607 ( -0.3830 -0.6633) 1.5319 1.1734
1608 ( -0.8168 -0.8465) 2.3527 2.7676
1609 ( -0.6892 -0.2346) 1.4560 1.0600
1610 ( 0.3351 0.8297) 1.7896 1.6013
1611 ( 1.0922 0.9707) 2.9224 4.2701
1614 ( -0.6581 -0.6849) 1.8997 1.8045
1616 ( 0.2837 0.6177) 1.3595 0.9242
1617 ( 0.6710 0.4794) 1.6494 1.3602
1620 ( -0.4764 -0.4920) 1.3697 0.9381
1622 ( 0.7372 0.9301) 2.3736 2.8170
1623 ( 0.8836 0.5938) 2.1291 2.2666
1625 ( -0.8374 -1.0777) 2.7296 3.7254
1626 ( -1.1240 -0.8214) 2.7843 3.8762
1628 ( 0.8786 1.1128) 2.8357 4.0205
1629 ( 0.9656 0.6244) 2.2998 2.6446
1631 ( -0.5870 -0.7584) 1.9180 1.8394
1632 ( -0.7730 -0.5451) 1.8917 1.7893
1634 ( 0.6053 0.6120) 1.7215 1.4818
1637 ( -0.6775 -0.7938) 2.0872 2.1782
1638 ( -0.6324 -0.1233) 1.2886 0.8303
1639 ( 0.4665 0.8667) 1.9684 1.9374
1640 ( 1.0342 0.9270) 2.7777 3.8579
1642 ( -0.4208 -0.8041) 1.8152 1.6474
1643 ( -1.0795 -1.0763) 3.0488 4.6476
1644 ( -0.7512 -0.2024) 1.5560 1.2106
1645 ( 0.4262 0.8774) 1.9509 1.9030
1646 ( 0.9605 0.8118) 2.5152 3.1630
1649 ( -0.7451 -0.6711) 2.0055 2.0111
1651 ( 0.4292 0.5677) 1.4235 1.0131
1654 ( -0.4299 -0.5846) 1.4513 1.0531
1656 ( 0.3248 0.7072) 1.5564 1.2112
1657 ( 0.8936 0.8522) 2.4697 3.0496
1658 ( 0.6013 0.2079) 1.2724 0.8095
1660 ( -0.9155 -1.0261) 2.7503 3.7821
1661 ( -0.8391 -0.4314) 1.8870 1.7803
1663 ( 0.7710 0.7510) 2.1526 2.3169
1664 ( 0.6116 0.3559) 1.4153 1.0015
1666 ( -0.6445 -0.7511) 1.9795 1.9592
1671 ( -0.4366 -0.7228) 1.6888 1.4261
1672 ( -0.8308 -0.5387) 1.9803 1.9609
1674 ( 0.8466 0.9201) 2.5007 3.1268
1675 ( 0.7521 0.3795) 1.6849 1.4195
1677 ( -0.8065 -0.9729) 2.5274 3.1938
1678 ( -0.8567 -0.4725) 1.9567 1.9143
1679 ( 0.1235 0.6154) 1.2554 0.7880
1680 ( 0.7764 0.7284) 2.1292 2.2667
1681 ( 0.5659 0.3056) 1.2863 0.8272
1683 ( -0.6042 -0.7033) 1.8545 1.7196
1685 ( 0.3171 0.5617) 1.2900 0.8321
1686 ( 0.6701 0.5375) 1.7181 1.4759
1689 ( -0.7254 -0.5887) 1.8685 1.7456
1691 ( 0.7279 0.9073) 2.3265 2.7062
1692 ( 0.8275 0.5309) 1.9662 1.9330
1694 ( -0.6795 -0.9210) 2.2890 2.6198
1695 ( -0.9242 -0.6791) 2.2937 2.6306
1697 ( 0.7296 0.7948) 2.1578 2.3280
1698 ( 0.6813 0.4235) 1.6044 1.2870
1700 ( -0.5638 -0.7306) 1.8457 1.7033
1701 ( -0.6597 -0.3087) 1.4568 1.0611
1703 ( 0.5065 0.4439) 1.3470 0.9072
1706 ( -0.7258 -0.6583) 1.9597 1.9203
1708 ( 0.4901 0.7264) 1.7525 1.5356
1709 ( 0.7369 0.5527) 1.8423 1.6971
1711 ( -0.4683 -0.7648) 1.7936 1.6085
1712 ( -0.8818 -0.7439) 2.3074 2.6620
1714 ( 0.6359 0.7758) 2.0063 2.0126
1715 ( 0.7287 0.5066) 1.7751 1.5755
1717 ( -0.4906 -0.7183) 1.7397 1.5134
1718 ( -0.6872 -0.3352) 1.5291 1.1691
1720 ( 0.6665 0.6527) 1.8656 1.7403
1723 ( -0.6042 -0.6591) 1.7882 1.5988
1725 ( 0.3617 0.6106) 1.4193 1.0072
1726 ( 0.6514 0.5098) 1.6543 1.3684
1728 ( -0.3398 -0.6308) 1.4331 1.0269
1729 ( -0.8178 -0.7883) 2.2718 2.5805
1731 ( 0.5277 0.7073) 1.7648 1.5573
1732 ( 0.7133 0.5763) 1.8341 1.6819
1734 ( -0.3499 -0.6897) 1.5467 1.1962
1735 ( -0.8273 -0.6361) 2.0872 2.1782
1737 ( 0.4145 0.5149) 1.3220 0.8738
1740 ( -0.5147 -0.6539) 1.6643 1.3849
1742 ( 0.2933 0.6117) 1.3568 0.9204
1743 ( 0.7094 0.5838) 1.8374 1.6879
1745 ( -0.3245 -0.5805) 1.3300 0.8845
1746 ( -0.7712 -0.7880) 2.2051 2.4313
1748 ( 0.4123 0.6631) 1.5617 1.2194
1749 ( 0.7214 0.6162) 1.8976 1.8004
1752 ( -0.7097 -0.6012) 1.8603 1.7303
1754 ( 0.4279 0.6011) 1.4757 1.0888
1755 ( 0.5290 0.4104) 1.3390 0.8965
1757 ( -0.3994 -0.6132) 1.4636 1.0711
1758 ( -0.5697 -0.2795) 1.2691 0.8053
1760 ( 0.6547 0.6175) 1.7999 1.6198
1763 ( -0.7711 -0.8903) 2.3557 2.7747
1764 ( -0.6615 -0.1941) 1.3787 0.9504
1765 ( 0.2597 0.5811) 1.2731 0.8104
1766 ( 0.7059 0.6787) 1.9586 1.9181
1769 ( -0.7353 -0.6805) 2.0038 2.0076
1774 ( -0.3456 -0.5799) 1.3501 0.9114
1775 ( -0.5955 -0.3409) 1.3724 0.9418
1777 ( 0.6860 0.7067) 1.9697 1.9399
1778 ( 0.6015 0.3305) 1.3726 0.9420
1780 ( -0.7709 -0.9791) 2.4923 3.1058
1781 ( -0.8193 -0.3706) 1.7984 1.6171
1783 ( 0.7863 0.7897) 2.2287 2.4836
1784 ( 0.6386 0.3798) 1.4859 1.1040
1786 ( -0.6527 -0.6889) 1.8979 1.8011
1794 ( 0.6092 0.6963) 1.8502 1.7117
1795 ( 0.6403 0.4078) 1.5183 1.1527
1797 ( -0.6073 -0.9037) 2.1776 2.3709
1798 ( -0.8947 -0.5811) 2.1336 2.2761
1800 ( 0.6402 0.7746) 2.0097 2.0195
1801 ( 0.7210 0.5212) 1.7793 1.5829
1803 ( -0.5467 -0.6950) 1.7685 1.5638
1804 ( -0.6232 -0.3690) 1.4485 1.0490
1809 ( -0.6457 -0.5318) 1.6731 1.3996
1811 ( 0.4876 0.6466) 1.6196 1.3115
1812 ( 0.6688 0.4906) 1.6589 1.3759
1814 ( -0.4376 -0.7728) 1.7762 1.5774
1815 ( -0.8517 -0.6448) 2.1364 2.2822
1817 ( 0.4681 0.6593) 1.6172 1.3076
1818 ( 0.6824 0.5767) 1.7870 1.5966
1820 ( -0.3808 -0.5756) 1.3804 0.9527
1821 ( -0.5773 -0.3722) 1.3739 0.9438
1826 ( -0.5235 -0.5277) 1.4866 1.1050
1828 ( 0.3961 0.5850) 1.4130 0.9982
1829 ( 0.6447 0.5126) 1.6473 1.3568
1831 ( -0.3013 -0.6583) 1.4480 1.0484
1832 ( -0.8076 -0.6817) 2.1138 2.2341
1834 ( 0.3556 0.5394) 1.2922 0.8349
1835 ( 0.5755 0.5113) 1.5396 1.1851
1838 ( -0.5621 -0.4479) 1.4374 1.0330
1843 ( -0.5577 -0.6293) 1.6817 1.4140
1846 ( 0.6783 0.5931) 1.8021 1.6238
1849 ( -0.7589 -0.7348) 2.1127 2.2317
1852 ( 0.5611 0.4844) 1.4826 1.0990
1855 ( -0.5268 -0.4049) 1.3287 0.8828
1857 ( 0.4479 0.5029) 1.3469 0.9071
1860 ( -0.4040 -0.5545) 1.3723 0.9415
1863 ( 0.6084 0.5940) 1.7006 1.4461
1866 ( -0.7135 -0.7572) 2.0808 2.1648
1869 ( 0.6235 0.5931) 1.7211 1.4810
1872 ( -0.6399 -0.5927) 1.7444 1.5214
1877 ( -0.3922 -0.5688) 1.3818 0.9547
1880 ( 0.5493 0.5693) 1.5822 1.2517
1883 ( -0.6106 -0.7427) 1.9230 1.8490
1884 ( -0.6148 -0.3051) 1.3727 0.9422
1886 ( 0.5917 0.6357) 1.7368 1.5083
1887 ( 0.5719 0.3247) 1.3153 0.8650
1889 ( -0.5916 -0.6428) 1.7472 1.5264
1892 ( 0.5014 0.4698) 1.3743 0.9443
1895 ( -0.5455 -0.3918) 1.3432 0.9021
1897 ( 0.4741 0.5314) 1.4243 1.0144
1900 ( -0.4978 -0.6858) 1.6949 1.4364
1901 ( -0.6572 -0.4270) 1.5675 1.2285
1903 ( 0.4999 0.6011) 1.5636 1.2224
1904 ( 0.6008 0.4289) 1.4763 1.0898
1906 ( -0.4765 -0.6516) 1.6144 1.3032
1907 ( -0.5718 -0.3313) 1.3216 0.8733
1912 ( -0.5328 -0.4542) 1.4002 0.9803
1914 ( 0.4262 0.5720) 1.4267 1.0177
1915 ( 0.5523 0.4458) 1.4195 1.0075
1917 ( -0.4583 -0.7025) 1.6776 1.4071
1918 ( -0.7217 -0.5466) 1.8107 1.6393
1920 ( 0.4318 0.5975) 1.4744 1.0869
1921 ( 0.6397 0.4978) 1.6212 1.3142
1932 ( 0.4983 0.4477) 1.3398 0.8975
1934 ( -0.3503 -0.6434) 1.4650 1.0732
1935 ( -0.7401 -0.6257) 1.9382 1.8784
1937 ( 0.3682 0.5738) 1.3636 0.9297
1938 ( 0.6561 0.5568) 1.7210 1.4809
1941 ( -0.5630 -0.4198) 1.4046 0.9864
1949 ( 0.5201 0.4733) 1.4065 0.9892
1951 ( -0.2489 -0.5846) 1.2707 0.8074
1952 ( -0.7649 -0.7377) 2.1253 2.2584
1955 ( 0.6854 0.6165) 1.8438 1.6997
1958 ( -0.5426 -0.4192) 1.3713 0.9403
1966 ( 0.5802 0.5520) 1.6017 1.2826
1969 ( -0.6955 -0.7515) 2.0479 2.0969
1972 ( 0.6026 0.6063) 1.7097 1.4615
1975 ( -0.5296 -0.4935) 1.4477 1.0479
1980 ( -0.4691 -0.5739) 1.4825 1.0989
1983 ( 0.5817 0.6276) 1.7116 1.4647
1984 ( 0.5541 0.3526) 1.3135 0.8627
1986 ( -0.6485 -0.7723) 2.0169 2.0338
1987 ( -0.6461 -0.3541) 1.4735 1.0856
1989 ( 0.5044 0.5461) 1.4868 1.1054
1992 ( -0.4451 -0.4639) 1.2859 0.8267
1998 ( -0.5347 -0.3992) 1.3346 0.8906
2000 ( 0.5005 0.6019) 1.5656 1.2255
2001 ( 0.5902 0.4642) 1.5018 1.1276
2003 ( -0.5160 -0.7321) 1.7913 1.6043
2004 ( -0.6948 -0.4559) 1.6621 1.3813
2006 ( 0.4467 0.5232) 1.3758 0.9464
2007 ( 0.5014 0.3821) 1.2608 0.7948
2009 ( -0.3997 -0.4930) 1.2694 0.8057
2015 ( -0.5653 -0.5043) 1.5151 1.1478
2018 ( 0.5651 0.5139) 1.5276 1.1668
2020 ( -0.3460 -0.6168) 1.4144 1.0002
2021 ( -0.6781 -0.5294) 1.7206 1.4802
2023 ( 0.3921 0.5192) 1.3013 0.8467
2024 ( 0.5370 0.4306) 1.3767 0.9476
2035 ( 0.4505 0.4546) 1.2799 0.8191
2038 ( -0.6117 -0.5310) 1.6200 1.3123
2041 ( 0.5376 0.4939) 1.4600 1.0658
2044 ( -0.5229 -0.3653) 1.2757 0.8137

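The energy is concentrated mainly in the "high-frequency" part, that is, in the high-index bins. Most frames that contain sound look much the same.

It looks as though the "high-frequency" part corresponds to the spectral details and the "low-frequency" part to the spectral envelope.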
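One way to put a number on this observation is to compute what fraction of the total power lies at or above a chosen bin index. The sketch below is only an illustration: `power_fraction_above` and its `cutoff` parameter are hypothetical names, the {real, imag} layout mirrors the `m1` array in the printing code, and where to place the boundary between "low" and "high" is an assumption the experiment does not settle.

```c
#include <stddef.h>

/* Fraction of the total power found in bins [cutoff, n).
 * bins[i][0] is the real part and bins[i][1] the imaginary part.
 * Returns 0.0 when the total power is zero. */
double power_fraction_above(double (*bins)[2], size_t n, size_t cutoff) {
    double total = 0.0;
    double upper = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double p = bins[i][0] * bins[i][0] + bins[i][1] * bins[i][1];
        total += p;
        if (i >= cutoff) {
            upper += p;  /* power in the "high" part */
        }
    }
    return total > 0.0 ? upper / total : 0.0;
}
```

Unlike the thresholded listing, this takes every bin into account, so bins below the 1.25 amplitude threshold still contribute to the comparison.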