I am writing up GPU DCVS in a fairly systematic way because I recently ran into a problem where the GPU frequency could not be scaled at all.
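1. Symptom
Reading the GPU frequency from /sys/class/kgsl/kgsl-3d0/clock-mhz always returned 624MHz, the highest frequency of this chip.
It stayed at 624MHz across scene switches and under both light and heavy load.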
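2. Initial analysis
When a CPU or GPU frequency is stuck and cannot be scaled, the usual suspects are:
- the governor is set to a non-dynamic one such as userspace or performance
- the frequency is locked: scene detection went wrong, or a frequency lock was taken and never released
- an abnormal background task keeps the load high; this is less likely, because the device would be noticeably laggy
- the scaling thresholds are too aggressive, so any load at all pushes the device to its top frequency
- a bug in the governor itself
- a bug in the frequency scaling driver
The next step is to rule these out one by one.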
(1) Check the governor
The device node /sys/class/kgsl/kgsl-3d0/devfreq/governor reads msm-adreno-tz, which is Qualcomm's default GPU DCVS governor, so the governor setting is not the problem.
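The governor check can be sketched as a small helper. This is only an illustration: governor_is_static is a hypothetical name, and the list of non-dynamic governors is an assumption based on the generic devfreq governors, not something taken from the kgsl driver.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: returns true when the text read from the governor
 * node names a governor that does not follow load. The list below is an
 * assumption for illustration. A trailing newline, as sysfs returns it,
 * is ignored.
 */
bool governor_is_static(const char *raw)
{
    static const char *const static_governors[] = {
        "userspace", "performance", "powersave",
    };
    size_t len = strcspn(raw, "\n");
    size_t i;

    for (i = 0; i < sizeof(static_governors) / sizeof(static_governors[0]); i++) {
        const char *g = static_governors[i];

        if (strlen(g) == len && strncmp(raw, g, len) == 0)
            return true;
    }
    return false;
}
```

With the value seen on this device, governor_is_static("msm-adreno-tz\n") is false, which matches the conclusion that the governor setting is not at fault.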
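(2) Check whether the frequency is locked
- The min and max nodes under /sys/class/kgsl/kgsl-3d0/ have not been set.
- After adding logs, neither the governor (/drivers/devfreq/governor_msm_adreno_tz.c) nor kgsl_pwrctrl.c turns out to set a minimum-frequency lock.
Moreover, the value the governor itself outputs is already 624MHz.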
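(3) Check the load
/sys/class/kgsl/kgsl-3d0/gpu_busy_percentage shows a very low load, between 0 and 10.
At this point it is clear that this is not an ordinary stuck-frequency problem. Either the platform is configured to hold the highest frequency, or something abnormal is going on.
Looking through the kgsl interface directory /sys/class/kgsl/kgsl-3d0, I then noticed a node named default_pwrlevel.
(4) Try default_pwrlevel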
xxx:/sys/class/kgsl/kgsl-3d0 # cat default_pwrlevel
1
xxx:/sys/class/kgsl/kgsl-3d0 # cat freq_table_mhz
624 624 560 510 401 315 214 133
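default_pwrlevel is 1, which maps to 624MHz in the frequency table. On other Qualcomm devices, default_pwrlevel maps to the lowest frequency.
After changing default_pwrlevel to 7, the frequency scales. Setting it to 6 also works, and the GPU returns to level 6 when there is no load.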
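The mapping between a power level and the frequency table is a plain index lookup. A minimal sketch, assuming the table printed by freq_table_mhz above and a hypothetical helper name pwrlevel_to_mhz:

```c
#include <stddef.h>

/* Frequency table as printed by freq_table_mhz; the index is the power level. */
static const unsigned int freq_table_mhz[] = {
    624, 624, 560, 510, 401, 315, 214, 133
};

#define NUM_PWRLEVELS (sizeof(freq_table_mhz) / sizeof(freq_table_mhz[0]))

/* Hypothetical helper: frequency in MHz for a power level, or 0 if out of range. */
unsigned int pwrlevel_to_mhz(size_t level)
{
    return level < NUM_PWRLEVELS ? freq_table_mhz[level] : 0;
}
```

Level 1 gives 624, while levels 6 and 7 give 214 and 133, which is why a default_pwrlevel of 1 starts the GPU at the top frequency.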
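3. Deeper analysis
At this point there is at least a workaround: adjust default_pwrlevel.
(1) default_pwrlevel
The underlying default_pwrlevel value is configured in the dts through the qcom,initial-pwrlevel property and parsed in
kernel/drivers/gpu/msm/adreno.c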
static void adreno_of_get_initial_pwrlevel(struct adreno_device *adreno_dev,
        struct device_node *node)
{
    struct kgsl_device *device = KGSL_DEVICE(adreno_dev);
    struct kgsl_pwrctrl *pwr = &device->pwrctrl;
    int init_level = 1;
    of_property_read_u32(node, "qcom,initial-pwrlevel", &init_level);
    if (init_level < 0 || init_level > pwr->num_pwrlevels)
        init_level = 1;
    pwr->active_pwrlevel = init_level;
    pwr->default_pwrlevel = init_level;
}
arch/arm64/boot/dts/xpeng/xxx.dtsi
/* GPU overrides for auto */
&msm_gpu {
    qcom,gpu-pwrlevel-bins {
        qcom,gpu-pwrlevels-0 {
            qcom,initial-pwrlevel = <1>;
        };
        qcom,gpu-pwrlevels-2 {
            qcom,initial-pwrlevel = <2>;
        };
    };
};
How default_pwrlevel affects the frequency
Following the call path of default_pwrlevel, kgsl_pwrctrl_enable is where default_pwrlevel is finally applied.
static int kgsl_pwrctrl_enable(struct kgsl_device *device)
{
    struct kgsl_pwrctrl *pwr = &device->pwrctrl;
    int level, status;
    if (pwr->wakeup_maxpwrlevel) {
        level = pwr->max_pwrlevel;
        pwr->wakeup_maxpwrlevel = 0;
    } else if (kgsl_popp_check(device)) {
        level = pwr->active_pwrlevel;
    } else {
        level = pwr->default_pwrlevel;
    }
    kgsl_pwrctrl_pwrlevel_change(device, level);
    /* Order pwrrail/clk sequence based upon platform */
    status = kgsl_pwrctrl_pwrrail(device, KGSL_PWRFLAGS_ON);
    if (status)
        return status;
    kgsl_pwrctrl_clk(device, KGSL_PWRFLAGS_ON, KGSL_STATE_ACTIVE);
    kgsl_pwrctrl_axi(device, KGSL_PWRFLAGS_ON);
    dump_stack();
    return device->ftbl->regulator_enable(device);
}
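When the GPU enters the slumber state, kgsl_pwrctrl_enable is called and execution reaches the final else branch (level = pwr->default_pwrlevel), so level is the default level. The next time the GPU wakes up to work, it starts at the frequency that default_pwrlevel maps to.
In theory, a default_pwrlevel that maps to a high frequency is better for performance: when the GPU meets a heavy load it does not have to climb one step at a time but starts at the top frequency, and if the load turns out to be light, the frequency can still be lowered afterwards.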
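The level selection at the top of kgsl_pwrctrl_enable can be isolated as a small pure function. This is a simplified sketch, not the driver code: struct pwr_state and pick_enable_level are hypothetical names, and popp_active stands in for the result of kgsl_popp_check().

```c
#include <stdbool.h>

/* Simplified stand-in for the struct kgsl_pwrctrl fields used by the selection. */
struct pwr_state {
    bool wakeup_maxpwrlevel;        /* one-shot request to wake at the max level */
    unsigned int max_pwrlevel;
    unsigned int active_pwrlevel;
    unsigned int default_pwrlevel;
};

/* Returns the power level that would be applied when power is enabled. */
unsigned int pick_enable_level(struct pwr_state *pwr, bool popp_active)
{
    if (pwr->wakeup_maxpwrlevel) {
        /* The request is consumed, so it only affects one wakeup. */
        pwr->wakeup_maxpwrlevel = false;
        return pwr->max_pwrlevel;
    }
    if (popp_active)
        return pwr->active_pwrlevel;
    return pwr->default_pwrlevel;
}
```

In the common case, with no one-shot request and popp_active false, the result is default_pwrlevel, which on this device is level 1, i.e. 624MHz.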
The call paths can be seen in the dump_stack output:
<6>[ 42.925409] [<ffffff800808862c>] dump_backtrace+0x0/0x1c0
<6>[ 42.925420] [<ffffff8008088800>] show_stack+0x14/0x1c
<6>[ 42.925433] [<ffffff8008327cbc>] dump_stack+0x94/0xb4
<6>[ 42.925445] [<ffffff8008575538>] kgsl_pwrctrl_enable+0xbc/0xdc
<6>[ 42.925454] [<ffffff8008575ab4>] kgsl_pwrctrl_change_state+0x1e0/0x420
<6>[ 42.925465] [<ffffff800859e3fc>] adreno_start+0xe0/0x370
<6>[ 42.925473] [<ffffff800857576c>] _wake+0xb4/0x21c
<6>[ 42.925480] [<ffffff8008575924>] kgsl_pwrctrl_change_state+0x50/0x420
<6>[ 42.925488] [<ffffff8008575eac>] kgsl_active_count_get+0x8c/0x148
<6>[ 42.925500] [<ffffff8008585638>] sendcmd+0x98/0x498
<6>[ 42.925509] [<ffffff80085868dc>] _adreno_dispatcher_issuecmds.part.13+0x350/0x548
<6>[ 42.925517] [<ffffff8008587964>] adreno_dispatcher_work+0x5e8/0x7a4
<6>[ 42.925528] [<ffffff80080bb25c>] kthread_worker_fn+0xb0/0x12c
<6>[ 42.925536] [<ffffff80080bb1a4>] kthread+0xdc/0xe4
<6>[ 42.925545] [<ffffff8008082f50>] ret_from_fork+0x10/0x40
<3>[ 42.926576] ====== _wake 2636
<3>[ 43.010218] ====== kgsl_pwrctrl_enable 2531
<6>[ 43.010368] CPU: 0 PID: 364 Comm: kworker/u8:4 Not tainted 4.4.178-perf+ #57
<6>[ 43.010376] Hardware name: Qualcomm Technologies, Inc. APQ 8096pro V1.1 AUTO ADP (DT)
<6>[ 43.010396] Workqueue: kgsl-workqueue kgsl_idle_check
<6>[ 43.010406] Call trace:
<6>[ 43.010422] [<ffffff800808862c>] dump_backtrace+0x0/0x1c0
<6>[ 43.010432] [<ffffff8008088800>] show_stack+0x14/0x1c
<6>[ 43.010447] [<ffffff8008327cbc>] dump_stack+0x94/0xb4
<6>[ 43.010455] [<ffffff8008575538>] kgsl_pwrctrl_enable+0xbc/0xdc
<6>[ 43.010464] [<ffffff80085755fc>] _slumber+0xa4/0x160
<6>[ 43.010472] [<ffffff8008575b88>] kgsl_pwrctrl_change_state+0x2b4/0x420
<6>[ 43.010481] [<ffffff8008575d88>] kgsl_idle_check+0x94/0x12c
<6>[ 43.010491] [<ffffff80080b600c>] process_one_work+0x264/0x3fc
<6>[ 43.010499] [<ffffff80080b69fc>] worker_thread+0x310/0x424
<6>[ 43.010510] [<ffffff80080bb1a4>] kthread+0xdc/0xe4
<6>[ 43.010519] [<ffffff8008082f50>] ret_from_fork+0x10/0x40
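(2) Root cause
Adjusting default_pwrlevel only fixes the problem on the surface. The open question is this: default_pwrlevel should only affect the initial frequency when the GPU goes from idle slumber to active, so there should be no situation where scaling works when default_pwrlevel maps to a low frequency but fails when it maps to a high one. More digging is needed.
Judging from the default_pwrlevel settings, the frequency cannot be scaled when the value is 0 or 1 (that is, 624MHz; on this platform the top two levels are both 624MHz by default), but it scales dynamically when the value is 2 through 7. So something in the driver must be mishandling this top frequency.
The next step is to add logs to tz_get_target_freq(), the function in the msm-adreno-tz governor that computes the target frequency. The listing here already contains the eventual fix, marked by the "Fix GPUFreq" comment.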
static int tz_get_target_freq(struct devfreq *devfreq, unsigned long *freq,
        u32 *flag)
{
    int result = 0;
    struct devfreq_msm_adreno_tz_data *priv = devfreq->data;
    struct devfreq_dev_status stats;
    int val, level = 0;
    unsigned int scm_data[4];
    int context_count = 0;
    /* keeps stats.private_data == NULL */
    result = devfreq->profile->get_dev_status(devfreq->dev.parent, &stats);
    if (result) {
        pr_err(TAG "get_status failed %d\n", result);
        return result;
    }
    *freq = stats.current_frequency;
    priv->bin.total_time += stats.total_time;
    priv->bin.busy_time += stats.busy_time;
    if (stats.private_data)
        context_count = *((int *)stats.private_data);
    /* Update the GPU load statistics */
    compute_work_load(&stats, priv, devfreq);
    /*
     * Do not waste CPU cycles running this algorithm if
     * the GPU just started, or if less than FLOOR time
     * has passed since the last run or the gpu hasn't been
     * busier than MIN_BUSY.
     */
    if ((stats.total_time == 0) ||
        (priv->bin.total_time < FLOOR) ||
        (unsigned int) priv->bin.busy_time < MIN_BUSY) {
        return 0;
    }
    level = devfreq_get_freq_level(devfreq, stats.current_frequency);
    if (level < 0) {
        pr_err(TAG "bad freq %ld\n", stats.current_frequency);
        return level;
    }
    /* Fix GPUFreq can not be scaled from 624MHz, for 820A, by xiaopeng */
    if (level == 0 && devfreq->profile->freq_table[0] ==
        devfreq->profile->freq_table[1]) {
        level = 1;
    }
    /*
     * If there is an extended block of busy processing,
     * increase frequency. Otherwise run the normal algorithm.
     */
    if (!priv->disable_busy_time_burst &&
        priv->bin.busy_time > CEILING) {
        val = -1 * level;
    } else {
        scm_data[0] = level;
        scm_data[1] = priv->bin.total_time;
        scm_data[2] = priv->bin.busy_time;
        scm_data[3] = context_count;
        __secure_tz_update_entry3(scm_data, sizeof(scm_data),
                    &val, sizeof(val), priv);
    }
    priv->bin.total_time = 0;
    priv->bin.busy_time = 0;
    /*
     * If the decision is to move to a different level, make sure the GPU
     * frequency changes.
     */
    if (val) {
        level += val;
        level = max(level, 0);
        level = min_t(int, level, devfreq->profile->max_state - 1);
    }
    *freq = devfreq->profile->freq_table[level];
    return 0;
}
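The function first fetches the current GPU frequency, busy_time and related statistics. It then converts the frequency to a pwrlevel through a table lookup and asks TZ to compute the target pwrlevel, that is, whether the current level should go up or down by one, which in turn determines the target frequency.
The logs revealed the cause. Without the fix in place, this is what happens:
- At the devfreq_get_freq_level() call, when the frequency is 624MHz the lookup returns level 0 (the table is {624, 624, 560, 510, 401, 315, 214, 133}).
- The code then computes val, which decides whether the target moves up or down. val turned out to be 1, meaning the level should increase by 1, i.e. the frequency should drop one step.
- The target level is therefore 0 + 1 = 1. But the frequency for level 1 is also 624MHz.
- Dead loop: on the next call to tz_get_target_freq, 624MHz is again mapped to level 0, so the target level keeps bouncing between 0 and 1 and the frequency stays at 624MHz.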
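The loop is easy to reproduce outside the kernel. The sketch below is a user-space model under stated assumptions: the lookup returns the first matching table entry (consistent with the level 0 result seen in the logs), the TZ decision is replaced by a fixed val, and the names freq_to_level, governor_step and run_steps are hypothetical.

```c
#define NUM_LEVELS 8

/* Frequency table from this article in MHz; levels 0 and 1 are both 624. */
static const unsigned int freq_table[NUM_LEVELS] = {
    624, 624, 560, 510, 401, 315, 214, 133
};

/* First-match lookup: a frequency that appears twice always maps to the lower index. */
int freq_to_level(unsigned int freq)
{
    int i;

    for (i = 0; i < NUM_LEVELS; i++)
        if (freq_table[i] == freq)
            return i;
    return -1;
}

/*
 * One governor step with a fixed decision val (1 means one level lower in
 * frequency). skip_duplicate_top enables the "treat level 0 as level 1"
 * handling when the top two entries are equal.
 */
unsigned int governor_step(unsigned int cur_freq, int val, int skip_duplicate_top)
{
    int level = freq_to_level(cur_freq);

    if (level < 0)
        return cur_freq;
    if (skip_duplicate_top && level == 0 && freq_table[0] == freq_table[1])
        level = 1;
    level += val;
    if (level < 0)
        level = 0;
    if (level > NUM_LEVELS - 1)
        level = NUM_LEVELS - 1;
    return freq_table[level];
}

/* Apply the same decision repeatedly and return the final frequency. */
unsigned int run_steps(unsigned int freq, int val, int steps, int skip_duplicate_top)
{
    while (steps-- > 0)
        freq = governor_step(freq, val, skip_duplicate_top);
    return freq;
}
```

Starting from 624 with val fixed at 1, run_steps stays at 624 no matter how many steps are taken when skip_duplicate_top is 0. With skip_duplicate_top set to 1, a single step reaches 560.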
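4. Solution
The problem is caused by levels 0 and 1 mapping to the same frequency, which traps the governor's target frequency in a dead loop.
Any of the following fixes works. The third looks the most robust, but if the goal is only to fix this issue, the first or the second is enough.
- Remove the duplicated frequency from the dts.
- In tz_get_target_freq, handle the freq-to-level conversion specially: if levels 0 and 1 map to the same frequency, ignore level 0 and set level to 1.
- When recording cur_freq, also record cur_level, so that 624MHz at level 1 is not treated as level 0 on the next run.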
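The third option can be sketched as follows. This is a user-space illustration, not a patch against the driver: struct gov_state and step_with_tracked_level are hypothetical names, and the TZ decision is again passed in as a plain val.

```c
#define NUM_LEVELS 8

/* Frequency table from this article in MHz; levels 0 and 1 are both 624. */
static const unsigned int freq_table[NUM_LEVELS] = {
    624, 624, 560, 510, 401, 315, 214, 133
};

/* Hypothetical governor state: remembers the level, not only the frequency. */
struct gov_state {
    int cur_level;
};

/*
 * Applies a level delta to the remembered level instead of re-deriving the
 * level from the current frequency, so duplicate table entries cannot
 * reset the position.
 */
unsigned int step_with_tracked_level(struct gov_state *st, int val)
{
    int level = st->cur_level + val;

    if (level < 0)
        level = 0;
    if (level > NUM_LEVELS - 1)
        level = NUM_LEVELS - 1;
    st->cur_level = level;
    return freq_table[level];
}
```

Starting at level 0, the first step with val 1 still returns 624 but moves the state to level 1, and the second step returns 560. One caveat under this design: if anything other than the governor changes the frequency, the remembered level would have to be resynchronized, and this sketch does not attempt that.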