I put the perf binary in /data/bin.

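adb shell /data/bin/perf list lists all available performance events, which fall into four categories: hardware events, software events, hardware cache events, and tracepoint events.
adb shell /data/bin/perf stat ls prints the performance counter statistics collected while ls runs.
adb shell /data/bin/perf stat -e event ls prints the statistics for the single event named event.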

Running adb shell /data/bin/perf stat ls produces the following output:

 Error: open_counter returned with 19 (No such device). /bin/dmesg may provide additional information.
 Fatal: Not all events could be opened.

Following the hint and running adb shell dmesg shows this error:

  hw perfevents: unable to reserve pmu

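Running adb shell /data/bin/perf stat -e event ls for each event in turn shows a clear pattern: every hardware event and hardware cache event fails with exactly the error shown above, while software events and tracepoint events succeed. In other words, the PMU hardware is not being used, so none of the hardware performance counters can be read.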
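The event-by-event check can be scripted from the host. This is a minimal sketch, assuming adb is on the PATH and perf sits at /data/bin/perf as above. The helper names (perf_stat_cmd, counter_failed, probe) are made up for illustration, and the failure test simply looks for the error text quoted earlier.

```python
import subprocess

PERF = "/data/bin/perf"  # location used in this post


def perf_stat_cmd(event, workload="ls"):
    """Build the adb command line that counts one event while `workload` runs."""
    return ["adb", "shell", PERF, "stat", "-e", event, workload]


def counter_failed(output):
    """True if perf reported that a counter could not be opened."""
    return ("open_counter returned with" in output
            or "Not all events could be opened" in output)


def probe(events):
    """Run perf stat once per event; return {event: True if counting worked}."""
    results = {}
    for event in events:
        proc = subprocess.run(perf_stat_cmd(event),
                              capture_output=True, text=True)
        results[event] = not counter_failed(proc.stdout + proc.stderr)
    return results
```

Calling probe(["cycles", "cpu-clock"]) with a phone attached would, on the unpatched kernel described here, be expected to report the hardware event cycles as failing and the software event cpu-clock as working.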

The Galaxy Nexus CPU is an OMAP (ARM Cortex-A9). I had already downloaded the matching kernel source into the omap directory:
git clone https://android.googlesource.com/kernel/omap.git
cd omap
git checkout remotes/origin/android-omap-tuna-3.0 -b tuna

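Searching the web for the error message above turns up many discussions of perf stat failing on OMAP. Some people say the chip design is flawed, so the PMU interrupt can never fire. Whether the hardware is really at fault can be checked with gator, which ARM provides. gator-driver chooses its profiling method by kernel version: below 3.0.0 it uses ARM's own PMU code, otherwise it uses the Linux perf infrastructure. The phone currently runs kernel 3.0.31, so gator goes through Linux perf, and an experiment with DS-5 Streamline does indeed produce this line in the dmesg output: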

 hw perfevents: unable to reserve pmu

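If the version check in gator-driver is changed so that the perf infrastructure is used only for kernel versions above 3.1.0, then on this phone the gator module reads the counters with its own PMU code instead of relying on the driver built into the Linux kernel. The result: "hw perfevents: unable to reserve pmu" disappears from dmesg, and the hardware performance counter values are read back. This proves the hardware is fine, so attention should turn to the kernel code.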
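The effect of moving the threshold can be checked with a few lines of arithmetic. The sketch below uses the same packing as the kernel's KERNEL_VERSION(a, b, c) macro. The function uses_linux_perf is an assumed stand-in for gator's real condition, which is not quoted in this post.

```python
def kernel_version(a, b, c):
    """Same packing as the kernel's KERNEL_VERSION(a, b, c) macro."""
    return (a << 16) + (b << 8) + c


def uses_linux_perf(running, threshold):
    """Assumed model of the check: perf is used at or above the threshold."""
    return running >= threshold


PHONE = kernel_version(3, 0, 31)

print(uses_linux_perf(PHONE, kernel_version(3, 0, 0)))  # True: stock threshold
print(uses_linux_perf(PHONE, kernel_version(3, 1, 0)))  # False: raised threshold
```

With the raised threshold, 3.0.31 falls on the side that uses gator's own PMU code, which matches the experiment.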

Searching the kernel source for "unable to reserve pmu" shows that it appears only in the armpmu_reserve_hardware() function in omap/arch/arm/kernel/perf_event.c.
The warning is printed when reserve_pmu(ARM_PMU_DEVICE_CPU) returns an error code.

    static int
    armpmu_reserve_hardware(void)
    {
        struct arm_pmu_platdata *plat;
        irq_handler_t handle_irq;
        int i, err = -ENODEV, irq;

        pmu_device = reserve_pmu(ARM_PMU_DEVICE_CPU);
        if (IS_ERR(pmu_device)) {
            pr_warning("unable to reserve pmu\n");
            return PTR_ERR(pmu_device);
        }
        ...

reserve_pmu(enum arm_pmu_type device) is defined in omap/arch/arm/kernel/pmu.c:

    struct platform_device *
    reserve_pmu(enum arm_pmu_type device)
    {
        struct platform_device *pdev;

        if (test_and_set_bit_lock(device, &pmu_lock)) {
            pdev = ERR_PTR(-EBUSY);
        } else if (pmu_devices[device] == NULL) {
            clear_bit_unlock(device, &pmu_lock);
            pdev = ERR_PTR(-ENODEV);
        } else {
            pdev = pmu_devices[device];
        }

        return pdev;
    }

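This shows that when no device has been registered, reserve_pmu() returns -ENODEV. ENODEV is errno 19 ("No such device"), which matches the error reported by perf stat ls exactly.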
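The three outcomes of reserve_pmu() are easy to see in a toy user-space model. This is only a sketch of the logic in the excerpt above, not kernel code. The PmuRegistry class and its method names are invented for illustration, and ARM_PMU_DEVICE_CPU is used here simply as a dictionary key.

```python
import errno

ARM_PMU_DEVICE_CPU = 0  # used only as a key in this model


class PmuRegistry:
    """Toy model of pmu_devices[] and pmu_lock from pmu.c."""

    def __init__(self):
        self.devices = {}    # device type -> registered platform device
        self.locked = set()  # device types currently reserved

    def register(self, device, pdev):
        self.devices[device] = pdev

    def reserve(self, device):
        """Return the device, or a negative errno as reserve_pmu() does."""
        if device in self.locked:
            return -errno.EBUSY
        if device not in self.devices:
            # The kernel takes the lock first and releases it again here.
            return -errno.ENODEV
        self.locked.add(device)
        return self.devices[device]


registry = PmuRegistry()
# Nothing registered yet, as on the OMAP 4460 with the stock kernel.
print(registry.reserve(ARM_PMU_DEVICE_CPU) == -errno.ENODEV)  # True
```

Once a device is registered, the first reserve() call returns it and a second call returns -EBUSY until the lock is released.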

The PMU device is initialized and registered in omap/arch/arm/mach-omap2/devices.c:

    static void omap_init_pmu(void)
    {
        if (cpu_is_omap24xx())
            omap_pmu_device.resource = &omap2_pmu_resource;
        else if (cpu_is_omap34xx())
            omap_pmu_device.resource = &omap3_pmu_resource;
        else
            return;

        platform_device_register(&omap_pmu_device);
    }

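The dmesg output shows that the Galaxy Nexus CPU is an OMAP 4460. According to the source, on a 4460 no resource is assigned at all and the device is never registered. The resources for omap24xx and omap34xx are defined as follows: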

    static struct resource omap2_pmu_resource = {
        .start  = 3,
        .end    = 3,
        .flags  = IORESOURCE_IRQ,
    };

    static struct resource omap3_pmu_resource = {
        .start  = INT_34XX_BENCH_MPU_EMUL,
        .end    = INT_34XX_BENCH_MPU_EMUL,
        .flags  = IORESOURCE_IRQ,
    };

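As can be seen, the resource is an interrupt number. So what is the PMU interrupt number on the OMAP 4460? The OMAP 4460 has two cores (Cortex-A9 MPCore), each core has its own PMU, and each PMU has its own interrupt, so there should be two interrupt numbers. A web search for OMAP 4460 and PMU gives these two: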

 54 + OMAP44XX_IRQ_GIC_START
 55 + OMAP44XX_IRQ_GIC_START

In omap/arch/arm/mach-omap2/omap_hwmod_44xx_data.c:

  #define OMAP44XX_IRQ_GIC_START  32

So the two interrupt numbers are 54 + 32 = 86 and 55 + 32 = 87. The modified omap/arch/arm/mach-omap2/devices.c looks like this:

    static struct resource omap2_pmu_resource = {
        .start  = 3,
        .end    = 3,
        .flags  = IORESOURCE_IRQ,
    };

    static struct resource omap3_pmu_resource = {
        .start  = INT_34XX_BENCH_MPU_EMUL,
        .end    = INT_34XX_BENCH_MPU_EMUL,
        .flags  = IORESOURCE_IRQ,
    };

    static struct resource omap446x_pmu_resource = {
        .start  = 86,
        .end    = 87,
        .flags  = IORESOURCE_IRQ,
    };

    static struct platform_device omap_pmu_device = {
        .name       = "arm-pmu",
        .id     = ARM_PMU_DEVICE_CPU,
        .num_resources  = 1,
    };

    static void omap_init_pmu(void)
    {
        if (cpu_is_omap24xx())
            omap_pmu_device.resource = &omap2_pmu_resource;
        else if (cpu_is_omap34xx())
            omap_pmu_device.resource = &omap3_pmu_resource;
        else if (cpu_is_omap446x())
            omap_pmu_device.resource = &omap446x_pmu_resource;
        else
            return;

        platform_device_register(&omap_pmu_device);
    }
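The interrupt arithmetic, and how a driver would see this resource, can be modelled in user space. This is a sketch, not kernel code. It assumes, as far as I understand the platform code of this kernel generation, that platform_get_irq() returns the start field of the selected IRQ resource and that the PMU code requests one IRQ per resource. The Resource class and helper names are invented here, and the one-entry-per-core layout is a hypothetical alternative that was not tested in this post.

```python
from dataclasses import dataclass

OMAP44XX_IRQ_GIC_START = 32  # value quoted above


@dataclass
class Resource:
    start: int
    end: int


def get_irq(resources, index):
    """Assumed model of platform_get_irq(): start of the index-th resource."""
    return resources[index].start


def requested_irqs(resources):
    """IRQs a driver would request when it loops over every resource."""
    return [get_irq(resources, i) for i in range(len(resources))]


pmu_irqs = [n + OMAP44XX_IRQ_GIC_START for n in (54, 55)]
one_range = [Resource(86, 87)]                     # layout used in the patch
one_per_core = [Resource(i, i) for i in pmu_irqs]  # hypothetical alternative

print(pmu_irqs)                      # [86, 87]
print(requested_irqs(one_range))     # [86]
print(requested_irqs(one_per_core))  # [86, 87]
```

If those assumptions hold, a single 86 to 87 range with num_resources = 1 registers the device, which is enough to get past the -ENODEV failure, but only IRQ 86 would be requested. That may be worth keeping in mind when counters on the second core are sampled.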

Or, as git diff output:

diff --git a/arch/arm/mach-omap2/devices.c b/arch/arm/mach-omap2/devices.c
index cf7a0ba..fce5cbc 100644
--- a/arch/arm/mach-omap2/devices.c
+++ b/arch/arm/mach-omap2/devices.c
@@ -577,6 +577,12 @@ static struct resource omap3_pmu_resource = {
        .flags  = IORESOURCE_IRQ,
 };

+static struct resource omap446x_pmu_resource = {
+       .start  = 86,
+       .end    = 87,
+       .flags  = IORESOURCE_IRQ,
+};
+
 static struct platform_device omap_pmu_device = {
        .name           = "arm-pmu",
        .id             = ARM_PMU_DEVICE_CPU,
@@ -589,6 +595,8 @@ static void omap_init_pmu(void)
                omap_pmu_device.resource = &omap2_pmu_resource;
        else if (cpu_is_omap34xx())
                omap_pmu_device.resource = &omap3_pmu_resource;
+       else if (cpu_is_omap446x())
+               omap_pmu_device.resource = &omap446x_pmu_resource;
        else
                return; 

After rebuilding the kernel and flashing it to the phone, run perf stat ls:

hzh@fangtian:~/android/omap$ adb shell /data/bin/perf stat ls /sdcard
/sdcard

 Performance counter stats for 'ls /sdcard':

         9.735107 task-clock                #    0.761 CPUs utilized
                7 context-switches          #    0.001 M/sec
                0 CPU-migrations            #    0.000 M/sec
              127 page-faults               #    0.013 M/sec
          3351924 cycles                    #    0.344 GHz
                0 stalled-cycles-frontend   #    0.00% frontend cycles idle
                0 stalled-cycles-backend    #    0.00% backend  cycles idle
                0 instructions              #    0.00  insns per cycle
                0 branches                  #    0.000 M/sec
                0 branch-misses             #    0.00% of all branches

      0.012786865 seconds time elapsed
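The derived figures in this output can be cross-checked with a small parser. This is a sketch only: SAMPLE copies three lines from the output above, parse_counters is an invented helper, and it assumes task-clock is reported in milliseconds, which is consistent with the 0.761 CPUs utilized figure against 0.0128 seconds elapsed.

```python
import re

SAMPLE = """\
         9.735107 task-clock                #    0.761 CPUs utilized
          3351924 cycles                    #    0.344 GHz
                0 instructions              #    0.00  insns per cycle
"""


def parse_counters(text):
    """Map counter name -> value for lines shaped like '<number> <name> ...'."""
    counters = {}
    for line in text.splitlines():
        m = re.match(r"\s*([\d.]+)\s+([\w-]+)", line)
        if m:
            counters[m.group(2)] = float(m.group(1))
    return counters


c = parse_counters(SAMPLE)
# cycles per nanosecond of task-clock time is the rate in GHz
ghz = c["cycles"] / (c["task-clock"] * 1e6)
print(round(ghz, 3))  # 0.344
```

The cycles counter now returns a real value, and the computed rate agrees with the 0.344 GHz that perf prints. The same dictionary also makes it easy to spot the counters that still read 0, such as instructions.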