This post is for driver or kernel developers/enthusiasts who have encountered a Blue Screen of Death on Windows where the bugcheck code is 0x9F, DRIVER_POWER_STATE_FAILURE, and parameter 1 is 0x3. There are a few variants on DRIVER_POWER_STATE_FAILURE, but this one is when a device object has been blocking an IRP for too long a time.
If you’re not familiar with IRPs, you should probably back up and learn the basics. Check out Power IRPs for individual devices. I’m not going to explain them here, but when a device connected to the system is making a power transition, it is responsible for completing that power transition within 2 minutes.
So, if your computer is trying to go to sleep and you have a webcam connected to it that should go to sleep along with the system, the driver for that webcam needs to complete the power IRP within two minutes. If it doesn’t, a watchdog timer fires and the system bugchecks with code 9F, parameter 3. Two minutes should be ample time for any device to complete a power IRP; if it takes more than that, it usually means there is a deadlock somewhere in the system.
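To make the watchdog idea concrete, here is a toy model of it. This is only a sketch: the PowerIrp class, its fields, and watchdog_fires are made up for illustration and are not Windows structures or APIs. The 120-second timeout is simply the two-minute figure from the text above.

```python
from dataclasses import dataclass
from typing import Optional

# Timeout taken from the text above (two minutes). Treat it as an assumption
# of this toy model, not as a documented constant.
POWER_IRP_TIMEOUT_SECONDS = 120


@dataclass
class PowerIrp:
    """Toy stand-in for a power IRP; not a real Windows structure."""
    device: str
    dispatched_at: float                  # seconds on an arbitrary clock
    completed_at: Optional[float] = None  # None: the driver never completed it


def watchdog_fires(irp: PowerIrp, now: float) -> bool:
    """Return True if the IRP is still outstanding once the timeout has elapsed."""
    deadline = irp.dispatched_at + POWER_IRP_TIMEOUT_SECONDS
    if irp.completed_at is not None and irp.completed_at <= deadline:
        return False  # completed in time, so the watchdog is cancelled
    return now >= deadline


healthy = PowerIrp("webcam", dispatched_at=0.0, completed_at=1.5)
stuck = PowerIrp("webcam", dispatched_at=0.0)
```

In this model a healthy driver completes the IRP long before the deadline, while the stuck one trips the watchdog as soon as the clock reaches 120 seconds.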
You will need:
1. A kernel or full dump
9Fs are different than other crashes in that it’s not a single faulting line of source code. A wonderful debugging mentor of mine describes solving 9Fs as “untangling a chain of dependencies”. You need a full view of the system in order to confidently make a diagnosis here, in most cases. A minidump often won’t provide enough information.
2. WinDBG or KD set up to debug a crash dump
I know that my readers have a wide range of technical skills - if you don’t know at least the basics of using WinDBG or KD to kernel debug, you may want to start there. See Getting Started with Windows Debugging. Once you have WinDBG set up and your crash dump opened, be sure you have somewhere to take notes. Personally, I’m crazy about WinDBG’s Scratch Pad functionality. Here’s a tiny picture of the WinDBG workspace I use for crash dumps (bigger picture here) - see how I have my notes right there in the same window on the top left. You’ll have to figure out whatever works for you if you haven’t already, though.
Basic steps to solve a 9F 3
9Fs range from extremely easy problems that you can solve in 2 minutes to extremely difficult ones that several experienced engineers will scratch their heads at for days. We’re going to stick with an easier one. The steps I describe here are the foundation of solving any 9F 3.
1. Start with .bugcheck to find out the stuck power IRP
We need to know which power IRP is stuck. This will be in bugcheck parameter 4.
Cool, our power IRP is ffffe00018d4aa50. Write this down in your notes so you can easily copy/paste it later.
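If you keep your notes as text, a small script can pull the IRP out of pasted .bugcheck output for you. This is a sketch under assumptions: it expects text in the two-line "Bugcheck code" / "Arguments" shape, and the helper names are hypothetical. In the sample, only the last argument (the IRP address) comes from this post; the other argument values are made-up placeholders.

```python
import re

# Sample text in the shape of `.bugcheck` output. Only the fourth argument
# is from this post; arguments 2 and 3 are made-up placeholder values.
SAMPLE = """\
Bugcheck code 0000009F
Arguments 00000000`00000003 ffffe000`00000001 fffff800`00000002 ffffe000`18d4aa50
"""


def parse_bugcheck(text):
    """Return (bugcheck_code, [arguments]) as integers."""
    code_match = re.search(r"Bugcheck code\s+([0-9A-Fa-f]+)", text)
    args_match = re.search(r"Arguments\s+(.+)", text)
    if not code_match or not args_match:
        raise ValueError("text does not look like .bugcheck output")
    code = int(code_match.group(1), 16)
    # 64-bit values are printed with a backtick between the two halves.
    args = [int(tok.replace("`", ""), 16) for tok in args_match.group(1).split()]
    return code, args


def stuck_power_irp(text):
    """Return the stuck IRP address as a hex string for a 9F 3, else None."""
    code, args = parse_bugcheck(text)
    if code != 0x9F or len(args) < 4 or args[0] != 0x3:
        return None
    return f"{args[3]:016x}"  # parameter 4 holds the blocked IRP
```

Calling stuck_power_irp(SAMPLE) gives back the address without the backtick, ready to paste into later commands.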
2. Look at IRPs and IRP worker threads with !irp and !poaction 2
First, we want to check out the IRP worker threads that are in the 2nd section of !poaction. !poaction is useful for any crash involving a power transition, such as bugcheck 0x9F 3. You can experiment with looking at the full output of the command on your own.
We want to see if our IRP is being serviced by one of the IRP worker threads.
Look at that! We see our IRP is a Set/D3 IRP and that it’s being serviced by that worker thread ffffe0000ae0e740. Write that thread down.
If our stuck IRP isn’t here, don’t despair - use the techniques below to see what the running IRP worker threads are doing, so you can get a better picture of what’s going on in the system. At this point, you can look at the IRP itself with !irp to get some ideas about where it’s stuck.
Output truncated to the interesting bits for brevity:
Hmm, it’s pending in some driver called A38CCID. Seems suspicious. We’ll get back to that.
3. Look at the IRP worker thread with !thread or .thread
We saw our stuck IRP is being serviced in that IRP worker thread. Let’s check it out! You can see detailed information about the thread with !thread ffffe0000ae0e740 or you can switch to the thread context with .thread ffffe0000ae0e740. For space/readability, I’ll do the latter.
You can see immediately what’s happening here. WDF is passing down a power IRP to our device, whose driver is a38ccid.sys. It never completes the power IRP, so the machine falls over when its two minutes are up. We don’t have the symbols for that driver since it’s not part of Windows, but we can look at it anyway:
If you own the driver that shows up here, you can take some extra steps to see what your driver is doing wrong that’s causing it to hold that power IRP for 2 minutes before completing it. If you don’t, there’s not much you can do to solve it, but you at least know who the culprit is. You can look up the name of the faulting driver online and report a bug to their engineering team. In this case, a38ccid.sys is a smart card reader driver.
4. Other things to try: !stacks 2
Once you suspect a driver is at fault, you should see which thread stacks it shows up in. Sometimes you’ll see smoking guns. For example, maybe the driver you identified in step 3 is calling WdfSpinLockAcquire in another thread, which may indicate that it has a textbook deadlock somewhere.
Try it out on your own dump! Output truncated to the interesting parts.
Our faulty driver is in two different stacks right now and each of them is waiting on something. There’s a KeWaitForMultipleObjects in the first stack and a KeWaitForSingleObject in the second stack, which we recognize to also be the IRP worker thread. It’s impossible to tell for sure without the symbols, but it looks like they have a bug in their synchronization. These threads might be waiting on each other, which would be a deadlock. If not, they’re waiting on something else that is never happening. Either way, we can say with a high degree of confidence that a38ccid.sys is the culprit here.
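The "waiting on each other" idea can be modeled as a wait-for graph: if following "thread A is blocked on thread B" ever leads back to where you started, that’s a deadlock. Here’s a minimal sketch of that check. The thread names and the waits_on mapping are hypothetical; in a real dump you would have to work out by hand which thread owns the object each wait is blocked on, and without symbols you often can’t.

```python
def find_wait_cycle(waits_on):
    """Find a cycle in a wait-for graph.

    waits_on maps a thread name to the thread it is blocked on, or to None
    if it is blocked on something no thread in the map will ever signal.
    Returns the list of threads forming a cycle, or None if there is none.
    """
    for start in waits_on:
        seen = []
        current = start
        # Follow the chain of waits until it ends or revisits a thread.
        while current is not None and current not in seen:
            seen.append(current)
            current = waits_on.get(current)
        if current is not None:
            # We came back to a thread already on the chain: a cycle.
            return seen[seen.index(current):]
    return None


# Hypothetical picture of the two stacks above, if they wait on each other.
deadlocked = {"irp_worker": "driver_thread", "driver_thread": "irp_worker"}

# The other possibility: the chain ends at something that never happens.
starved = {"irp_worker": "driver_thread", "driver_thread": None}
```

The first mapping yields a cycle containing both threads; the second yields None, which corresponds to the "waiting on something else that is never happening" case.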
5. Other things to try: !running -it
Sometimes your stuck IRP is the victim of activity on some other processor. You can view what’s running on all processors to see if anything looks suspicious. Use !running -it.
Note: This particular case isn’t a good candidate for !running -it, so I’ve omitted the output. I’d like to give a different example that makes use of !running -it in a different post.
Solving the puzzle
If you follow those steps, you will often be able to tell if your system exhibits the symptoms of a 3rd party (non-Windows) device driver blocking the power IRP from being completed. There are a multitude of other causes for DRIVER_POWER_STATE_FAILURE with a stuck IRP, but a faulty 3rd party driver is among the most common ones.
To me, there are few things more satisfying than solving a nice bugcheck 0x9F 3. I speculate that it’s weird to have a favorite BSOD crash to debug, but I find it very methodical and engaging with a fascinating view into the system, to boot. I’m a sucker for puzzles, though. Shoot me an email if you have any for me.
Happy debugging!
Written on June 10, 2016