Debug me --- if I bugged you enough!

No, it's not what you think I'm saying! It's what your software package is saying, right?

So you designed, developed, and debugged your software package. It consists of several user-mode components and some kernel components. You have the unit-test component you built while you were coding your design; you tested...

It works. You feel proud of yourself! And you should be proud of yourself! Enjoy the moment...

Then comes testing by the test people, mostly harness/stress/perf testing. And for all of these, unless yours is a big software house, prefabricated test suites are used.

One particular scenario is the storage stack under Windows. What happens when you have three or four kernel components directly adding value to the storage stack? Oh, that surely scares the hell out of me. It's like plugging a pacemaker into an otherwise healthy heart! Of course, there are other examples where thorough third-party, off-the-shelf test suites are required to be run on test machines.

 

For harness tests on the storage stack, I learned to run SysMark, PCMark, etc. Well, these test the all-round scenarios a user might end up in when they play with their PCs!

The problem comes when these tools are not that solid! What can you expect when such a preview package comes free?

So one alternative is to run the tests with the kernel debugger configured, meaning the system boots into a boot configuration where the debugger is on. When an exception or other test-application fault happens, you note the symptom and take that part out of the test suite, if you can. But some of the exceptions would be handled by the system if the debugger were not hooked up.
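(As an aside, if you do want the kernel debugger in the boot configuration, the usual way on Vista and later is bcdedit, then a reboot; the serial/COM1 settings below are just one common example - adjust to whatever debug transport you actually use.)

bcdedit /debug on
bcdedit /dbgsettings serial debugport:1 baudrate:115200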

The case where the test suite runs fine only without the debugger enabled is a particularly interesting one. Now you cannot configure the debugger, so what do you do?

Well, here comes DbgView from Sysinternals (now a Microsoft tool).

What should I do now?

 

-- Well, the tests run for hours, and the system boots and reboots during the tests.

-- I need certain information printed to logs from the kernel component we developed.

These two are the basic goals. How do we meet them?

We can develop the components for debugging. Yeah, some will get scared to hear "develop for debugging". With DbgPrintEx, you can use your own masks based on your components' features. If they are modular enough, you can have module-specific masks.
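Here is a minimal sketch of what that can look like; the MYDRV_* mask names and the messages are made up for illustration. With the IHVDRIVER filter value below set to 0x0000000f, levels 0 through 3 (error, warning, trace, info) pass through; if you pass a value of 32 or higher as the Level argument, DbgPrintEx treats it as a bit mask of its own, so a module-specific bit also has to be set in that same registry value for the message to show up.

#include <ntddk.h>

/* Made-up, module-specific trace masks (values >= 32 are treated as bit masks). */
#define MYDRV_TRACE_IO    0x00000100    /* bit 8: read/write path      */
#define MYDRV_TRACE_PNP   0x00000200    /* bit 9: PnP / power handling */

VOID MyDrvTraceReadComplete(ULONG bytes)
{
    /* Level 3 (info) message - enabled by the 0x0000000f filter value below. */
    DbgPrintEx(DPFLTR_IHVDRIVER_ID, DPFLTR_INFO_LEVEL,
               "MyDrv: read completed, %lu bytes\n", bytes);

    /* Module-specific message - shows up only if bit 8 is also set in the filter. */
    DbgPrintEx(DPFLTR_IHVDRIVER_ID, MYDRV_TRACE_IO,
               "MyDrv[IO]: read completed, %lu bytes\n", bytes);
}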

Now you will have to enable the registry entries on the test systems. Here is what you need (read online for the details of what to create in the registry hive):

 

Windows Registry Editor Version 5.00
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Debug Print Filter]
"DPFLTR_IHVDRIVER_ID"=dword:0000000f

 

 

Now you need a batch file that runs at boot time. Store this batch file and dbgview.exe in C:\. Then make a shortcut to the batch file, open Start->All Programs->Startup (right-click it and choose Open), and paste the shortcut there.

@echo off
rem -- launch dbgview at startup
start c:\dbgview.exe /t /f /l c:\dbgviewLog /g /k /m 10 /p /w

 

The first time it boots, it will ask you to accept the license agreement. From then on, it will automatically start at boot time.

Here is a sample result from the log file (dbgviewLog):

[\\VANHALEN-D4]
00000001 0.00000000 KTM:  TmRollbackTransaction for tx 61859f0
00000002 0.01082284 KTM:  TmRollbackTransaction for tx 61859f0
00000003 0.02614005 KTM:  TmRollbackTransaction for tx 61859f0
00000004 0.03681052 KTM:  TmRollbackTransaction for tx 61859f0
00000005 41.99409103 TpmEvtEntropyTimer().
00000006 118.64528656 KTM:  TmRollbackTransaction for tx 5ae0c80
00000007 118.65682220 KTM:  TmRollbackTransaction for tx 5ae0c80

 

Enjoy! 

 

Posted on Thursday, January 12, 2012 at 10:59AM by Prokash Sinha

Design Alternatives - Conclusion!

So what do we have so far?

 

  1. We want a software mechanism to embed error or fault injection, to see how our module handles at least some of the basic cases.
  2. It is not enough to say that after calling an API we check for an error and take the appropriate execution path. Instead, we must exercise the error-handling paths by injecting some basic errors.
  3. As always, the code to inject errors has to be minimally invasive, in the sense that it can go into the production sources but will never generate error-injection code in the production binary. While under test, we can inject those errors as required.
  4. Alternatively, use external error injection in an ad hoc fashion.

 

 

All the triggers are of the first type of error injection: they live inside the production sources, but they do not need to be compiled into the production binary.

 

On the other side of the fence is using a separate module, infrastructure, etc., to inject errors. For kernel programming on platforms where filter and/or stream driver infrastructures are available, they are sometimes good alternatives. For example, under Windows I've used filter drivers to inject errors on IRPs. For Linux/Unix, I worked on stream drivers that take the burden of injecting errors off the main device drivers, at will.
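As a rough illustration of the Windows side of that, here is what such an injection point can look like in a lower filter's read dispatch routine. This is not the actual code I used; FILTER_EXTENSION, NextLowerDevice, and the every-100th-IRP policy are made up for the sketch.

NTSTATUS FilterDispatchRead(PDEVICE_OBJECT DeviceObject, PIRP Irp)
{
    PFILTER_EXTENSION ext = (PFILTER_EXTENSION)DeviceObject->DeviceExtension;
    static LONG count = 0;

    /* Inject a fault on every 100th read: complete the IRP with an error. */
    if ((InterlockedIncrement(&count) % 100) == 0) {
        Irp->IoStatus.Status = STATUS_DEVICE_DATA_ERROR;
        Irp->IoStatus.Information = 0;
        IoCompleteRequest(Irp, IO_NO_INCREMENT);
        return STATUS_DEVICE_DATA_ERROR;
    }

    /* Otherwise pass the request down the stack untouched. */
    IoSkipCurrentIrpStackLocation(Irp);
    return IoCallDriver(ext->NextLowerDevice, Irp);
}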

That's it for basic error injection and designing to debug!

 

Happy new year to all!

Posted on Monday, January 2, 2012 at 10:31AM by Prokash Sinha

Design Alternatives II

Continuing the last note...

What I found from my previous experience is that even if we reformat or simply add one character to a comment and save the source file, we should really go through the whole process of building the software and running it through a series of test processes. This is a bit strict, but for any production-quality software we really should follow this process.

On top of that, the quality assurance process should not touch the source and/or binary. They should mostly use the software as built and do all sorts of error testing and analysis. There are variations of this approach, but mostly it reduces confusion among team members.

Most of the code we write for error conditions and possible actions is of the IF-THEN-ELSE kind, and there are TRY-CATCH type patterns as well. One example is memory allocation. How do I force a failure in the memory allocation API provided by the system library?

One approach is to wrap any such system call in another function and fail it there, but that requires extra code, and managing the difference between Release and Debug builds could be unnecessary work.
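For completeness, here is a sketch of that wrapper alternative; the names (my_malloc, g_failNextAlloc) are mine, not from any library. Every call site would use my_malloc() instead of malloc(), and a test-only flag makes the next allocation fail on demand.

#include <stddef.h>
#include <stdlib.h>

static int g_failNextAlloc;          /* set by the test harness before the call under test */

void *my_malloc(size_t n)
{
    if (g_failNextAlloc) {
        g_failNextAlloc = 0;         /* one-shot: fail this allocation only */
        return NULL;                 /* injected failure */
    }
    return malloc(n);
}

That wrapper and its flag are exactly the extra code and Release/Debug bookkeeping I'd rather avoid, which leads to the macro pattern below.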

I use some simple macros in the code itself to make sure the error paths work. As an example, say I want to force an error on a memory allocation; I use the following pattern.

if (TRIGGER((memVa = malloc(n)) == NULL)) {
    HandleTheErrorPlease(...);
} else {
    /* success */
    DoTheProcessing(...);
}
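The note does not spell TRIGGER out, so here is one possible definition, purely a sketch under my own assumptions (the ERROR_INJECTION symbol and the g_forceNextFailure flag are made up). In a test build the macro can force the wrapped condition to read as a failure; in a production build it compiles down to the bare expression, so no error-injection code is generated.

#ifdef ERROR_INJECTION
extern int g_forceNextFailure;   /* set by a test hook right before the call site */
#define TRIGGER(expr)  ((expr) || (g_forceNextFailure && (g_forceNextFailure = 0, 1)))
#else
#define TRIGGER(expr)  (expr)
#endif

One caveat of this particular sketch: when the failure is forced after the allocation actually succeeded, the test build either leaks that block or the error handler has to free it. That is acceptable for exercising the error path, but worth knowing.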

 

This is just a simple example of triggering; depending on what kind of trigger I want and at what point in time, I can inject error conditions on different code paths with different types of errors.

 

Now, what do I get out of it?

 

1) Verification that the error paths are handled correctly.
2) No code changes between production and test builds.
3) Better code coverage - the error paths get exercised, which also helps eliminate dead code.

 

Posted on Thursday, November 24, 2011 at 01:50PM by Prokash Sinha

Design Alternatives

It's been a while since I last posted here...

One thing I'm interested in is designing for debugging. It means that as and when we design some software, we want to make sure debug code is part of the design. It should be placed carefully and turned on when needed.

You may ask why. In fact, I was trying to implement error handling of all sorts in a large kernel module sitting in the middle of the storage stack. And as we know, shoehorning this kind of stuff in is quite challenging. What was it for? It was to make sure of (1) user data not getting lost or corrupted, and (2) seamless recovery.

In my rookie days, I worked with a very shallow level of fault tolerance in a network stack - the idea was to modularize the component(s) and roll back (meaning re-init) progressively based on watchdog response. And we all know there is a fundamental problem with the watchdog mechanism - refer to the internet for some nice articles on it from Embedded Systems Design magazine.

So I ended up copying some crucial (hot-path) routines, giving them new names, and letting the software take those paths only for error handling, and then debug / change / whatever... This way I don't upset the normal paths of the code. It's a sort of bypass surgery! Then, once I found the bugs and the symptoms, I just changed the wiring of the code flow.

The problem was that injecting errors into the main storage stack of any OS would upset the system in an unimaginable number of ways. In Windows, I saw many bugchecks: several varieties of hive corruption (note that from Win 7 onward, hive management in the OS is more robust, and still...), no paging file, ntdll's breakpoint interrupt, and others. It basically indicated that the timing of the error (what type of IRP had the error, the number of errors, the load on the system at the time, the type of load - random, read-only, write-only, etc.) and other factors would determine, in an as yet unpredictable way, the type of bugcheck and/or corruption of the system volume.

This is why I think design-to-debug is what makes good software shine. As an example, the code should be armed with a trigger infrastructure. For the details of the trigger infrastructure, I will have to write another note later.

Posted on Saturday, November 12, 2011 at 09:22AM by Prokash Sinha

Looking thru Windows - It's almost summer!

This winter was pretty interesting! A few blizzards on the East Coast, and a fairly good amount of rain in the West! Really, I can't believe it is already the end of May and it is cloudy/rainy here. By now, it should have been the season to look through windows - blue sky, the beautiful aroma of spring, with a bunch of pollen to clear my nasal passages!

Oops, this is not intended to be about those windows; it is actually about Windows. It's been a while since I could get my hands on it; now I'm getting back to hacking.

For a while, maybe five years or so, my better half was saying: clean up that mess, those are old CDs, the books are obsolete, and why do you need to carry them around? I was just about ready to put some of them in the recycling, and now I need them. Quite badly, actually, to brush the dust off the books and off my head!

About 20 years ago, it was really fun to solve graph-algorithmic problems - perhaps due to a lack of better things to do! Now I see another branch of applications where I really need to understand the difference between the RAM and PRAM programming models to estimate algorithm performance. These were abstract concepts at that time; now they have real and practical significance.

For example, memory management and storage stacks. We know that, from a higher level, a storage system is abstracted as an array of blocks on secondary storage. For a long while this has been one of the fundamental abstractions; now, due to different media technologies, the implementation of this abstraction can be quite different, and guess what - that requires some fundamental understanding of graph algorithms.

To name a few: list ranking, B-tree, B+ tree, shortest path, Eulerian circuits, etc.

Some of us know what they are, or at least know where to look before implementing them...

But later I will try to stress some of them by asking questions like: what, where, how, when, and finally why.

 

Posted on Thursday, May 26, 2011 at 07:25AM by Prokash Sinha