Design Alternatives
It's been a while since I last posted here...
One thing I'm interested in is designing for debugging. That means that when we design some software, we make sure the debug code is part of the design from the start. It should be placed carefully and turned on only when needed.
You may ask, why? I was trying to implement error handling of all sorts in a large kernel module that sits in the middle of the storage stack. And as we know, shoe-horning this kind of stuff into existing code is quite challenging. What was it for? It was to make sure that (1) user data doesn't get lost or corrupted and (2) recovery is seamless.
In my rookie days, I worked with a very shallow level of fault tolerance in a network stack. The idea was to modularize the component(s) and roll back (meaning re-init) progressively based on the watchdog response. And we all know there is a fundamental problem with the watchdog mechanism; search the internet for some nice articles on it from Embedded Systems Design magazine.
So I ended up copying some crucial (hot-path) routines, giving them new names, and letting the software take those copies, which exist only for error handling, and then debugging / changing / whatever... This way I don't upset the normal paths of the code. This is sort of bypass surgery!! Then, once I had found the bugs and their symptoms, I just changed the circuit of the code flow.
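To make the bypass idea concrete, here is a minimal user-space sketch in C. It is not the code from that kernel module; every name in it (struct request, do_write, do_write_bypass, rewire_write_path, submit_write) is made up for illustration, and a function pointer is just one assumed way of "changing the circuit".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request type; stands in for an IRP or similar. */
struct request {
    int  id;
    bool failed;      /* set when a lower layer reported an error */
    int  handled_by;  /* records which path ran, for inspection */
};

enum { PATH_NORMAL = 1, PATH_BYPASS = 2 };

typedef int (*write_fn)(struct request *req);

/* The original hot-path routine: left untouched. */
static int do_write(struct request *req)
{
    req->handled_by = PATH_NORMAL;
    return 0;
}

/* A renamed copy of the hot path. Experimental error handling,
 * extra checks and logging can live here without disturbing do_write(). */
static int do_write_bypass(struct request *req)
{
    req->handled_by = PATH_BYPASS;
    return req->failed ? -1 : 0;
}

/* The "circuit": a single pointer decides which copy runs.
 * Assumption: real kernel code would need proper synchronization here. */
static write_fn write_path = do_write;

static void rewire_write_path(bool use_bypass)
{
    write_path = use_bypass ? do_write_bypass : do_write;
}

/* Callers always go through this entry point. */
static int submit_write(struct request *req)
{
    assert(req != NULL);
    return write_path(req);
}
```

Calling rewire_write_path(true) sends every later submit_write() through the copy, and rewire_write_path(false) puts the normal path back, so the original routine never has to be edited while debugging.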
The problem was that injecting errors into the main storage stack of any OS upsets the system in an unimaginable number of ways. In Windows, I saw many different bugchecks: many varieties of hive corruption (note that from Win 7 onward, hive management in the OS is more robust, and still ...), no paging file, ntdll's breakpoint interrupt, and others. It basically indicated that the circumstances of the error (what type of IRP had the error; the number of errors; the load on the system at the time of the error; the type of load (random, read only, write only, etc.); and other things) determine, in an as yet unpredictable way, the type of bugcheck and/or corruption of the system volume.
This is why I think designing for debugging is what makes good software shine. As an example, the code should be armed with trigger infrastructure. For the details of the trigger infrastructure, I will have to write another note later.
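Until that note exists, here is a rough guess at what the smallest possible trigger could look like, as a user-space C sketch. It is an assumption on my part, not the actual infrastructure: struct trigger, trigger_arm, trigger_fire, fail_read and read_block are all hypothetical names, and the error value -5 is arbitrary.

```c
#include <stdbool.h>

/* One named trigger point. Not thread-safe; kernel code would
 * need atomics or a lock around these fields. */
struct trigger {
    const char *name;
    bool        armed;
    unsigned    skip;   /* hits to let pass before firing */
    unsigned    count;  /* how many times to fire after that */
    unsigned    hits;   /* total times the trigger point was reached */
};

/* Arm a trigger: ignore the next `skip` hits, then fire `count` times. */
static void trigger_arm(struct trigger *t, unsigned skip, unsigned count)
{
    t->armed = count > 0;
    t->skip  = skip;
    t->count = count;
}

/* Called at the trigger point; returns true when an error should be injected. */
static bool trigger_fire(struct trigger *t)
{
    t->hits++;
    if (!t->armed)
        return false;
    if (t->skip > 0) {
        t->skip--;
        return false;
    }
    if (--t->count == 0)
        t->armed = false;  /* disarm itself after the last firing */
    return true;
}

static struct trigger fail_read = { "fail_read", false, 0, 0, 0 };

/* Hypothetical hot-path routine with a trigger point compiled in. */
static int read_block(int lba)
{
    (void)lba;
    if (trigger_fire(&fail_read))
        return -5;  /* simulated I/O error */
    return 0;       /* normal path */
}
```

With trigger_arm(&fail_read, 1, 2), the next call to read_block() succeeds, the two after that fail, and then the trigger disarms itself. Being able to say exactly when and how often an error happens is what makes the resulting bugchecks reproducible.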