Is it showing age or mental fatigue or ...?

In my graduate school days, one of my favorite professors told me that I have to fiddle around with formulas, patterns, and little concepts to nurture the brain. So I tried to do that, and I still try to do it... And I have seen both successes and failures. It is basically the interplay of intuition and rigor. Each tries to beat the other. And if rigor can win, I'm all the more happy.

So I thought I would use rigor, and set out to draw a simple state machine to search for a line in a buffer of characters. Until the end of the buffer, I have to search for '\r' and '\n' in two successive positions, and return the length of the line if found; otherwise the length is zero. It is a very elementary form of string matching, a far simpler relative of something like the Boyer-Moore string search algorithm. Note that I could have taken a getline() function from the standard library and changed it to point to the buffer instead of stdin. Well, the success was there, but after a few failures. My mistake was to capture all three states: start, CR, LF, and to return the length only once I was in the LF state. Now I feed it a line consisting of only CR and LF. I get to the LF state, but I don't have any more chars in the buf, so I happily declare that I don't have a complete line. Actually, what should have been done is to capture and return as soon as I reach LF from the CR state... So the moral of this error is the "reduction of states" that I learned a long time back and forgot.

I did not take the simple route of writing a while loop and parsing through, because I wanted to keep the scanning separate, so that if necessary I could introduce more states and change the state machine for other kinds of recognition. Basically, my thinking was about patterns in general.
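A minimal sketch of the reduced state machine in C. The function name find_line is hypothetical, and I am assuming the returned length includes the CR LF pair, since otherwise a line holding only CR LF could not be told apart from "not found". Only two states remain; the LF "state" is folded into the transition out of CR, so the function returns the moment LF follows CR.

```c
#include <stddef.h>

/* Two states are enough: LF is handled on the transition, not as a state. */
enum scan_state { ST_START, ST_CR };

/* Scans buf[0..len) for the first CR LF pair.
   Returns the line length including the CR LF, or 0 if no complete line. */
size_t find_line(const char *buf, size_t len)
{
    enum scan_state state = ST_START;

    for (size_t i = 0; i < len; i++) {
        switch (state) {
        case ST_START:
            if (buf[i] == '\r')
                state = ST_CR;
            break;
        case ST_CR:
            if (buf[i] == '\n')
                return i + 1;          /* capture and return right away */
            /* A second CR keeps us in ST_CR; anything else starts over. */
            state = (buf[i] == '\r') ? ST_CR : ST_START;
            break;
        }
    }
    return 0;                          /* ran out of buffer: no line yet */
}
```

Adding recognition for another pattern would mean adding states and transitions to the switch, while the scanning loop stays as it is.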

I think once in a while we should try to solve the exercises in the classic books on C and C++.

Posted on Sunday, March 29, 2009 at 08:13AM by Prokash Sinha

Where is the talent, you may say!

I've been through a few scenarios where I had to follow some documents, some tricks, and other sources of information to debug the startup code of an NT service, as well as the normal debugging of a service once it is past startup. But imagine a scenario where you launch some root processes, in the sense that they are individually created, and then some of these launch other processes, and so on. So there is a process tree (parent and child relationships) that will show up in graphical form if you kick off the procexp (Process Explorer) tool from Sysinternals.

Now if I want to debug some of these processes at startup (they have no interaction with the SCM, the Service Control Manager of NT), one way is to create an entry under the Image File Execution Options registry key. Then as soon as your debuggee process starts, it launches the WinDbg debugger, and the process runs under the debugger... Now this set of processes, as you might have imagined, works in harmony by message passing. I'm now ready to fire up apps that send requests to and receive responses from the set of processes, but I can't break into the process running under the debugger, and it seems that it does not have the ability to talk to the other processes either. What a bummer!

 

But if I fire up WinDbg after all the processes have been launched and attach to that process, then I can debug the steady state of that process. By steady state I mean anything but the startup code. If for one reason or another there is a startup failure, I have to try the first method, debug the hell out of it, and then switch gears to the second method. I guess stick-shift driving is all there is to it!

Posted on Sunday, March 1, 2009 at 04:44PM by Prokash Sinha

Programming for debugging!

Any large program needs debugging. There are various forms of debugging: traces, stepping through the code under a debugger, visible and audible responses to actions, analyzing output against input, and a whole lot of other mechanisms, whichever is most appropriate for the program.

For an event-driven piece of code, it is almost essential to have debugging code to trace the events and the corresponding actions or responses. Protocol implementations and window-based applications are some examples of event-driven programs, and they always need some sort of event-driven (state machine) code. Quite often the underlying engine is implemented by others, and the sources are either unavailable or so large and complex that reading them does little more than get us painfully bogged down. So how exactly do we code for debugging in these situations?

 

(1) Make sure that your defines for the states are as logically grouped as possible, and keep them in a data structure together with the corresponding meaningful name of the event. For example, for window code ...

typedef struct Win_Events {
    int         EventConst;   /* numeric event code */
    const char *EventName;    /* printable name used in traces */
} Win_Events;

 

Now you define an array of such structures (if you know the maximum number of events), or you can use dynamic allocation.
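A small sketch of such an array and a lookup over it, using a structure shaped like the one above. The event codes (EV_CREATE and friends), the field names as spelled here, and the event_name helper are all made up for illustration; real window code would use the platform's own message constants.

```c
#include <stddef.h>

/* Hypothetical event codes, grouped logically. */
enum {
    EV_CREATE = 1,
    EV_PAINT  = 2,
    EV_CLOSE  = 3
};

typedef struct Win_Events {
    int         EventConst;   /* numeric event code */
    const char *EventName;    /* printable name used in traces */
} Win_Events;

/* Fixed-size table: one entry per event we know about. */
static const Win_Events event_table[] = {
    { EV_CREATE, "EV_CREATE" },
    { EV_PAINT,  "EV_PAINT"  },
    { EV_CLOSE,  "EV_CLOSE"  },
};

/* Maps an event code to its name for trace output. */
const char *event_name(int code)
{
    size_t n = sizeof event_table / sizeof event_table[0];

    for (size_t i = 0; i < n; i++) {
        if (event_table[i].EventConst == code)
            return event_table[i].EventName;
    }
    return "UNKNOWN_EVENT";   /* still traceable, never NULL */
}
```

Returning a placeholder string instead of NULL means a trace line can always be printed, even for an event nobody put in the table.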

 

(2) Write your state machine logic so that you know which events you are not tackling, by tracing those events in the unhandled cases. State machines are sometimes coded using the 'switch' statements of C/C++/Java/C#, and they can be nested three or four levels deep. At each level of nesting, one should use the 'default:' label to one's advantage for such tracing.
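Here is a sketch of that idea with two levels of nesting. The states, events, and the trace_unhandled helper are hypothetical; the point is only that every switch, inner and outer, has a 'default:' that leaves a trace behind.

```c
#include <stdio.h>

/* Hypothetical states and events for a toy connection. */
enum conn_state { CS_IDLE, CS_CONNECTED };
enum conn_event { CE_OPEN, CE_DATA, CE_CLOSE, CE_TIMEOUT };

static int unhandled_count = 0;   /* how many events fell through */

/* Records an event that no case label handled. */
static void trace_unhandled(const char *where, int state, int event)
{
    unhandled_count++;
    fprintf(stderr, "unhandled: %s state=%d event=%d\n", where, state, event);
}

/* One step of the state machine; returns the next state. */
enum conn_state conn_step(enum conn_state state, enum conn_event event)
{
    switch (state) {
    case CS_IDLE:
        switch (event) {
        case CE_OPEN:
            return CS_CONNECTED;
        default:                       /* inner default: event not handled */
            trace_unhandled("CS_IDLE", state, event);
            return state;
        }
    case CS_CONNECTED:
        switch (event) {
        case CE_DATA:
            return CS_CONNECTED;
        case CE_CLOSE:
            return CS_IDLE;
        default:
            trace_unhandled("CS_CONNECTED", state, event);
            return state;
        }
    default:                           /* outer default: state not handled */
        trace_unhandled("unknown state", state, event);
        return state;
    }
}
```

In a real program the numeric codes in the trace line would be replaced by names looked up from the event table of step (1).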

 

When I inspect code, I look for these sorts of things, which are easy to forget but essential to defensive programming. Unless they are in place, it is very hard to debug this type of code, which is mostly event driven.

Posted on Sunday, January 25, 2009 at 01:00PM by Prokash Sinha

WinDbg

Dot commands – built-in debugger commands

Kd> .help – what dot commands exist

Kd> .chain – what extension dlls are loaded

Kd> .symfix; .sympath

Bang commands – extension dlls

Kd> !sym noisy – turn verbose on

Kd> !load <c:\debugger\Myextension.dll>

Other commands – built-in

Kd> lm t m nt*

Kd> ln  <some call return address>

Kd> ~n – (km) change processor stack, (um) switch thread stack

 

Basic Environment setup

• CTRL+BREAK - to break into the debugger

• .logopen <filename> - to have a session log

• .logappend <filename> - to append session log

• .symfix – to initiate symbol path

• .sympath – to see the symbol path setting

• .sympath+ <blankspace> - to clean sympath

• .sympath+ <c:\WebSymbols> - to add a symbol path to sympath

• .srcfix – to initiate src path

• .srcpath; .srcpath+; .srcpath+ <b>; .srcpath+ <mysrcpath>

• .reload Your.sys

 

Useful commands

• .shell notepad <filename> - to browse logfile

• .echo - to add your comment in a debug session

• .reload [/u] <my.sys> – to load/unload symbols

• .chain  -  to see what extension dll loaded

• !process x y      – to get process and thd level infos

• !analyze -v     – to see the stack from a bugcheck

• !sym noisy          -  to enable noisy mode

• lm   - to list modules, ex: lm t m xyz*

• bp, be, bl, bd       - to set, enable, list, and disable break-points

• kv, kb             - to look at the stack

• .server tcp:port=<tcp port number> [, icfenable]   - to allow remote session from another windbg

 

Debugging Session Types 

Live debugging

          –Checked build

          –Checked build with  no_opt enabled ( usually x64)

          –Free build

Crash Dump Analysis

          –Mini dump– Used mostly for OCA (online crash analysis)

          –Kernel dump – Used mostly for Kernel mode debugging

          –Full dump – Used when user/kernel stack is needed

 

Kernel debugging preliminaries

• Register usage conventions

         –Volatile / Non volatile

         –Integer / Floating Point

• Calling conventions

         –Parameter passing (Register and/or call stack)

         –Call stack setup and teardown

         –Caller / Callee responsibility

        –Exception Frame

• Call stack

• Global / Local variable conventions

• Code generation pattern

 

X64 Registers 

• 16 general-purpose registers (integer registers)

• 16 XMM registers available for floating-point use.

• A register is either Volatile or Non-volatile.

• Volatile registers are scratch registers that the caller must assume are destroyed across a call.

• Nonvolatile registers are required to retain their values across a function call and must be saved by the called function (aka callee) if used.

 

Volatile Register conventions

 • RAX  - Volatile - Return value register.

• RCX  – Volatile - First integer / pointer argument. Or C++ implicit this pointer.

• RDX – Volatile - Second integer / pointer argument.

• R8 – Volatile - Third integer / pointer  argument.

• R9 – Volatile - Fourth integer / pointer  argument.

• R10:R11 – Volatile - Must be preserved as required by caller; used in syscall / sysret instructions.

• XMM0 – Volatile - First FP argument. Also used to return FP value.

• XMM1 – Volatile - Second FP argument.

• XMM2 – Volatile - Third FP argument.

• XMM3 – Volatile - Fourth FP argument.

• XMM4:XMM5 - Volatile - Must be preserved as required by caller. 

 

Nonvolatile Register Conventions

 • XMM6:XMM15 – Nonvolatile - Must be preserved as required by called function.

 • R12:R15 – Nonvolatile - Must be preserved by called function.

 • RDI – Nonvolatile - Must be preserved by called function.

• RSI - Nonvolatile - Must be preserved by called function.

• RBX – Nonvolatile - Must be preserved by called function.

• RBP – Nonvolatile - Can be used as a frame pointer. Must be preserved by called function.

• RSP – Nonvolatile - Stack Pointer. 

 

X64 Calling conventions

 • Only fastcall – fast due to register usage

 • Parameter passing

     –First four integer / pointer parameters go to rcx, rdx, r8, and r9; C++ uses rcx implicitly to pass the this pointer

     –Fifth to nth parameters are passed on the stack

     –First four floating point parameters go to xmm0:xmm3

     –Caller reserves stack space for the register parameters

     –Callee needs to shadow register parameters if it needs to take the address of any parameter. va_list is an example.

     –At most four arguments are passed in registers when integer, pointer, and floating point arguments are mixed in a call.
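To make the mixed case concrete, here is a hypothetical function (mix_args is a made-up name) annotated with where each argument would be placed under the Microsoft x64 convention described above. The C itself is portable; under another ABI the same code compiles, but the placement in the comments would differ.

```c
/* Argument placement under the Microsoft x64 convention. The position of an
   argument, not its type, selects the register slot:
       a (1st, integer) -> RCX
       b (2nd, double)  -> XMM1
       c (3rd, integer) -> R8
       d (4th, double)  -> XMM3
       e (5th)          -> stack, above the 32 bytes the caller reserves
                           for the four register parameters
   The double return value comes back in XMM0. */
double mix_args(int a, double b, int c, double d, int e)
{
    return a + b + c + d + e;
}
```

Note that RDX, R9, XMM0, and XMM2 go unused for this call: each of the four slots is consumed by one argument, whichever register file it lands in.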

 

Function Types

Frame function

       –A function that requires a stack frame. A frame function allocates stack space, calls other functions, saves nonvolatile registers, or uses exception handling. It also requires a function table entry. A frame function requires prolog and epilog code. A frame function can dynamically allocate stack space and can employ a frame pointer. A frame function has the full capabilities of this calling standard at its disposal. If a frame function does not call another function, then it is not required to align the stack.

Leaf function

        –A function that does not require a stack frame. A leaf function does not require a function table entry. It cannot call any functions, allocate space, or save any nonvolatile registers. It can leave the stack unaligned while it executes.

 

Prolog and Epilog

mov   [RSP + 8], RCX             ; shadow the first param in its home slot
push  R15                        ; non-volatile registers
push  R14                        ; must be saved before
push  R13                        ; use
sub   RSP, fixed-allocation-size ; allocate space for local variables
lea   R13, 128[RSP]              ; frame pointer into the local area

  [ … ]

add   RSP, fixed-allocation-size ; release the local variable space
pop   R13                        ; restore non-volatile
pop   R14                        ; registers
pop   R15
ret

 

Exception Frame / Table

SEH:  __try, __finally, __except

C++:  try, catch

Frame based in 32 bit, Table based in 64 bit

Frame based SEH pattern in a prolog that uses SEH

    –Push  0xnnnnnnnn

    –Push  0xmmmmmmmm

    –Mov    eax, fs:[0x00000000] ; fs:[0] is the first field of the TIB: head of the exception handler chain

    –Push  eax                         ; save the previous chain head

    –Mov    fs:[0], esp  ; the new record on the stack becomes the chain head

 

See Matt Pietrek’s article in MSJ, January 1997

Frame based is slow, and prone to buffer overflow attacks

 

Table Based SEH of x64

X64 image (binary) follows PE+ structure

PE+ has a new directory (section) called exception directory

Function prolog using SEH is not very different.

Exception section has zero or more RUNTIME_FUNCTION records

Each function using SEH creates a RUNTIME_FUNCTION record

Each RUNTIME_FUNCTION record points to an Unwind record

Each Unwind record describes all the places where __try is used

An unwind record may contain an offset to a SCOPE_TABLE

Each SCOPE_TABLE captures each exception block of a function

Ref: PSDK, WDK, and NT Insider (May-June 2006)

Use the debugger built-in command     .fnent
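A rough sketch of how a table-based lookup can work. The struct below is a portable mirror of the three 32-bit fields of a RUNTIME_FUNCTION record (begin, end, unwind data, all offsets from the image base); the type name FunctionEntry and the function lookup_function_entry are hypothetical, and I am assuming the table is sorted by begin address, as the exception directory is expected to be.

```c
#include <stddef.h>
#include <stdint.h>

/* Portable stand-in for a RUNTIME_FUNCTION record. */
typedef struct FunctionEntry {
    uint32_t BeginAddress;   /* offset of the first byte of the function */
    uint32_t EndAddress;     /* offset just past the last byte */
    uint32_t UnwindData;     /* offset of the unwind record */
} FunctionEntry;

/* Binary search over a table sorted by BeginAddress. Returns the entry
   whose range covers rva, or NULL when no entry covers it (which is what
   an address inside a leaf function would give). */
const FunctionEntry *lookup_function_entry(const FunctionEntry *table,
                                           size_t count, uint32_t rva)
{
    size_t lo = 0, hi = count;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (rva < table[mid].BeginAddress)
            hi = mid;                  /* look in the lower half */
        else if (rva >= table[mid].EndAddress)
            lo = mid + 1;              /* look in the upper half */
        else
            return &table[mid];        /* rva falls inside this function */
    }
    return NULL;
}
```

Because the records live in a read-only table in the image rather than on the stack, nothing has to be pushed per call and a stack overwrite cannot redirect the handler chain, which is the contrast with the frame based scheme.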

Posted on Thursday, January 1, 2009 at 03:52PM by Prokash Sinha

Concurrency and Distribution in computing

With the advent of multithreading, concurrent processing is now available to application programmers, and synchronization of shared objects is more prevalent than it used to be. A shared object is an object that can be modified and/or accessed simultaneously by more than one thread of execution. You will often see books, blogs, articles, and discussions about serialized and synchronized access to such shared objects, and there are a few general patterns for attacking those serialization and synchronization problems.

Any software engineering or computer science course arms people with the weapons necessary to tackle those situations. But where can we build the intuition? For example, we all know that one queue of customers in a bank served by (say) three tellers is better than three queues independently served by three tellers.
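The bank example can be tried out with a toy simulation. This is only a sketch under assumptions of my own: integer time units, arrivals 0 to 8 units apart, service taking 1 to 19 units, customers in the three-queue layout picking a line at random and never switching, and a small fixed-seed generator so the run is repeatable. The names simulate_bank and next_rand are made up.

```c
#include <stdint.h>

#define TELLERS 3

/* Small deterministic generator so every run sees the same customers. */
static uint32_t rng_state = 12345u;

static uint32_t next_rand(void)
{
    rng_state = rng_state * 1103515245u + 12345u;
    return (rng_state >> 16) & 0x7fffu;
}

/* Simulates n customers and reports the average wait before service under
   both layouts, feeding the same arrivals and service times to each. */
void simulate_bank(int n, double *avg_single, double *avg_separate)
{
    long free_single[TELLERS] = {0};     /* when each teller is next free */
    long free_separate[TELLERS] = {0};
    long now = 0;
    double wait_single = 0.0, wait_separate = 0.0;

    for (int i = 0; i < n; i++) {
        now += (long)(next_rand() % 9);               /* next arrival */
        long service = 1 + (long)(next_rand() % 19);  /* time at the teller */
        int line = (int)(next_rand() % TELLERS);      /* line picked at random */

        /* One shared queue: go to whichever teller frees up first. */
        int k = 0;
        for (int t = 1; t < TELLERS; t++) {
            if (free_single[t] < free_single[k])
                k = t;
        }
        long start = now > free_single[k] ? now : free_single[k];
        wait_single += (double)(start - now);
        free_single[k] = start + service;

        /* Three queues: stay in the line picked on arrival. */
        start = now > free_separate[line] ? now : free_separate[line];
        wait_separate += (double)(start - now);
        free_separate[line] = start + service;
    }
    *avg_single = wait_single / n;
    *avg_separate = wait_separate / n;
}
```

With the shared queue no teller sits idle while a customer is waiting, whereas with separate lines one teller can be free while another line is backed up, so the shared queue should come out with the shorter average wait.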

Here comes the subject of operations research. It is a vast branch of applied math, but I still don't understand why the computer science curriculum does not include basics from it, like PERT/CPM, scheduling, etc. This is where one can learn to think in terms of parallel processing. Once these lively topics are taught, I think parallel programming would become more friendly to the uninitiated.

 

This will also give a very good idea of how to partition a problem for parallel processing. Once this is understood and intuitively obvious, distribution and distributed computing become just the next step.

Posted on Tuesday, December 23, 2008 at 07:24PM by Prokash Sinha