Stable Marriage Algorithms!
Before I continue with patterns of x64 assembly language, I have to take another pause. Yes, you can see that the heading changed again...
I was stuck digging out one problem we thought could be totally transient. This is where simple, crisp design is so important when it comes to kernel software. Until we get our heads wrapped around a problem, we should always look for a way to decompose it. The reason behind this is reasoning itself: some transient problems are so hard that disambiguating them, or rather making them consistently reproducible, is very difficult.
I was trying to track down this transient behavior in a software component that sits somewhere in the storage stack to enhance usability. Since we cannot trust the file system, I was using block-level I/O to reproduce it and, if possible, make it deterministic. This was a requirement...
So I started out by opening a target disk, then writing, reading, and comparing the data, both synchronously and asynchronously. For sync I/O we just do the I/O one sector at a time using the Windows API; for async we have a few choices: threads, overlapped I/O, etc. I picked overlapped I/O: wait until all the payloads are written, then read and compare...
How do we compare? And how do we make it as deterministic as possible?
The first one was not that tough. We had prefabricated data, basically a sequence starting from wherever we want: each sector carries a specific pattern in its double-word buckets, so the first sector holds the first number, the second sector the second number, and so on. Comparing is then easy, both from the buffers and by looking at the raw disk.
Making it deterministic, a repeatable reproduction of a transient bug, I simply could not do. I had payloads with a random number of sectors at random starting addresses: one sector at a time, multiple sectors at a time, clustered sectors, etc., and they can be used for both sync and async I/O. For example, if I want clustered, it will break a payload of n sectors into 1, 2, 4, 8, 16, ..., plus a residue, so that 1 + 2 + 4 + ... + residue = n sectors, and you can mix sync and async among those chunks.
While we found some other bugs that were easy to fix, one transient behavior remained unsolved.
What I know, and it keeps proving itself, is that testing alone will not shake all the bugs out of a component. It is a blend of design/design/design, simplify/simplify/simplify, structure/structure/structure the primitives... then reason/reason/reason while the development process goes on...
By the way, did I mention that I read the book The Science of Debugging? And I was carrying a paper called "Needle in a Haystack" and a monograph named "Stable Marriage and Its Relation to Other ...".
The last two were to show respect, and in the hope that the family understands why I'm working long hours while 90% of the people in this small city are in bed :)
-pro
Note that this is a good testing mechanism, since the component under test, being a kernel module, needed to be bombarded with random I/O at high speed. Moreover, doing just that helped me debug both the target component and the test component; each was helping the other. If you want more detail, I would be happy to explain later.
Things to remember in an emergency!
I'm sure we all know that emergency means different things to different people. Emergency preparedness when debugging can be handy, I think. And it matters all the more precisely because an emergency is not a regular matter, and being irregular categorically implies that we will not remember it dearly...
When it comes to x64 assembly instructions, things are quite a bit different from x86.
So here are some, just to warm up -
### Notice how the code was chewed up - no code for the logic, different from x86
### Look at how the args offset on the stack are stuffed in
_TEXT SEGMENT
firstArg$ = 16
secondArg$ = 24
function_return_void_takes_2_arg PROC
; 23 : {
$LN5:
mov DWORD PTR [rsp+16], edx
mov DWORD PTR [rsp+8], ecx
push rdi
; 24 : //chk if 1st arg is zero
; 25 : if (firstArg == 0 ){
; 26 : return;
; 27 : }
; 28 : //chk if 2nd arg is non-zero
; 29 : if (secondArg != 0 ){
; 30 : return;
; 31 : }
; 32 : }
pop rdi
ret 0
function_return_void_takes_2_arg ENDP
_TEXT ENDS
//void f takes 4 args
void f_void_take_4_arg( ULONG First, ULONG Second, ULONG Third, ULONG Fourth )
{
int i, *ptr_i;
// init i
i = 0;
//take the addr of
ptr_i = &i;
}
### Notice how the args are passed through registers ( the first 4 integer args go in rcx, rdx, r8, r9 )
_TEXT SEGMENT
### Note how the offsets are being stored in these *$ vars, where * is the variable name
### Also watch how the offsets took care of any stack push. Compare to the x86-generated code too
i$ = 36
ptr_i$ = 56
First$ = 80
Second$ = 88
Third$ = 96
Fourth$ = 104
f_void_take_4_arg PROC
; 37 : {
$LN3:
### Watch this: the callee is spilling the register args into the caller-allocated shadow (home) space on the stack!
mov DWORD PTR [rsp+32], r9d # 4th arg
mov DWORD PTR [rsp+24], r8d
mov DWORD PTR [rsp+16], edx
mov DWORD PTR [rsp+8], ecx # 1st arg
push rdi ## This push was taken into account when *$ vars for respective offsets are defined
sub rsp, 64 ; 00000040H ## Another fine point: it's not just for local variable space, but also for the call-down path
mov rdi, rsp
mov rcx, 16
mov eax, -858993460 ; ccccccccH
rep stosd
mov ecx, DWORD PTR [rsp+80]
; 38 : int i, *ptr_i;
; 39 : // init i
; 40 : i = 0;
mov DWORD PTR i$[rsp], 0
; 41 : //take the addr of
; 42 : ptr_i = &i;
lea rax, QWORD PTR i$[rsp] ## get the address of i. lea does not dereference; it just computes the effective address.
mov QWORD PTR ptr_i$[rsp], rax ### put into the ptr_i
; 43 :
; 44 : }
mov rcx, rsp
lea rdx, OFFSET FLAT:f_void_take_4_arg$rtcFrameData
call _RTC_CheckStackVars
add rsp, 64 ; 00000040H ## undo the sub, so we can get back to the register we pushed
pop rdi
ret 0
f_void_take_4_arg ENDP ## end of procedure
_TEXT ENDS ## end of this code seg
Next I will delve into some common pattern blocks: while, do-while, for, switch, if-then-else, etc. Then I will also cover some normal stuff like structure access and procedure calls, so that we will have a handy reference when we have to look at disassembled systems code.
Enjoy!
Pi day!
Today, the 14th of March, is Pi (3.14...) day! Does that mean it's an irrational day? Perhaps not...
There is an interesting constructive approach to show that irrational numbers exist. IIRC, Cantor had a constructive method (the diagonal argument) to show that the number system (i.e. the real numbers) contains other numbers besides the rationals (integers happen to be the degenerate case of rational numbers, since the denominator is trivially 1).
Mathematical computation has long been engaged in producing ever larger decimal expansions of Pi. While they are on that quest, one thing to note is that we expect never to find the expansion settling into a finite block of digits that repeats forever. So in whatever expansion we have so far, there should never be a finite run of decimal digits that repeats itself from some point on. If we ever got into a situation like that, Pi would no longer qualify as an interesting, irrational number! It would degenerate into a rational number.
Dedekind also had a way to construct the real numbers, including the irrationals. Following a similar approach, there is a newer number system called the surreal numbers. This has significant implications for combinatorial games and artificial intelligence: in most board games, alternating moves of the players create different positions, and in certain kinds of games these positions have game values that can dictate the search strategy...
Memory on the High Lane!
Flash memory technology has now reached a state where it is becoming a de facto standard as secondary storage for lots of devices, in place of the hard disk. One of the main reasons is speed, due to the lack of seek time; yeah, there are no rotating disks. Another is reliability. So as of 2011/2012 we see a lot of devices using SSDs made of flash memory, and the solid state disk (SSD) is becoming more popular every day.
From the systems point of view, one of the interesting features of an SSD is that it can service multiple I/Os concurrently, depending on the packaging and the FTL (flash translation layer). This opens up relatively new ways of building systems. In particular, file systems, host device drivers, and flash firmware designs became very hot as well as very sophisticated.
Traditionally, file systems were designed with the HDD (hard disk drive) in mind, and a lot of them do not necessarily perform well on SSD-based storage media. Some of the old file systems are supported by providing a host device driver and flash firmware that take care of the underlying mechanics of flash, somewhat.
So for the systems designer/programmer, understanding the underlying attributes and applying the right technological alternatives is very challenging. In a following series of notes I will try to chalk out some of the pros and cons of HDD-based storage subsystems, then introduce the main design alternatives being considered for file systems.
Traditional file systems (FS) were meant to give a representation of the sectors of an HDD in a structured and intuitive way. Structured means that, from the user's point of view, we only need to understand directories and files in a hierarchical fashion. And it is intuitive, since we know the office/home cabinet and filing systems. Then, to avoid excessive file corruption, journaling (log-based) file systems became popular: the file system writes a log record about a transaction before committing the data to the physical storage that backs the file system. The idea behind this is the ALL-or-NOTHING paradigm. Loosely speaking, a transaction may involve (a) metadata change(s) and (b) committing the data to the backing storage. So, to avoid excessively incoherent user data, the ALL-or-NOTHING paradigm became essential.
If the system fails after committing the log but before committing the actual data, the log can be used to recover the transaction and commit the data; in that case the ALL state for that transaction is achieved. But if the failure occurs while committing the log itself, the system can always go back to NOTHING, and retry the transaction from the start, if possible.
Now, since it is only interesting to keep the last few records (few being a relative word here), the log file does not need to grow continuously. Hence the log is usually circular in nature, meaning old records are erased.
SSDs are mostly NAND based, and NAND has random access ability, so it differs from an HDD in terms of seek time, latency, etc. Flash memory also uses out-of-place writes, meaning old valid data cannot be overwritten in place; new data has to be written to a still-fresh location, so the mapping of logical to physical sectors changes all the time. As an example, once we write some data to a sector, we cannot reuse that sector before erasing its block, which sets all the bits of the sector back to 1. When a sector has all bits set to 1, it is in a write-once state: once data is written, we cannot overwrite without erasing again. So every once in a while, someone or something has to move the good data of a sector to a different place and then erase it. Flash uses block erase, where a block consists of one or more sectors. This is one reason random writes can slow performance down: not because read access becomes slower, but because the underlying system needs to garbage collect, erasing and compacting valid data into blocks of NAND storage.
These and other attributes open up a lot of alternative approaches to system design and the underlying software techniques. For example, a lot of data structures now have interesting applications in these systems, and log-structured file systems (LSFS) are becoming a bit more practical and important.
An LSFS is purely log-structured, meaning the whole file system is laid out as one big log.
Finally I will try to emphasize some of the analytical methods and associated data structures that are suitable for SSD based I/O subsystems.
Debug me --- if I bugged you enough!
No, it is not what you think I'm saying! It is what your software package is saying, right?
So you designed, developed, and debugged your software package. It consists of several user-mode and some kernel components. You have the unit-test component you built while you were coding your design, and you tested...
It works. You feel proud of yourself! And you should be proud of yourself! Enjoy your spirit...
Then comes the testing from the testing people, including a majority of harness/stress/perf testing. And for all of these, unless yours is a big software house, prefabricated test suites are used.
One particular scenario is the storage stack under Windows. What happens if you have 3 or 4 components in the kernel that are directly adding value to the storage stack? Oh, that surely scares the hell out of me; it's like plugging a pacemaker into an otherwise healthy heart!!! Of course, there are other examples where thorough, third-party, off-the-shelf test suites are required to be run on test machines.
For harness tests on the storage stack, I learned to run SysMark, PCMark, etc. These test an all-round set of different scenarios a user might end up in while playing with a PC!
The problem comes when these test programs are not that solid themselves! Then again, what do you expect when such a preview package is free?
So one alternative is to run the tests with the kernel debugger configured, meaning the system boots into the boot configuration where the debugger is on. When an exception or other test application fault happens, you take down the symptoms and take that test out of the suite, if you can. But some of the exceptions might be handled by the system if a debugger is not hooked up.
This last condition, where the test suite runs fine only without the debugger enabled, is a particularly good one. Now you cannot configure the debugger, so what do you do?
Well, here comes DbgView from Sysinternals (now a Microsoft tool).
What should I do now?
-- Well, the tests run for hours, and the machine boots and reboots during the tests.
-- I need certain information printed to logs from the kernel component we developed.
These two are the basic goals. How do we solve them?
We can develop the components for debugging. Yeah, some will get scared to hear "develop for debugging". For DbgPrintEx, you can use your own masks based on your components' features; if they are modular enough, you can have module-specific masks.
Now you will have to enable the registry entries on the test systems. Here is what you need (read online for the details about what you need to create in the registry hive).
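For reference, the key usually involved is the Debug Print Filter key; a minimal sketch, assuming your driver prints under the IHVDRIVER component (check the WDK documentation for the component and mask that fit your code):

```shell
reg add "HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Debug Print Filter" /v IHVDRIVER /t REG_DWORD /d 0xFFFFFFFF /f
```

Reboot after setting it so the kernel picks up the filter.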
Now you need a batch file that starts at boot time. Store the batch file and dbgview.exe in c:\, make a shortcut to the batch file, then from Windows->Start->All Programs->Startup, right-click, Open, and paste the shortcut there.
@echo off
rem -- launch dbgview at startup
start c:\dbgview.exe /t /f /l c:\dbgviewLog /g /k /m 10 /p /w
The first time it boots, it will ask you to accept the license. From then on, it will start automatically at boot time.
Here are some sample results from the log file (dbgviewLog):
[\\VANHALEN-D4]
00000001 0.00000000 KTM: TmRollbackTransaction for tx 61859f0
00000002 0.01082284 KTM: TmRollbackTransaction for tx 61859f0
00000003 0.02614005 KTM: TmRollbackTransaction for tx 61859f0
00000004 0.03681052 KTM: TmRollbackTransaction for tx 61859f0
00000005 41.99409103 TpmEvtEntropyTimer().
00000006 118.64528656 KTM: TmRollbackTransaction for tx 5ae0c80
00000007 118.65682220 KTM: TmRollbackTransaction for tx 5ae0c80
Enjoy!