Battling out the battle

In a recent discussion, I came across an interesting problem. The problem is really an old one, namely, document management. You may say, what is there to solve? Just get a large house or office or warehouse, get some high-powered printer/copier, and ask Weyerhaeuser or some other company to cut more trees and make paper to keep your records (i.e. the documents). Well, that's not exactly what we were discussing. As you know, old information is no information. And structuring, organizing, analyzing, presenting ... are all very important in every phase of life.

Electronic document processing encompasses email, IM, conferences, meeting materials, faxes, images, videos, etc., etc. Servers that handle these in a distributed fashion are really not that efficient. They often become bottlenecks under heavy document-transaction loads. The approaches to solving this are (a) throw more hardware/software at it to distribute the load, or (b) improve the efficiency of the components that make up your document management system. Solution (a) is not that interesting, since it would cause a management nightmare, introduce multiple points of failure, cost more money, and consume more power (yeah, talk about green ...)

So, following option (b), the first thing to look at is network processing power. Since 10-gigabit networking is not yet a household thing, the attention goes directly to protocol processing. When it comes down to a Windows server, we are locked in with a bunch of LSPs in user mode, and the protocol stacks (including the NDIS wrappers). Most of us know that, latency-wise, UDP is faster than TCP, but it does not have any reliability. Moreover, if sockets are used (which happen to be the predominant API for user-level network programming in an IP environment), a lot of background processing is left to the work-horse network implementation of the underlying OS. We all know that most general-purpose OSes are not real-time, so there is nothing to blame in those network stack implementations. They have their purpose in life, and they solve certain sets of problems.
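To make the tradeoff concrete, here is a minimal user-mode sketch of that predominant API, a Winsock UDP sender. The loopback address and port 9000 are just placeholders of mine; everything below the socket() call (routing, checksums, the NDIS path) is still done by the OS stack, which is exactly the background processing mentioned above.

    // Minimal Winsock UDP sender (user mode).
    #include <winsock2.h>
    #include <ws2tcpip.h>
    #pragma comment(lib, "ws2_32.lib")

    int main()
    {
        WSADATA wsa;
        if (WSAStartup(MAKEWORD(2, 2), &wsa) != 0)
            return 1;

        SOCKET s = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
        if (s == INVALID_SOCKET) { WSACleanup(); return 1; }

        sockaddr_in peer = {};
        peer.sin_family = AF_INET;
        peer.sin_port   = htons(9000);                     // placeholder port
        inet_pton(AF_INET, "127.0.0.1", &peer.sin_addr);   // placeholder peer

        const char msg[] = "document ping";
        sendto(s, msg, (int)sizeof(msg), 0,
               reinterpret_cast<sockaddr*>(&peer), sizeof(peer));

        closesocket(s);
        WSACleanup();
        return 0;
    }

Note there is no retransmission, ordering, or congestion control anywhere in sight; that is precisely the reliability TCP would otherwise provide, at a latency cost.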

The obvious questions are ---

1) Can we play unfair?

If we can, then a kernel-mode component can buy us some efficiency.

2) How hard is it to come up with a bare-bones protocol implementation that bypasses the TCP stack?

3) Can a kernel socket library help improve the situation?
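For question 3, one concrete direction on Vista and later is Winsock Kernel (WSK), the in-kernel socket interface. Below is a minimal registration sketch; AttachToWsk and DetachFromWsk are placeholder names of mine, error handling is trimmed, and it stops short of the actual socket creation.

    // Winsock Kernel (WSK) client registration, the entry point to kernel-mode
    // sockets. Sketch only: the WskSocket() call that would follow needs an IRP
    // and a completion routine, which are omitted here.
    #include <ntddk.h>
    #include <wsk.h>

    static const WSK_CLIENT_DISPATCH WskClientDispatch = {
        MAKE_WSK_VERSION(1, 0),   // WSK application version
        0,                        // reserved
        NULL                      // no WskClientEvent callback
    };

    static WSK_REGISTRATION WskRegistration;
    static WSK_PROVIDER_NPI WskProviderNpi;

    NTSTATUS AttachToWsk(void)
    {
        WSK_CLIENT_NPI clientNpi;
        clientNpi.ClientContext = NULL;
        clientNpi.Dispatch      = &WskClientDispatch;

        NTSTATUS status = WskRegister(&clientNpi, &WskRegistration);
        if (!NT_SUCCESS(status))
            return status;

        // Wait until a WSK provider (the TCP/IP stack) is available.
        status = WskCaptureProviderNPI(&WskRegistration,
                                       WSK_INFINITE_WAIT,
                                       &WskProviderNpi);
        if (!NT_SUCCESS(status)) {
            WskDeregister(&WskRegistration);
            return status;
        }

        // From here, WskProviderNpi.Dispatch->WskSocket(...) creates datagram
        // or connection-oriented sockets entirely in kernel mode.
        return STATUS_SUCCESS;
    }

    VOID DetachFromWsk(void)
    {
        WskReleaseProviderNPI(&WskRegistration);
        WskDeregister(&WskRegistration);
    }
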

 

You might say, firstly, that all of them have their place and there are tradeoffs. Secondly, you might ask: how do you know where the problems are?

The answer to the second one is easy, and I will, as usual, let you wonder ...

For the first one, there is nothing to answer; life is a tradeoff!!!

 

 

Posted on Wednesday, November 7, 2007 at 03:36PM by Prokash Sinha

To be or not to be!

Time to digress from the Tools and Tricks of the Trade ...

For a few months, I've been tackling a very largish piece of software (400,000-plus lines of code). Quite a bit of it is in user space, thank god! And quite a bit is in kernel-mode space. It is one of the toughest client/server applications I have ever dealt with. A bunch of competent software engineers wrote and integrated it over a period of 5 to 10 years. The beauty of this software is that it has a lot of extremely powerful features (fault tolerance, resource balancing, a message-passing infrastructure with delayed and priority queuing, some transaction models, and what not). The beast of it is how the different components were integrated. The communication interfaces between the modules are not terribly bad, but there are hundreds of global variables to track state. And there are times when 20 concurrent threads are not unusual.

So this is one example I've seen that is good for studying concurrent programming in action. It uses a whole lot of techniques I learned some 20 years back in graduate school. But what I learned from my prior background in mathematics is that laws, formulas, and theories are there to guide you, not to obstruct you. So use them as if you were cutting cheese, not bringing down a multi-story building... Yes, you can very well tell whether something was hacked out or not; it does not matter who designed it, who wrote it, who debugged it... IF YOU HAPPEN TO READ THIS ARTICLE, AND EVER TOOK SOMETHING AT FACE VALUE, stop doing that...

Now, how do you analyze this kind of software and try to make the best of it... Yes, a solid debugger is your friend here. You do path analysis to get some taste of it. So I need a kernel-mode symbolic debugger that is not scared of doing user-mode debugging. SoftICE used to be great in that space. I've used it to debug everything from 16-bit and 32-bit user mode down to 32-bit kernel mode. But now there is no support for 64-bit, and it is being phased out. And I was not very successful with WinDbg, though I'm getting a bit of return from using it. I don't like it in this particular situation, but what can I do?

If you are familiar with the Windows SCM, it has a requirement that a service respond to its requests within 30 seconds (or some similarly small epoch). So if you use the documented approach to debugging services, you do all the song and dance with registry entries to enable service debugging, and that only after the SCM interaction, and there is no way to step into kernel-mode code. This is what I call "the debugger is running for user-mode debugging". Alternatively, if you put in a hard breakpoint (an int 3, or some API that does it for you), and if you turn on kernel-mode debugging (using boot.ini or other boot-time features), you will break into your service while your debugger is working with all its kernel-mode capability. You can step between user and kernel mode at will. But I found that sometimes it works like magic, and other times it is all wet. And it is also not recommended to be used that way (as far as I know).
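For the curious, a minimal sketch of that hard-breakpoint trick. The environment-variable gate (MYSVC_BREAK_ON_START) and the bare ServiceMain skeleton are placeholders of mine; the only real ingredient is DebugBreak(), which raises the int 3 breakpoint exception, and as noted above this is not an officially recommended way of working.

    // With kernel-mode debugging enabled at boot (boot.ini /debug), the attached
    // kernel debugger catches the breakpoint raised inside the service, before
    // the SCM timeout gets in the way.
    #include <windows.h>

    static void BreakIfRequested()
    {
        // Gate the breakpoint so a machine without a debugger never hits it
        // (an unhandled breakpoint exception would kill the service).
        char buf[2];
        if (GetEnvironmentVariableA("MYSVC_BREAK_ON_START", buf, sizeof(buf)) > 0)
            DebugBreak();   // equivalent to an inline int 3
    }

    VOID WINAPI ServiceMain(DWORD argc, LPTSTR* argv)
    {
        BreakIfRequested();
        // ... RegisterServiceCtrlHandler(), SetServiceStatus(), real work ...
    }
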

I wish I could talk to WinDbg (just like in a sci-fi movie)....

Me> Hey WinDbg, can you be a debugger that moves between user mode and kernel mode???

WinDbg> I'm thinking (to be or not to be) ...

 

 

 

Posted on Saturday, June 2, 2007 at 09:43AM by Prokash Sinha

Tools of the Trade I

Many a time I have bumped into software developers who think that using one language is better than others in absolute terms. I've seen too many discussions about this to mention here. I sometimes see where they come from, and sometimes I see how emotion overrules sincere judgment.

Brushing all of this aside, I would like to try out some of the tools of C++, if you happen to have to use C++ for your design. We all know that managing dynamic allocation and pointer manipulation is a major source of software errors, ranging from memory leaks to undefined behavior. There is a lot of software around that wraps the language's dynamic memory allocation and management routines to catch bounds- and leak-related faults. So it is certainly not a new technique I'm about to discuss! But just as a simple exercise, we will discuss some of the following interesting tools of the trade -

1) The placement form of new, and when can I use it?

2) Auto pointers and smart pointers, and what do they give us?

3) Unnamed namespace

 

The placement form of new is really a mechanism to preallocate a buffer from the heap and use it again and again for objects of different types, before you delete (or free) the preallocated buffer. Some of us are already used to this technique in C, using malloc and casting through typedefs, and it might very well be a known pattern to us. But the point here is that when we use the C technique, it is not obvious to a reader of the code. Using the C++ language feature makes it more obvious to the reader. By the way, you can find this language feature (pattern) in a lot of places!
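Here is a minimal sketch of that pattern; Packet and Record are throwaway types of mine, and error checking is left out. One raw buffer is allocated once, objects of different types are constructed into it with placement new, and each is destroyed explicitly before the buffer is finally freed.

    #include <new>        // the placement form of operator new
    #include <cstdlib>    // std::malloc / std::free

    struct Packet { int id;   };   // throwaway types standing in for real objects
    struct Record { char tag; };

    int main()
    {
        void* buf = std::malloc(256);      // one preallocated raw buffer

        Packet* p = new (buf) Packet();    // construct a Packet in the buffer
        p->~Packet();                      // explicit destructor call, no free yet

        Record* r = new (buf) Record();    // reuse the very same storage
        r->~Record();

        std::free(buf);                    // release the buffer once, at the end
        return 0;
    }

The explicit destructor calls are the price of admission, but unlike the malloc-and-cast idiom, a reader sees at a glance that the storage is being reused.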

 

The auto pointer really embodies one of the software engineering mantras of using C/C++, or any language that supports dynamic (de)allocation of memory. The mantra is that every dynamic memory allocation should be paired, as closely as possible, with a deallocation. So if I use malloc in a function, I should free that memory before exiting the function. But this is very limited: quite often a function's whole purpose is to act as an object provider for others, so it cannot free its local allocation. But when the dynamic memory need really is local to a function, using the auto pointer mechanism is natural; the semantics are very similar to a local variable, yet with a varying size. At this point you should ask yourself, "How do local variables get freed?" They don't necessarily get freed. The right term is that they go out of scope. How does that happen, you might ask yourself! Well, that is what I think is pure programming. I'm going to leave you here alone ...
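A small sketch of both situations, using std::auto_ptr as it stood when this was written (later C++ standards deprecate it in favor of std::unique_ptr, which keeps the same scope-bound behavior with saner copy semantics); Widget and the two functions are placeholder names of mine.

    #include <memory>

    struct Widget { void work() {} };   // placeholder type

    void local_use()
    {
        // The allocation is tied to the scope: when ap goes out of scope its
        // destructor runs and deletes the Widget, with no explicit delete and
        // no leak on an early return or an exception.
        std::auto_ptr<Widget> ap(new Widget);
        ap->work();
    }   // Widget destroyed here, just like a local variable leaving scope

    // When a function's whole purpose is to act as an object provider,
    // releasing ownership to the caller makes that intent explicit.
    Widget* make_widget()
    {
        std::auto_ptr<Widget> ap(new Widget);
        // ... set it up; any failure path before release() still cleans up ...
        return ap.release();    // the caller now owns the Widget
    }
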

To be continued ...

Posted on Sunday, April 15, 2007 at 10:10AM by Prokash Sinha

Interesting Developments

About a month ago, a local nightly news channel flashed a message about "finding the structure of a Lie group", the one called E8. But it did not repeat the news, since there is lots of more important news that the target audience is supposed to watch. Fair enough; publicity and profitability are interlinked, and we all know it...

What is so great about it? It is a huge amount of computation, unparalleled by most of the computations we are used to so far. It took an organized approach by a team of mathematicians and computer scientists. The team, and the American Institute of Mathematics, consider it a fundamental achievement in basic research. And it is going to help the string theory community in particular, a community engaged in an alternative way of thinking about physics and physical phenomena. Since I've been out of even basic physics for quite some time, I will pass on that ...

This research will also spur a lot of thinking about parallel computation, distributed computation, algorithmic analysis, etc., etc. And right around the same time I found another interesting project that has to do with genome analysis in a distributed way. Both projects have their main offices within a couple of miles of each other. The name of the project is GPU Folding. This project's aim is to understand protein folding, misfolding, and related diseases. The data gathered from this effort helps scientists better understand the development of many diseases, including Alzheimer's, BSE (aka mad cow disease), some cancers, Huntington's disease, cystic fibrosis, and other aggregation-related diseases.

 

If you don't know, category theory (an advanced topic in abstract algebra) and Galois fields (finite fields, typically built from polynomials over a prime field) are two fundamental areas of research for cryptography, as well as for computer languages (functional languages) that are more amenable to formal proofs of programs.

So this kind of progress may help other areas of wedlock between Theory and Applications.

 

Posted on Saturday, April 7, 2007 at 05:02PM by Prokash Sinha

Making sense of dollar and sense

Needless to say, I'm a big fan of the OSR newsgroups at osronline.com. Despite their highs and lows of signal-to-noise ratio, it is one damn good gathering of exceptional people in Windows kernel-mode programming. Yes, admittedly it is a very, very narrow area of software/hardware, but purely from a sampling point of view, it is good enough to see how often we struggle to make sense of dollars and cents ...

One of the current discussions, a by-product of an x86 protection rings thread, is now about offshoring, the cost of producing a working driver, reliability (or otherwise crappy software), etc., etc., ...

Don't get me wrong, I respect most of the views there, and you can peep through them by registering at that site. But a couple of things bother me big time. Here are some of them -

First, for the uninitiated: while the cost of a product is easily determined, the price of a product is not. This is one fundamental paradigm of the theory of economics. Obviously, then, what should be the cost of a production-quality driver under Windows? Frankly, I don't know. So there has to be some estimate of it. And this would depend on two things: (a) the type of the driver and its functionality, and (b) its quality. While (a) could be determined, with a very wide variation, (b) is very difficult to measure, to say it politely.

Given that quality depends on the experience of the team that designs and develops the product, cost can vary vastly, since it is whatever price one is willing to pay to the team. Note that one's cost is another's price. A simple example: a lot of us migrated to another country to increase the price of our services. Similarly, a lot of others moved work to less developed countries to reduce the price, and so increased their chances of making their products and services more affordable to buyers at a lesser cost. In software, it is the labor to produce the software; in measurable terms, it is the wages, whether per hour, per month, or per year.

So the only thing that remains is to show that the quality of software produced by expensive people (experts who charge a lot more :) is indeed better. Since the industry as a whole is not quite ready to adopt formal or semi-formal approaches to tackling the quality problem, the only thing left to watch is the failure rate of software developed by people with lower wages. It is more like electing a government and seeing whether they fail or pass.

But one thing to remember: when it comes to kernel-mode software components, having a faulty product makes things very nasty. So it becomes even more difficult for decision makers to choose between offshoring and staying with local experts. Perhaps it is best to have branch offices wherever such experts are readily available. But then a lot of companies love to have temp/contract/consultancy wings engaged in their product development, for a lot of reasons. First, those people are the first to go when a slowdown comes. Second, those costs are business expenses, so from an accounting perspective it is enticing.

That was my analysis. If you are still wondering what I have to say about all this, the punch line is -

" I don't necessarily encourage my kids to think about persuing high tech career ..."

 

Posted on Saturday, April 7, 2007 at 08:54AM by Prokash Sinha