Monkeys on Trees I

A refresher -

A binary tree is a tree in which each node can have at most two subtrees: a left subtree and a right subtree. This is going to be our first topic in this series...

Binary trees became important mainly because of dynamic structures needed for searching, along with other operations such as add, delete, etc. If we keep records (structures or objects) in a linear fashion, then given a key (which is nothing but the ID of a record), we need to find whether it exists. If it does, we may return it, delete it, move it up toward the root, and so on. The operations depend on what we want to do with the data.

We can see that if we use the keys to keep the tree in sorted order, then in the good case we can search for an item in O(logN), since the height of the tree is O(logN) when it is fairly balanced. This is where the BST (binary search tree) comes in handy.
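
To make that concrete, here is a minimal sketch (my own toy Node struct and a bstSearch helper, nothing standard) of how a lookup just walks left or right at each node:

#include <stdio.h>

typedef struct Node {
    int key;
    struct Node *left, *right;
} Node;

/* Walk down from the root: go left for smaller keys, right for larger ones.
   On a fairly balanced tree this takes O(log N) comparisons. */
Node *bstSearch(Node *root, int key)
{
    while (root && root->key != key)
        root = (key < root->key) ? root->left : root->right;
    return root;                       /* NULL if the key is not in the tree */
}

int main(void)
{
    /* hand-built BST: 2 at the root, 1 to its left, 3 to its right */
    Node n1 = {1, NULL, NULL};
    Node n3 = {3, NULL, NULL};
    Node root = {2, &n1, &n3};

    printf("%s\n", bstSearch(&root, 3) ? "found" : "not found");   /* found */
    printf("%s\n", bstSearch(&root, 7) ? "found" : "not found");   /* not found */
    return 0;
}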

From the search and other operational perspectives, there are three fundamental tree traversals: inorder, preorder, and postorder. The fundamental assumption here is that a node has at most two children. When we say inorder, we mean: operate on the left child, then on the node itself, then on the right child. In data structure and algorithm books you will find this called visiting a node instead of operating on it, but they mean the same thing. This is where the work takes place.

Preorder traversal is: operate on the node, then on the left subtree, then on the right subtree. Postorder is defined as: operate on the left subtree, then on the right subtree, then on the node.

Note that in a BST the keys are arranged so that an inorder walk visits them in non-decreasing order. So if you traverse the nodes inorder, you get the sorted list of keys. Hence the name, inorder.
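
Here is a minimal sketch of the three traversals on the same kind of toy node (the struct and the printf standing in for "operate on the node" are just for illustration); notice that the only difference is where the visit sits relative to the two recursive calls:

#include <stdio.h>

typedef struct Node {
    int key;
    struct Node *left, *right;
} Node;

void inorder(Node *n)    /* left subtree, node, right subtree */
{
    if (!n) return;
    inorder(n->left);
    printf("%d ", n->key);   /* "operate on" the node */
    inorder(n->right);
}

void preorder(Node *n)   /* node, left subtree, right subtree */
{
    if (!n) return;
    printf("%d ", n->key);
    preorder(n->left);
    preorder(n->right);
}

void postorder(Node *n)  /* left subtree, right subtree, node */
{
    if (!n) return;
    postorder(n->left);
    postorder(n->right);
    printf("%d ", n->key);
}

int main(void)
{
    /* tiny BST: 2 at the root, 1 to its left, 3 to its right */
    Node n1 = {1, NULL, NULL};
    Node n3 = {3, NULL, NULL};
    Node root = {2, &n1, &n3};

    inorder(&root);   printf("\n");  /* 1 2 3  - the sorted keys */
    preorder(&root);  printf("\n");  /* 2 1 3 */
    postorder(&root); printf("\n");  /* 1 3 2 */
    return 0;
}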

BFS is breadth-first order: it visits the nodes level by level, which is not the same as any of the three traversals above. DFS is depth-first, and preorder, inorder, and postorder are all depth-first traversals; draw a simple BST and trace them to see the difference.

DFS is inherently recursive, so it uses the underlying call stack (or an explicit stack), while BFS is inherently iterative, so it uses a queue to push and pop nodes and operate on them in the prescribed order.
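
A minimal BFS sketch along those lines, again on the toy node, with a small fixed-size array standing in for the queue (fine for a toy tree, not meant for production):

#include <stdio.h>

typedef struct Node {
    int key;
    struct Node *left, *right;
} Node;

/* Level-order: dequeue a node, operate on it, enqueue its children,
   so nodes come out level by level. */
void bfs(Node *root)
{
    Node *queue[64];                 /* toy bound, enough for a small tree */
    int head = 0, tail = 0;

    if (root) queue[tail++] = root;
    while (head < tail) {
        Node *n = queue[head++];
        printf("%d ", n->key);
        if (n->left)  queue[tail++] = n->left;
        if (n->right) queue[tail++] = n->right;
    }
    printf("\n");
}

int main(void)
{
    Node n1 = {1, NULL, NULL};
    Node n3 = {3, NULL, NULL};
    Node root = {2, &n1, &n3};

    bfs(&root);                      /* prints: 2 1 3 */
    return 0;
}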

Before we go into those traversals and the related problems they can solve, with examples of course, we want a short introduction to the analysis techniques. The basic analysis of these structures is well laid out in many books. What is new here is online algorithm analysis. So...

For now, take a look at a linked list: a single link follows the next element until it reaches a null node. When the list is a bit large and we want to optimize the access pattern using a heuristic such as move-to-front, moving the most recently accessed node to the front in the hope it will be accessed again soon, we get into a debate about how good the approach really is. If we know the sequence of accesses in advance (offline), the analysis is different from the case where the sequence of operations is not predetermined. This is where competitive cost analysis comes into play.
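
To make the heuristic concrete, here is a minimal move-to-front sketch on a singly linked list (the LNode struct and the accessMTF helper are just illustrative names): search for the key, and on a hit unlink the node and splice it in at the head.

#include <stdio.h>

typedef struct LNode {
    int key;
    struct LNode *next;
} LNode;

/* Linear search; on a hit, unlink the node and splice it in at the head,
   so a repeat access of the same key becomes O(1).
   Returns the (possibly new) head of the list. */
LNode *accessMTF(LNode *head, int key, int *found)
{
    LNode *prev = NULL, *cur = head;

    *found = 0;
    while (cur && cur->key != key) {
        prev = cur;
        cur = cur->next;
    }
    if (!cur) return head;           /* key not in the list */
    *found = 1;
    if (prev) {                      /* not already at the front */
        prev->next = cur->next;
        cur->next = head;
        head = cur;
    }
    return head;
}

int main(void)
{
    LNode c = {3, NULL}, b = {2, &c}, a = {1, &b};   /* list: 1 -> 2 -> 3 */
    LNode *head = &a;
    int found;

    head = accessMTF(head, 3, &found);   /* list becomes 3 -> 1 -> 2 */
    printf("new head = %d, found = %d\n", head->key, found);
    return 0;
}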

MTF (move to front) in a linked list first came under formal analysis when it was quite popular in systems-level programming and a proper analysis was required. AFAIK, this kicked off a new branch of analysis called competitive cost analysis, started by Dan Sleator and Bob Tarjan back in the 80s.

Posted on Saturday, April 26, 2014 at 09:20AM by Prokash Sinha

Monkeys on Trees !

First, my apologies for not being active for a couple of months! Lately, jumping around like a monkey from one place to another and burning really big holes in my pocket, I had to back off from writing...

I've always thought about writing something about the Tree ADT and its various incarnations. First, it is a very common data structure; second, my mother told me I have a monkey's bone in my hip, because in my school days I seemed to jump around everywhere except my study desk. Ah, that gives me enough confidence to write something about trees...

The tree was considered mainly because applying an order relation lets us find data in O(logN). Basically, the binary search tree (BST) is based on an implicit order relation and expects the container to be fairly balanced. In reality, a sequence of data may cause the binary tree to become quite unbalanced, so there are many reincarnations based on different needs. The area of BSTs has been under active research for a long time.

Due to distributed data processing for search, social media, etc., another technique has become useful: competitive cost analysis for online algorithms. It is now a fundamental basis for analyzing a lot of data structures, including the BST and its variants.

The idea behind this analysis technique is to have a solid upper bound to compare against for online inputs. Here an object can be thought of as an element wrapped inside a data structure node; in the case of a BST, inside a BST node. For distributed processing of queries, social media trawling, etc., the elements arrive at random, and there is no a priori knowledge of them. Think of a random, effectively infinite sequence of elements. Now apply your best strategy in your implementation, and analyze it using the competitive cost analysis paradigm.

A FRIENDLY WARNING - This set of notes is going to be rather long, and perhaps boring, but I hope it is going to highlight some key points!!!

The tree structure, its implementations, and its analysis are really a subset of graph algorithms, which in turn are a subset of discrete mathematics. For example, given a connected graph and a vertex of the graph, creating a tree rooted at that vertex is one instance of what I mean by trees being a subset of graphs.

Way back when I was in graduate school pursuing discrete math and algorithms, I was lucky enough to have both my advisors hold Erdős number one. A person has Erdős number one if they co-authored a paper with Paul Erdős, a legend in the field of discrete math. So I got some exposure to thinking from a high enough level to encompass the related algorithms.

I intend to cover the following topics about trees, and some more as I go along -

1) Creating random trees, using C programming and data structures.

2) Tree traversals (inorder, preorder, postorder); BFS, DFS.

3) Parameters and attributes of a tree, and why they are important.

4) A random set of operations - insertion, deletion, rotations, balancing, etc.

5) Kinds of trees, and why they are needed - Red-Black, AVL, among others.

6) Different transformations - these are used to solve tree-related problems.

7) Offline and online analysis - worst case, average, amortized, and competitive analysis.

8) etc.

Posted on Sunday, March 23, 2014 at 11:04AM by Prokash Sinha

parshing county, Nevada - continues

So what is parsing???

Definition - A parsing function is a function that tries to parse through a bit stream. This is a recursive definition!

At its basic tenet, it is a step that takes a stream and tries to interpret it. So, for example, you just got a buffer full of bits, and one program (i.e., a parsing function) says, well, it is a bit stream or a byte stream or a char stream. These are almost always NULL parser(s). Some have an underlying, albeit very small, assumption, and some don't have any at all! Stop for a moment and see which ones carry those assumptions!! What do I mean here? Well, at its lowest level a bit is an identifiable unit. But for our purpose, it is more interesting to interpret it as part of something larger: a nibble, a byte, a char, a number. So those are the assumptions. They may not be interesting while we talk about parsing, but they are another fundamental thing we need to understand. Also, getting the bigger picture almost always moves our understanding a step forward. For example, a parser usually deals with a token! So what is a token? It definitely carries some assumptions, and I will leave it at that... But the main point is alphabet / separator / terminator, etc. That I will discuss soon.

Now let's say that, given a stream of chars, a parsing function parses it and says it is a sentence of the English language. And yet another takes a stream of chars and says it is a valid arithmetic expression. What is common here is an underlying rule that the function is trying to recognize. The rule is an English grammar in the first case, and an algebra rule in the second.

If you ask, "Is there a possibility that a given stream can be interpreted under two different rules...?", then you are right into the theory behind parsing - automata theory. But we are not going there, not yet at least...

Definition of a parser - A parser is a program that implements a rule. A NULL parser doesn't implement any rule. A rule is usually a specification for recognizing something like a valid IP address, a valid statement of a computer language, etc.

When a specification for a rule is implemented, it is usually through one of a wide variety of techniques. One powerful and robust technique is the regular expression. A regular expression recognizes a regular language over a defined alphabet. For the definitions of alphabet and regular languages, please see the wiki.

As a very simple example, assume we need to parse a string buffer and find out whether it is a valid IP address. What is a valid IP address, you may ask!... A valid IPv4 address has the form nnn.nnn.nnn.nnn, where n is any digit from the set {0, 1, ..., 9}. A leading zero in any of the four parts, separated by the dot (.) character, could be allowed for simplicity. So 001.02.3.4 could be a valid IP address.

 

Now write a C program to see for yourself. This usually takes a bit of thought to program correctly! I would try it first as if I had never seen regular languages or regular expressions. Then we will go about defining a regular expression...

 

ip address == [0-9][0-9][0-9].[0-9][0-9][0-9].[0-9][0-9][0-9].[0-9][0-9][0-9]

 

As you can see, leading zeros are allowed! (This rough pattern also insists on exactly three digits per part; we can refine it if we want to.) Then we will write a program to test whether the specification and the implementation are correct.
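
Here is one minimal way to do it in C, assuming the relaxed reading above where each part is one to three digits (leading zeros allowed, and no 0-255 range check); the helper name isValidIpV4 is mine, not a library function:

#include <stdio.h>
#include <ctype.h>

/* Hand-rolled check, no regex library: accept 1-3 digits, then '.',
   repeated four times. */
static int isValidIpV4(const char *s)
{
    int part, digits;

    for (part = 0; part < 4; part++) {
        digits = 0;
        while (isdigit((unsigned char)*s) && digits < 3) {
            s++;
            digits++;
        }
        if (digits == 0)
            return 0;                  /* each part needs at least one digit */
        if (part < 3) {
            if (*s != '.')
                return 0;              /* parts are separated by dots */
            s++;
        }
    }
    return *s == '\0';                 /* nothing may follow the last part */
}

int main(void)
{
    printf("%d\n", isValidIpV4("001.02.3.4"));  /* 1: valid under this rule */
    printf("%d\n", isValidIpV4("10.20.30"));    /* 0: only three parts      */
    printf("%d\n", isValidIpV4("1.2.3.4x"));    /* 0: trailing garbage      */
    return 0;
}

Refining the rule (say, rejecting values above 255) only changes the per-part check, which is exactly the kind of refinement mentioned above.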

 

History - When regular expressions and parsing became a branch of science and widely popular, not many applications of the technique were found beyond compiler and tool writers. Then came the internet, and parsing became almost essential to WWW technology. The result is that a lot of languages now provide some kind of support for defining regular expressions over an underlying regular language and parsing information based on the definition of the expression. To name a few: Perl, Python, C++, C#, and Java.

 

More ...

Posted on Sunday, June 23, 2013 at 09:23AM by Prokash Sinha

It's parshing county, Nevada. Wild wild west

It's parsing, and not much different from what I said in the title. Yes, it is the wild wild west. As you go along doing your day-to-day business, no matter how hard you try not to think about it, you eventually end up with stuff that feels either right at your fingertips or far out of reach!

Parsing is one such thing when you try to program. I've seen many different mistakes, and it is not so intuitive for a lot of people (including me). So what are parsing, pattern matching, etc.???

A bit of history first, if you don't happen to know it. Long, long ago, computers were built very specifically to do one type of computation or another. So there were computers for scientific computation, and some were for massive data processing and business computations. A generalization effort brought them together into universal (sort of) computers where you can do many different types of computations. Currently, this is simply taken for granted.

Now, for tackling interesting and somewhat big problems, there were already a few languages suitable for specific uses, instead of using the almost machine-level language, assembler. For example, Fortran for scientific computation and Cobol for business computation. There were others that came, did their service, and went away. Since some of these languages were meant to solve specific types of problems, they were a bit awkward for other types. Moreover, the I/O systems were thought out to have block-level interfaces (especially to optimize system performance), and these were directly exposed to the programs.

A major shift was the idea that I/O should be represented as a stream of something, letting the program interpret it the way it wants. The most successful outcome of this effort was the birth of the C language, where the stream of something was a stream of characters, with ASCII as the character encoding. The computer itself needs only bits and bytes - not characters, not words, not some formula... nothing else.

So, given a character stream, what should I interpret it as, and how? This is where parsing comes into play. Basically it means: I, the program, want to interpret this stream in this particular way, because I know I can get meaningful information out of it.

No wonder it is parshing county, Nevada, around 1850! The wild wild west - everyone has her own rule and interpretation!!

 

Parsing, to me, is still a very elegant art - the art of programming. It is mostly used to parse streams for computer language statement validation and interpretation, and it has a rich foundation in computer science. Scanning a stream, giving it a meaningful representation, and then parsing it is at the heart of computer languages and lots of other things. To appreciate the beauty, and its non-intuitive nature, just try to code up a program that parses a stream of text and finds out whether it contains a valid C program statement. To be more specific -

int main(){;;;;; return 0;} - Is it a valid C program? Now insert a dot (.) somewhere and see what your checker says. Very soon we will see that if-then-else, switch, etc. are going to make the program a living mess. And there are two questions we will have to answer: (1) Is it interpreting correctly? (2) Is it efficient?

So there has to be a science behind it too!! If not, then why not would be the question for inquiring minds.

 

 

More...

Posted on Saturday, June 15, 2013 at 07:58AM by Prokash Sinha

May I be the ring master?... Please!

One of the old tools of the trade is to craft a ring buffer; there are quite a few things to use it for! If you, the reader, have not heard of it yet, you will hear of it soon!!!

So what is it? And what is the use?

Well, sometimes you just want to know the recent history of certain things - say, kernel events, or the recent activities of a server or even your application. So you want to constantly pump trace messages into a buffer, but the buffer might have restrictions on how much memory you can allocate for it. Optionally, you might have a backup procedure that intermittently writes the buffer to a file, so that if you lose the buffer contents, at least you can expect the file to be there to take a peek at.

While this can be used in a variety of situations, one thing to understand is that the paradigm I try to follow is "coding for debugging"... So anyone can have an infrastructure - more on this later - that can be dropped into your software and used for debugging and other stuff...

When I try to develop some code, one thing I keep in my head is to make it reusable. The other thing I want to follow is: can I try it in user-mode code before bringing it into the kernel?... And finally, can I have some way to introduce debuggability into it, from the primitive printf and its cousins to elegant use of language features!!!

In this example, the ring size is fixed in terms of how many elements it can hold. For dynamic sizing it would have to be changed. There are situations where, depending on how much memory is available in the computer, the ring size has to be self-tuned, which necessitates a dynamically sized ring. Another missing part is intermittent flushing to a file, and yet another missing part is how to use language features to debug this infrastructure...

I will not go into dynamic sizing; it could be another note, based on request(s).

For intermittent flushing to a file, one approach is to run another thread that acts as a consumer and writes the ring buffer to a file. Not very difficult to do either.

As for the language features, why do we need them? Well, the message producer could be an arbitrary source, the ring buffer thread would be an intermediary (the way a cache works), and the consumer would be another thread. So one question is: how can I bypass the ring when I want to see that the producer is producing, with the consumer taking messages directly from the producer and writing them to a file? An elegant solution is to have virtual functions that can be redirected on the fly using late binding. It would then take only a one-line change to direct the messages to the file instead of the ring....
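
Since the base code below is plain C rather than C++, the same one-line redirection can be sketched with a function pointer standing in for the virtual function; the names MsgSink, emitMsg, sinkToRing, and sinkToFile are all made up for illustration:

#include <stdio.h>

/* A "sink" is wherever a produced message goes; swapping the pointer
   redirects every subsequent message without touching the producer. */
typedef void (*MsgSink)(const char *msg);

static void sinkToRing(const char *msg) { printf("[ring] %s\n", msg); }
static void sinkToFile(const char *msg)
{
    FILE *f = fopen("trace.log", "a");
    if (f) { fprintf(f, "%s\n", msg); fclose(f); }
}

static MsgSink emitMsg = sinkToRing;   /* default: go through the ring */

int main(void)
{
    emitMsg("message one");
    emitMsg = sinkToFile;              /* the one-line redirection */
    emitMsg("message two");
    return 0;
}

This is the C flavor of the idea; with a C++ pure virtual interface, the switch becomes substituting a different derived object instead of a different function pointer.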

Succinctly, the producer could produce nothing at all, produce into the ring, produce directly to the final destination (the file), or produce to both, with or without any assistance from the consumer of the ring. If we cover all these cases, it becomes easy to find the problem component of the infrastructure.

The base code for such a ring is given below as an example, but the full implementation handling dynamic binding and debugging, including producing in LIFO or FIFO order, is intentionally left out for now. A C++ class-based version using pure virtual functions and threading works just fine for me -

 

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <Windows.h>

#define SIZE_DMESG 256

//output level
typedef enum _DMESG_LVL{
    DMESG_DISBLED,
    DMESG_QUITE,
    DMESG_MODERATE,
    DMESG_VERBOSE,
    DMESG_UNKNWN_LVL
}DMESG_LVL;

//one trace message: its verbosity level, length, and the text itself
typedef struct _DMESG{
    DMESG_LVL lvl;
    size_t msglen;
    CHAR * msg;
}DMESG, *PDMESG;

typedef struct _DMESG_RING{
    ULONG producerIdx;          //current producer index
    ULONG ringSize;             //number of dmesg messages the ring will hold at most
    //KSPIN_LOCK queuedSpinLock;  //multi-proc efficient spin-lock (kernel-mode only)
    DMESG **pDmesgArray;        //ring array of dmesg pointers
}DMESG_RING, *PDMESG_RING;

PDMESG_RING g_pDmesgRing;


PDMESG_RING
AllocdmesgRing( ULONG ringSize )
{
    PDMESG_RING pdmesgRing;

    //allocate/init the ring structure that holds the metadata and the ring buffer
    pdmesgRing = (PDMESG_RING)malloc(sizeof(DMESG_RING));
    if ( !pdmesgRing ){
        goto ErrorExit;
    }
    memset(pdmesgRing, 0, sizeof(DMESG_RING));
    pdmesgRing->ringSize = ringSize;
    pdmesgRing->producerIdx = 0;

    //now allocate the array of DMESG pointers, of length ringSize
    pdmesgRing->pDmesgArray = (DMESG**)malloc( ringSize * sizeof(DMESG*));
    if ( !pdmesgRing->pDmesgArray ){
        goto ErrorExit;
    }
    //set the slots to NULL; a slot is checked before a message is put in,
    //so that any existing message gets cleaned up first
    memset(pdmesgRing->pDmesgArray, 0, ringSize * sizeof(PDMESG));

    //init the ring's queued spinlock (kernel-mode build only)
    //KeInitializeSpinLock(&pdmesgRing->queuedSpinLock);

    //tie it to the global ring pointer
    g_pDmesgRing = pdmesgRing;

    return pdmesgRing;

ErrorExit: //coming here => error

    if (pdmesgRing){
        if (pdmesgRing->pDmesgArray){
            free(pdmesgRing->pDmesgArray);
        }
        free(pdmesgRing);
    }
    return NULL;
}


//The ring takes ownership of mesg; it is freed when the slot is recycled.
VOID
AddmesgToRing (DMESG_LVL verbosity, CHAR * mesg)
{
    PDMESG pdmesg;
    //KLOCK_QUEUE_HANDLE lockHandle;
    size_t countByte;

    //recycle an existing element in the current slot, or create a new one
    if (g_pDmesgRing->pDmesgArray[g_pDmesgRing->producerIdx]){
        pdmesg = g_pDmesgRing->pDmesgArray[g_pDmesgRing->producerIdx];
        if (pdmesg->msg){
            free( pdmesg->msg );
        }
    }else{
        g_pDmesgRing->pDmesgArray[g_pDmesgRing->producerIdx] = (PDMESG)malloc( sizeof(DMESG));
    }

    if ( !g_pDmesgRing->pDmesgArray[g_pDmesgRing->producerIdx] ){
        return; //allocation failed; drop the message
    }

    countByte = strlen(mesg);

    //init the message entry
    g_pDmesgRing->pDmesgArray[g_pDmesgRing->producerIdx]->msglen = countByte;
    g_pDmesgRing->pDmesgArray[g_pDmesgRing->producerIdx]->lvl = verbosity;
    g_pDmesgRing->pDmesgArray[g_pDmesgRing->producerIdx]->msg = mesg;

    //advance the producer index, wrapping around the ring
    //KeAcquireInStackQueuedSpinLock(&g_pDmesgRing->queuedSpinLock, &lockHandle);
    g_pDmesgRing->producerIdx++;
    g_pDmesgRing->producerIdx %= g_pDmesgRing->ringSize;
    //KeReleaseInStackQueuedSpinLock(&lockHandle);
}

#define NUM_RING_ELEMENTS 2

void freeAll()
{
    PDMESG pdmesg;
    int i;

    //free each message and its DMESG wrapper
    for ( i = 0; i < NUM_RING_ELEMENTS; i++){
        pdmesg = g_pDmesgRing->pDmesgArray[i];
        if (pdmesg){
            free(pdmesg->msg);
            pdmesg->msg = 0;
            free(pdmesg);
        }
    }

    //free the pointer array, then the ring structure itself
    free(g_pDmesgRing->pDmesgArray);
    free(g_pDmesgRing);
}

int main()
{
    char *buf;
    int i;

    if ( !AllocdmesgRing( NUM_RING_ELEMENTS ) ){
        return 1;
    }

    //produce three messages into a two-slot ring; the third one wraps
    //around and overwrites the oldest entry
    for ( i = 0; i < 3; i++){
        buf = (char*)malloc(SIZE_DMESG);
        memset(buf, 'a'+i, SIZE_DMESG);
        buf[SIZE_DMESG - 1] = '\0';
        AddmesgToRing( DMESG_VERBOSE, buf );
    }

    //dump whatever the ring currently holds
    for ( i = 0; i < NUM_RING_ELEMENTS; i++){
        PDMESG pdmesg = g_pDmesgRing->pDmesgArray[i];
        printf("lvl=%d msglen=%zu msg=%s\n", pdmesg->lvl, pdmesg->msglen, pdmesg->msg);
    }

    freeAll();
    return 0;
}

Posted on Monday, May 27, 2013 at 08:49AM by Prokash Sinha