« Monkeys on Trees ! | Main | It's parshing county, Nevada. Wild wild west »

parshing county, Nevada - continues

So what is parsing ???.

Defintion - A parsing function is a function that tries to parse thru a bit stream. This is a recursive definition!

At it's basic tenate, it is a step to take a stream and try to interpret. So for example, you just got a buffer full of bits, and one program (i.e. a parsing function) says, well it is a bit stream or byte stream or char stream. These are almost always NULL parser(s). Some has underlying, albiet very small, assumption, and some don't have any at all!. Stop for a moment, and see which has those assumptions!! What I mean here? Well, at its lowest level a bit is an identifiable unit. But for our purpose, it is more interesting to interpret as a part of a larger thing. For example, a nibble, a byte, a char, a number. So there are those assumptions. These may not be interesting while we talk about parsing, but is another fundamental assumption that we need to understand. Also getting bigger picture, almost always gives us a step foward to our understanding of things. For example, a parser usually deals with a token! So what is a token? It definitely has some assumption, and I wlll leave it at that... But the main point is  alphabet / separator / terminator etc. That I will discuss soon.

Now let say, given a stream of chars, a parsing function parses to say that it is a sentence for english language. And yet another took a stream of chars, and says it is a valid arithmetic expression. What is common here is an underlying rule that the function is trying to recognize. The rule is an english grammer in the first case, and an Algebra rule in the second case.

If you ask "Is there a possiblity, that a given stream can be interpreted as two different rules ... ?", then you will be right into the theory behind parsing - Autometa Theory. But we are going there, not yet at least...

Definition of a Parser - A parser is a program that implement a rule. A NULL parser don't implement any rule. A rule is usually a specification to recognize something like: a valid ip address; a valid statement of a computer language etc.

When a specification for a rule is used, it is usually thru a wide variety of techniques. One such powerful and robust technique is regular expression. A regular expression recogmize a regular language over a defined set of alphabet.  For the definition of alphabet, and regular langagues, please see wiki.

 As a very simple example, assume that we need to parse a string buffer and find out if it is a valid ip address or not. What is a valid ip address, you may ask!... A valid ip address is in the form of nnn.nnn.nnn.nnn in ipv4 version, where n is any number from the set {0, 1, ... 9}. A leading zero in any of those four parts, separated by dot (.) character could be allowed for simplicity. So could be a valid ip address.


Now write a C program to see for yourself. This usually takes a bit of thought to program correctly ! I would try first, as if I never seen regular language and regular expression. Then we will go about definining a regular expression...


ip address  == [0..9][0..9][0..9].[0..9][0..9][0..9].[0..9][0..9][0..9].[0..9][0..9][0..9]


As you can see, leading zeros are allowed!. Now we can refine it if we want to. Then we will write program to test the specification and implementation is correct or not.


History - When regular expression, and parsing became a branch of science and widely popular, not many application of the technique were found beyond compiler and tools writer. Then came the internet and parsing became almost essential in the WWW technology. The result is that lot of languages now provide some kind of support to define regular expression using the underlying regular language, and parse information based on the definition of the expresssion. To name a few: Perl, Python, C++, C#, and Java.


More ...

Posted on Sunday, June 23, 2013 at 09:23AM by Registered CommenterProkash Sinha | CommentsPost a Comment | References1 Reference

References (1)

References allow you to track sources for this article, as well as articles that were written in response to this article.

Reader Comments

There are no comments for this journal entry. To create a new comment, use the form below.

PostPost a New Comment

Enter your information below to add a new comment.

My response is on my own website »
Author Email (optional):
Author URL (optional):
All HTML will be escaped. Hyperlinks will be created for URLs automatically.