Measuring badness of a Program - II
In part one, I focused mainly on the organization of files and functions. Here I will try to point out some of the finer aspects ...
Global variables are something to watch for. Once you get into systems programming, you will find out how dangerous they are. As an example, reentrant code should not rely on modifiable global variables. You might wonder why. As I have found time and again, unless I start to think about something and try out different possibilities, I never learn. It's okay to make a mistake, but not learning from it is not okay. Any serious programmer should think about avoiding globals. A program with lots of globals not only has non-reentrant functions, it also becomes incomprehensible. I've seen it way too many times, and I hate any designer who does that.
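To make the reentrancy point concrete, here is a minimal sketch in C. The names (`g_total`, `struct accumulator`, `add_to_total`) are made up for illustration. The first function keeps its state in a global, so every caller shares it; the second takes its state from the caller, so two users cannot step on each other.

```c
#include <assert.h>
#include <stddef.h>

/* Non-reentrant: the running total lives in a global, so every caller
   (another thread, a signal handler, a nested call) shares the same state. */
static int g_total = 0;

int add_to_total_global(int amount)
{
    g_total += amount;
    return g_total;
}

/* Reentrant: the caller owns the state and passes it in explicitly. */
struct accumulator {
    int total;
};

int add_to_total(struct accumulator *acc, int amount)
{
    assert(acc != NULL);
    acc->total += amount;
    return acc->total;
}
```

With the second version, two accumulators can be updated in any interleaving and each keeps its own total. With the first, any caller can change what every other caller sees.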
In any function, if there are multiple exit points I grade it as bad code. Well, there is some nicely written code that has more than one exit point, but not too many exit points in one function. Return statements scattered all over a function are a bad idea even on their best days.
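As a small illustration of the single-exit style, here is a sketch of a linear search in C. The function name `find_index` and the "-1 means not found" convention are assumptions for this example. The result is kept in one variable and returned from one place.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the index of the first element equal to target, or -1 if there
   is none. The loop stops as soon as a match is recorded, and the function
   has exactly one return statement. */
int find_index(const int *values, size_t count, int target)
{
    int found = -1;
    size_t i;

    assert(values != NULL || count == 0);
    for (i = 0; i < count && found < 0; i++) {
        if (values[i] == target) {
            found = (int)i;
        }
    }
    return found;
}
```

Any cleanup or logging that has to happen before leaving the function then needs to be written only once, just before the single return.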
Naming variables and other nameable objects should be done thoughtfully. Short names are fine as long as they are used in an obvious way. For example, the variables i, j, and k mostly indicate indices, a convention that goes back to Fortran or earlier days. But if those variables are used to hold a load of intermediate and/or final results of computations, then that is another badness of a program.
If a switch statement has more than 10 to 12 case labels, it is bad programming style. Refactoring should group some of them.
Implicit casting: when a result of one data type is implicitly converted to another type by the compiler, loss of information is possible. Compilers usually have flags to catch this sort of badness.
A name that is too long is always bad practice. What is too long? I don't give variables names longer than 15 to 18 characters when I'm in my right mind. Usually 8 to 10 characters suits my taste. Functions and methods can have longer names, but if one goes beyond 40 to 50 characters I tend to get a stomach ache.
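A minimal sketch of the kind of information loss meant here, in C. The function names are hypothetical. The first function lets the compiler narrow an int to an unsigned char silently; the second checks the range first and reports failure instead. GCC and Clang, for instance, can warn about the first form when given the `-Wconversion` flag.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Implicit narrowing: the int is silently converted to unsigned char,
   so values outside 0..UCHAR_MAX wrap around modulo UCHAR_MAX + 1. */
unsigned char narrow_implicit(int value)
{
    unsigned char small = value;
    return small;
}

/* Checked narrowing: stores the value and returns 1 only when it fits,
   otherwise leaves *out untouched and returns 0. */
int narrow_checked(int value, unsigned char *out)
{
    int fits;

    assert(out != NULL);
    fits = (value >= 0 && value <= UCHAR_MAX);
    if (fits) {
        *out = (unsigned char)value;
    }
    return fits;
}
```

On a typical platform where an unsigned char holds 0 to 255, `narrow_implicit(300)` quietly yields 44, while `narrow_checked(300, &c)` returns 0 and leaves the decision to the caller.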
Pointers, or any sort of indirection, of 3 or more levels make me stop for a while. Usually a pointer to a pointer is sufficient in most cases. Try writing a moderate amount of logic using a pointer to a pointer to a pointer and see for yourself. Remember that the next person to own your code for maintenance or enhancement might not comprehend it easily. And when you are not there, solving a debugging problem could take a bit longer than usual when such pointers with 3 or more levels are used. Once I had that problem myself, and some other people started calling me point dexter :-).
Interesting comments, for example "If we are here, screw the user", and all sorts of other comments that vent personal frustration and/or anger. Frankly, there is no need for any of these. It's only a matter of time before the owner is long gone from the project, and those littered comments will still be around.
Measuring badness of a program - I
Remember that we are talking about code, not a running program. So what are the badnesses in the organization of a program's code?
I will assume a language that allows multi-file compilation of a program. Since a lot of code is in C/C++, I will stress some of the badness of C/C++ program organization.
Macros are not debugger friendly, and I've never seen a commercial program that was not put under a debugger to fix a bug or two or more. Mostly, macros are little beasts: some repetitive computation needs to be used in several places in a program, and yet it is not a bona fide candidate to become a function. I would use any means to get rid of macros that are not simple, so that the logic is visible under a debugger.
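Here is a small sketch of one way to replace a function-like macro, assuming a C99 or later compiler. The names `SQUARE_MACRO`, `square`, and `next_value` are invented for the example. Besides being something a debugger can step into, the function version evaluates its argument only once.

```c
#include <assert.h>
#include <stddef.h>

/* Macro version: the argument text is pasted twice, so any side effect
   in the argument happens twice. */
#define SQUARE_MACRO(x) ((x) * (x))

/* Function version: the argument is evaluated once, and the function can
   be stepped into like any other. */
static inline int square(int x)
{
    return x * x;
}

/* Hypothetical helper with a side effect: it counts its own calls through
   the caller-supplied counter and always returns 3. */
static int next_value(int *calls)
{
    assert(calls != NULL);
    (*calls)++;
    return 3;
}
```

Calling `SQUARE_MACRO(next_value(&calls))` bumps the counter twice, while `square(next_value(&calls))` bumps it once, although both produce 9.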
Constant defines are also very clumsy. Use all means to make them visible under a debugger.
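One possible way to do that in C is an enum constant, sketched below. The names `MAX_NAME_LEN` and `copy_name` are assumptions for the example. A `#define` is replaced by the preprocessor before the compiler proper sees it, so the name often does not reach the debugger, whereas an enum constant is a real identifier in the program.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Instead of: #define MAX_NAME_LEN 16 */
enum { MAX_NAME_LEN = 16 };

/* Copies src into dst, a buffer of MAX_NAME_LEN bytes, truncating if
   needed. Always terminates dst and returns the number of characters
   copied, not counting the terminator. */
size_t copy_name(char dst[MAX_NAME_LEN], const char *src)
{
    size_t len;

    assert(dst != NULL && src != NULL);
    len = strlen(src);
    if (len > MAX_NAME_LEN - 1) {
        len = MAX_NAME_LEN - 1;
    }
    memcpy(dst, src, len);
    dst[len] = '\0';
    return len;
}
```

The enum constant can still be used as an array size, which a `const int` cannot portably do in C.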
Interchanging the purpose of file types. Usually header files are only for declarations, and not for implementations of functions or routines. Similarly, code files should contain mostly code. If a code file contains a lot of defines, macros, etc., they should have meaning specific to that file's scope. If any of them are also found in another header or source file, a potential conflict will come forth sooner or later.
Header files and source files should follow a consistent folder (container) structure. As an example, a folder named pgm could have a subfolder named inc and another subfolder named src. These two sub-trees should be as similar as possible. It is a simple indication that the original owner cares about other potential maintainers.
Usually any commercial program needs quite a few helper functions. They should be grouped into files in a logical manner. As an example, if I have generic implementations of tree structures and their operations, they should not be spread over multiple files, and they should not be in any program domain related files. A program domain file is one where the domain knowledge of the program is implemented, partly or fully.
The sad fact is that I've seen many of these badnesses even from really accomplished programmers.
Measuring goodness of a Program - II
Now that I've gone through the first part, in this second part I will try to explain my thoughts about the goodness of a part of a program. Any high-level language will have some constructs for repetitive work (usually called loops) and some decision constructs (usually if-then-else). Other constructs are mostly immaterial for this discussion ...
For decision constructs I particularly look at the depth of the nesting, as well as the composition of the logic. For example, in a C or C++ construct, if I see ifs nested 4 or more levels deep, then I don't think it is good decision logic. The reason is that if I draw a tree with all the branches, it soon becomes too complex. Try it yourself, and you will see what I mean :-)
Composing too many clauses within one if statement is also not very good. For example, I would not consider the following if statement a good one, even if the variable and constant names are very meaningful ( NoNameUno :-)
Bad if statement -
if ( X && ( !Y) && Z || Y && W )
I will not go into the details of this. If you see it in a program not written by you, and you don't know much about the program, you will probably not find it easy to comprehend. This requires refactoring.
For loops, first and foremost, they should be at most 3 levels deep. In other words, if loops are nested more than 3 levels, it can become difficult to understand the logic inside the loop. There might be situations where the inside logic is very minimal, but 4 or more levels of loops are needed. Only in those specific situations can nesting at 4 or more levels be considered good.
If a function has both iteration and recursion, then that function is either very elegant or very complicated. If it is hard to comprehend, it should be considered a bad implementation.
Commenting functions, naming variables, indentation, etc. are mostly personal choices. Different places have different rules. Most of them exist to make the code more readable and understandable for others, and often they help. It is possible to follow these rules and still come up with a very convoluted implementation of some logic. I personally try to see if something is really convoluted. How do I find out that it is convoluted? Just read some random function from a random source file and try to understand what that function does. Then find out what is not known to you (perhaps some other function it calls or some other API it uses). If there are too many such things that were not explained in a comment line or the like, then you know the function is not comprehensible.
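One possible refactoring of the if statement above, sketched in C. The names X, Y, Z, and W come from the example; `first_case` and `second_case` are placeholders, and in real code they would carry names from the problem domain. Since && binds tighter than || in C, the original condition already groups as (X && !Y && Z) || (Y && W), and the sketch only makes that grouping visible.

```c
#include <assert.h>
#include <stdbool.h>

/* The original condition, with the grouping that C's precedence rules
   already imply written out in parentheses. */
bool original_condition(bool X, bool Y, bool Z, bool W)
{
    return (X && (!Y) && Z) || (Y && W);
}

/* Refactored: each alternative gets its own named intermediate. */
bool refactored_condition(bool X, bool Y, bool Z, bool W)
{
    bool first_case  = X && !Y && Z;
    bool second_case = Y && W;

    return first_case || second_case;
}

/* Checks all 16 input combinations to confirm both forms agree. */
void check_conditions_agree(void)
{
    int bits;

    for (bits = 0; bits < 16; bits++) {
        bool X = (bits & 1) != 0;
        bool Y = (bits & 2) != 0;
        bool Z = (bits & 4) != 0;
        bool W = (bits & 8) != 0;

        assert(original_condition(X, Y, Z, W) ==
               refactored_condition(X, Y, Z, W));
    }
}
```

The calling code then reads as a single if on two named cases, and a maintainer can inspect each case separately in a debugger.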
Measuring goodness of a program. - I
The industry measures programs from different perspectives. A software design shop would mainly look at how well the specification and design have been implemented or realized, while a quality assurance shop would try to break the promise of a program using various means. And if a program has any significant market impact, standards bodies, reviewers, analysts, and others will test its guts out... They all seem to have a predefined metric to measure against: correctness, performance, etc.
But here I'm interested in a purely programming point of view. There are a lot of programs that need to be designed, maintained, customized, etc. So what are the basic things to look at to measure the goodness of a program? Here the metric is not easily measurable, and that is what interests me most, mainly because there is a 90% or higher probability that one will end up in a project where most of the code already exists and (s)he needs to catch up with the rest.
There are a lot of methods being followed in different places. Some are manual, such as code walkthroughs, and other places have software (yet another set of programs) that verifies an implementation and tries to find basic flaws in it. And people often go to great lengths to find defects in the algorithms used in a program. Here our emphasis should be a little different. The reason is that hardly any developer's program runs in isolation. It usually works with other people's implementations of other modules. And developers mostly move from one project to another quite frequently.
Assuming a high-level language is used for a typical implementation, I always look for the following -
1) How easy would it be to replace some of its parts? Here a part could be a subroutine, a block of code in a subroutine, a macro, etc. The reason is that people tend to think in terms of parts (divide and conquer) when it comes to repairing or understanding anything.
2) How well do the comments explain the implementation? Here "well" does not mean the number of lines in the comment. And if too many things are done in a routine, even the implementor would not be able to explain it within the comments.
3) How many lines of code are there in the largest subroutine? If a subroutine goes beyond 50 lines, it should not be considered a good subroutine. About 30 lines should be the limit, except in a few cases where 50 should be the maximum. Change these numbers as you please, but don't go into the hundreds :-)
4) How many things is a function trying to do? If a function does too many things, it will not be a good function for maintenance.
Once I get a feel for a program using the four steps stated above, I try to go into the details specific to the language used, and the overall quality of the program. But those are for other topics.
What is programming then ?
The answer depends on who is asking the question. There are a few other trades where people often call their work programming. And there are all kinds of people who do not even know what computer programming is. BTW, here I'm only talking about computer programming. And this blog is for people who are actively involved in computer programming, so if you happen to be here and are not interested in this, please take a quick exit!
The assumption for any further posts is that the audience knows some programming environment that uses systems like Windows, Linux, Unix, Mac OS, Symbian, Palm, or your favorite mini/micro/main/embedded program development environment.
In my opinion, programming is the realization of an algorithm. And an algorithm, recursively defined, is an algorithm. No seriousness warranted :-)
At a very basic level, programming is writing a program to do repetitive computation. Computation, at a very basic level, is taking representable and measurable input of some kind and observing some output. At a logical level the computation can be thought of as arithmetic in nature. If the huge computations of modern days can be done by programs using computers, we should be able to have fewer working hours and more quality time. Wasn't that the big daddy of all the little promises of high tech?
An algorithm is a finite combination of computations of the above nature. We use algorithms for almost any type of human work, be it making food, making shoes, or going places. There is a wide variety of formal and informal definitions of programming, algorithm, and computation. From a programming perspective, the above definitions should certainly help.
Since we almost always use higher-level computer languages to craft a program, abstractions from the above definitions are necessary. In that abstract model we think of an algorithm as a piece of code that does some computations. A piece of code is really some finite number of high-level language statements.
Finally, to me, programming and programming models are cousins. There are two primary models: the higher-level language model and the machine mnemonics model. Programming in the mnemonics model is also called assembly language programming. If you split hairs, I'm sure you will find more models :-)