
To Program Or Not To Program

To program or not to program?

The answer is: not to program, if you don't have to. Programming is a tedious activity and, unless you are a professional programmer who makes a living from it, there is nothing for you to gain from such an exercise. Because of the tedium involved and the enormous amount of labour required to write even simple applications, programming jobs, like blue-collar jobs, end up going overseas to countries with cheaper labour, most often to India nowadays. So, even if you have been thinking about becoming a professional programmer, think again and reconsider your options.

If you can solve your research problems by running an off-the-shelf (commercial or free) application on your laptop or desktop PC, do so. Focus on your research mission, not on computing. The latter is only a tool and it is only one of many you will have to use in your research.

If your laptop or your desktop PC doesn't have enough power to solve your computational problem, use your laboratory or your departmental server. A recent review of how US researchers use computers showed that their laboratory and departmental servers are by far the most important systems in their work.

If your laboratory server is not powerful enough to solve your computational problem, use the central computing facilities provided by your university or, if your university doesn't provide any, use the systems provided by the national supercomputer centers. This is what they are for.

In all cases try to use off-the-shelf software if available. Only if your problem is so exotic that there is nothing out there in the software world to help you out may you really have to sit down and write your own program. But before you do, think again. Can your problem be solved by laboratory experimentation? Think of Nature and your laboratory bench as a computer that is much faster and much more accurate than any man-made computer. Can you solve the problem analytically, without having to resort to numerics? Laboratory and analytical results are always going to be more convincing than numerical simulations.

Most research domains have developed various codes for the reduction of experimental data in their related disciplines. For example, astronomers have the IRAF package that they can all use to process astronomical images. High energy physicists have numerous applications for data processing developed at and distributed by CERN. Geneticists have developed numerous codes related to their work too. There are commercial engineering codes for just about everything that engineers work on nowadays. There are codes for car crash simulations, codes for designing and testing designs of integrated circuits, codes for structural engineering, water engineering and whatnot.

But let us go back to the case mentioned above. You're stuck, you have to carry out some complex numerical computation, there is no off-the-shelf code that you can use (you have checked, haven't you?), there is no laboratory procedure you can resort to in order to attack the problem, and the problem is analytically intractable (you asked your friend, who is a mathematician, and she told you so; never mind you weren't nice to her when she was your girlfriend...). So, what are you to do?

The first thing you should try is to use one of those so-called problem solving environments like Matlab or Mathematica. They are designed to minimize programming effort and to maximize problem solving efficiency. With these environments you can probably attack any problem that can be solved on your own laptop, desktop or laboratory server.

If this doesn't do, if the computation is going to be truly massive, or if the computation is of the data base variety (Matlab and Mathematica don't do data bases), only then look towards SMPs or clusters. But look towards SMPs first. They are easier to use. They are fabulous data base servers. You can run data parallel programs on them.

Alternatively, if you can attack your problem by "capacity computing", i.e., by running a lot of relatively small jobs on separate machines, use a cluster. A lot of stuff is tractable by "capacity computing". High energy physics resorts to "capacity computing" for most of its data reduction procedures. SETI, the Search for Extra Terrestrial Intelligence, is another example. So is car crash analysis: if you have to simulate car crashes under every possible angle, distribute the jobs over a cluster.
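The shape of a "capacity computing" workload can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `simulate_crash` is a hypothetical toy stand-in for one small job, and a local thread pool stands in for the cluster's batch system, which on a real cluster would dispatch each job to a separate node. The point is merely that the jobs share nothing and can be run in any order.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def simulate_crash(angle_deg):
    """Hypothetical stand-in for one small, independent job.

    Returns the impact angle together with a toy figure of merit:
    the fraction of impact energy directed along the car's axis.
    """
    return angle_deg, math.cos(math.radians(angle_deg)) ** 2


def run_capacity_jobs(angles, workers=4):
    """Run one independent job per angle and collect the results.

    The thread pool is an assumption made for illustration only; it
    plays the role of the cluster scheduler handing jobs to nodes.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(simulate_crash, angles))


if __name__ == "__main__":
    results = run_capacity_jobs(range(0, 91, 15))
    for angle, value in sorted(results.items()):
        print(f"{angle:3d} deg -> {value:.3f}")
```

Because no job depends on another, adding more machines simply lets more jobs run at once. No communication between the jobs is needed, which is why an ordinary cluster suffices for this kind of work.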

If none of the above applies, that is, if an SMP is not powerful enough and your problem cannot be attacked by "capacity computing", well, how about getting an account on ARSC's Cray X1 and trying to solve the problem by data parallel means on that machine?

Do I hear it right? Are you saying that your problem is huge and that it is not data parallel and that it cannot be solved by laboratory experimentation and that it is analytically intractable and that there is no SMP around powerful enough, and that it cannot be tackled by "capacity computing" either?

Let me ask this: "Is there enough money in it to bother?" And if there is, are you going to get any of it?

To be frank, the only problems of this type that I know of are data mining and parallel data bases, i.e., operations on data bases so gigantic that they just can't be squeezed into a single SMP, however large.

Most computational problems that currently run on clusters deployed at the National Supercomputer Centers would probably run better on systems such as the Cray X1 or the Earth Simulator. But parallel data base problems and data mining problems would not.

So, this is yet another rationale for this course, "High Performance Data Management and Processing".

Zdzislaw Meglicki