11 October, 2004

Maybe a million are too many

I'm trying to perform a statistical analysis of a result. This requires me to generate random monomials, test them against various criteria, and then determine the rate at which those criteria hold. I had the program loop to generate one million sets of monomials for sets of size 3, 4, 5... with 3, 4, 5, or 6 variables.
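The shape of such an experiment can be sketched as follows. This is Python, not the Maple code actually used, and the criterion (`no_shared_variable`), the exponent bound `max_degree`, and every function name are assumptions made up for illustration. Monomials are represented as tuples of exponents, and only a running count is kept, so memory use does not grow with the number of trials.

```python
import random


def random_monomial(num_vars, max_degree, rng):
    """Return a monomial as a tuple of exponents, one entry per variable."""
    return tuple(rng.randint(0, max_degree) for _ in range(num_vars))


def no_shared_variable(monomials):
    """Hypothetical criterion: no two monomials in the set share a variable."""
    for i in range(len(monomials)):
        for j in range(i + 1, len(monomials)):
            if any(a > 0 and b > 0 for a, b in zip(monomials[i], monomials[j])):
                return False
    return True


def estimate_rate(trials, set_size, num_vars, max_degree=3, seed=0):
    """Estimate how often a random set of monomials satisfies the criterion."""
    rng = random.Random(seed)  # seeded so a run can be repeated exactly
    hits = 0
    for _ in range(trials):
        monomials = [random_monomial(num_vars, max_degree, rng)
                     for _ in range(set_size)]
        if no_shared_variable(monomials):
            hits += 1  # keep only the count, never the monomials themselves
    return hits / trials


if __name__ == "__main__":
    for trials in (100, 1000, 10000):
        print(trials, estimate_rate(trials, set_size=3, num_vars=4))
```

Running it at several trial counts, as in the last lines, also shows whether the estimated rate has settled long before a million sets.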

I have a 933MHz dual-CPU Pentium IV processor that ought to be able to handle taxing calculations without the user interface dragging (one CPU spends its time on the math; the other is available for user input). The latest versions of Maple must recognize the dual CPUs, however, and try to use both of them, because after a few hours the machine began to drag.

When I say drag, I mean drag: it took several seconds to log in at a text terminal. The X terminal was simply unusable. I checked the memory usage and the processor usage: top claimed that about a third of my 1GB was still free, and that process mserver (Maple) was only using about 40% of the processor. That would be 40% of one processor. So why is my computer dragging so badly?!?

I killed Maple, satisfying myself with the temporary results, and tried to loop over 100 sets of monomials instead of 1000000. A few minutes later, the program was done. A few minutes later?!? I was sure there must have been some error, but I checked the output, and no: there were results, and the results looked quite reasonable.

Maybe a million are too many. I have no idea why, but I think it might have to do with Maple's garbage collection kicking in whenever it thinks it's used too much memory, which might take quite a while when generating 1,000,000 monomials.

In any case, now I'm trying 1,000. - Er, I was. I just checked, and it's done. That took less than 20 minutes. I think it took less than 10 minutes, actually, because I started it just before I started writing this blog at 11.50am, and the last write was at 11.54am.

Weird. On to 10,000, I guess.

(An hour and a half later: 10,000 is taking longer.)


Anonymous said...

Dear Jack,

If it's any consolation, I noticed the same sort of thing when I was doing iterations of multiple logistic difference equations to model the first 550 million years of the Phanerozoic. (I didn't start that high, but even a few hundred thousand seemed to drag processors to a halt.) So when I lowered either the iterations or the repetitions (of the iterative cycle), I found things picked up. I ended up having to do "samples" at intervals using approximations of diversity. Then I just decided to find myself a free machine and let it run six and a half weeks to generate and analyze the number sequence. Imagine my horror when I realized that I had missed the graphing function and had to go back and recalculate the Lyapunov exponents for the multiple series.

In other words, I sympathize.



jack perry said...

What were you modeling from that era? You wrote something about biological diversity, but the details weren't clear to me.

What system were you using? I mean the programming system (FORTRAN / C / C++ / ObjC / Pascal / Modula / Delphi / Eiffel / LISP / Maple / Matlab / etc.), not the particular HW/OS combination.

Steven said...

Dear Jack,

I was modeling the Phanerozoic diversity pattern first noted by Raup and Sepkoski to suggest a 55 million year cycle of meteorite impacts. I was looking at their diversity data and the peaks they screened out as "white noise" and seeing if a chaotic model of biological diversity could propose an intrinsic rather than an extrinsic cause for extinctions. (It can, and in my opinion does--basically after a mass extinction there is a refractory period of rediversification which is very short--much more in line with intrinsic causes rather than extrinsic. But the debate rages.)

My recollection (as this was some time ago) was that I wrote the program in faulty Pascal and had it translated (by one who knows better) to C++. At the time I was running it on a mainframe, but later, with peppier computers and coprocessors, I reran the study on a PC and observed some of the anomalies you note.

Your original comment isn't here for me to look at so I hope I've answered your main questions.



Anonymous said...

Dear Jack,

I responded earlier, but Blogger either didn't accept it or vanquished it.

Basically, I was running different versions of ecological population variation equations to attain a rough picture of population variations as representative of taxon levels through Phanerozoic history. The idea was to model Phanerozoic diversity patterns to see if extinction patterns could have an internal (non-linear dynamical) cause or were necessarily driven by external (periodic impact) causes. This was basically to question the then-current notion of a "periodicity (anywhere from 35-55 million years) to extinctions." In the original analysis of the taxon data, true peaks were, it seemed to me, filtered out as noise when they did not coincide with where the predicted peaks should be.

The nonlinear model worked better because of the shortened refractory period before recovery (hence the noted post-extinction "explosion" into niches). Also it refuted Gould's need for the "paradox of the third tier." I was running the equations to check fit against observed diversity patterns and to see whether or not the patterns could be Brownian multi-fractals (as indeed, it turns out, they fit the pattern well).

Anyway, this involved running iterative equations and calculating the Lyapunov exponents to determine when the orbits escaped the attractors.
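(For anyone following along, here is a minimal sketch of that kind of calculation for a single logistic map, x -> r*x*(1 - x). It is an illustration in Python, not the Pascal/C++ program described in this comment, and it does not cover the multiple coupled equations; the function name, starting value, and iteration counts are all made up for the example. The exponent is estimated as the average of log|r*(1 - 2x)| along the orbit: negative means the orbit settles down, positive means chaos.)

```python
import math


def logistic_lyapunov(r, x0=0.2, burn_in=1000, steps=100_000):
    """Estimate the Lyapunov exponent of the logistic map x -> r*x*(1 - x)."""
    x = x0
    # Discard the transient so the average reflects long-run behaviour.
    for _ in range(burn_in):
        x = r * x * (1.0 - x)
    total = 0.0
    for _ in range(steps):
        x = r * x * (1.0 - x)
        slope = abs(r * (1.0 - 2.0 * x))     # |f'(x)| at the current point
        total += math.log(max(slope, 1e-300))  # guard against log(0)
    return total / steps


if __name__ == "__main__":
    for r in (2.5, 3.2, 3.9):
        print(r, logistic_lyapunov(r))
```

Note that the cost grows linearly with `steps`, so a slowdown that is worse than linear points at something other than the arithmetic itself.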

TMI, I know. But I was working in Pascal (the only programming language I knew/know). I had a friend translate it into C++ and ran it on a mainframe at first and then ran it on PC when coprocessors had gotten up to speed.

It was a while back, but I observed similar behavior.



Gordon said...

I assume those are actually Pentium IIIs, since I don't think Intel made a 933 MHz P4.

Alessandra said...

If you want, I'll let you use my Cray...


jack perry said...

Gordon: I think you're right that it's a P3. Since you mention it, I think I was discouraged from getting a P4 because of cache issues at the time, or something. Or maybe it was a Xeon they discouraged me from using. Either way, what I really wanted was a Dual G4. :-)

Alessandra: believe it or not, a Cray probably won't help. The problem appears to be Maple, not the machine. Every machine, no matter how good, will be brought to its knees by a badly-programmed algorithm.

As proof, I submit Microsoft Windows, and the first version of OSX. :-)