Mailing List Archive


Re: tlug: Re: multi-processor linux configuration ?



Darren Cook <darren@example.com> wrote,

[...]
> One interesting thing I read - the more machines you have, the more often
> you'll seem to have machine failures. So make sure the machines are easy to
> replace, and that your application can adapt to machines coming and going.

Yep, failures are a problem, which is why more traditional
supercomputers use checkpointing a lot.  But I don't think
it is usually necessary to write the application so that it
can adapt to PEs (processor elements) coming and going.  If
your application is not extremely long running, the chance
of a failure in the middle of a run is still rather low.
And in the occasional case of a failure, you can always
restart.  On the other hand, making an application
fail-safe can cause a lot of overhead, which slows down all
runs (not only the occasional one that actually fails).
Still, checkpointing is a good idea if possible.
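
To put rough numbers on that trade-off, here is a minimal
Python sketch.  It assumes machines fail independently at a
constant rate (an exponential model), that a failed run is
simply restarted from scratch at no extra cost, and that a
fail-safe application never loses work but pays a fixed
fractional slowdown.  The function names, the MTBF figure
and the 10% overhead are illustrative assumptions, not
measurements.

```python
import math


def prob_any_failure(n_machines, run_hours, mtbf_hours):
    """Chance that at least one of n machines fails during one run.

    Assumes independent failures at a constant rate of 1/mtbf_hours
    per machine (exponential model).
    """
    rate = n_machines / mtbf_hours  # cluster-wide failures per hour
    return 1.0 - math.exp(-rate * run_hours)


def expected_hours_with_restart(n_machines, run_hours, mtbf_hours):
    """Expected wall-clock time when every failure forces a full restart.

    Uses the standard result (e^(rate*T) - 1) / rate for a task of
    length T interrupted by Poisson failures, with zero restart cost.
    """
    rate = n_machines / mtbf_hours
    return math.expm1(rate * run_hours) / rate


def hours_with_overhead(run_hours, overhead_fraction):
    """Wall-clock time of a fail-safe run that is slower on every run.

    Idealised: assumes failures then cost no additional time at all.
    """
    return run_hours * (1.0 + overhead_fraction)


if __name__ == "__main__":
    # Hypothetical cluster: 64 PEs, 10,000 h MTBF each, 2 h job.
    n, run, mtbf = 64, 2.0, 10_000.0
    print("P(failure during run):", prob_any_failure(n, run, mtbf))
    print("plain run + restarts :", expected_hours_with_restart(n, run, mtbf))
    print("fail-safe, 10% slower:", hours_with_overhead(run, 0.10))
```

Under these assumptions the plain run with an occasional
restart comes out well ahead of the fail-safe version for a
short job; the balance shifts as the run length or the
number of PEs grows.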

Manuel

---------------------------------------------------------------
Next Nomikai: 20 November, 19:30 Tengu TokyoEkiMae 03-3275-3691
Next Meeting: 12 December, 12:30 Tokyo Station Yaesu central gate
---------------------------------------------------------------
Sponsor: PHT, makers of TurboLinux http://www.pht.co.jp

