Mailing List Archive


Re: tlug: Re: linux cluster



Darren Cook <darren@example.com> wrote,

> >[If] your main bottleneck is CPU <-> memory bandwidth. This
> >implies two things:
> >
> >* Buy the biggest cache you can get
> 
> The Pentium II seems limited to a 512 KB cache. The next Intel chip is
> supposed to have a larger cache (and be better designed for parallel
> processing), but it is going to be *much* more expensive.
> 
> >(and write your
> >  application cache-friendly)
> 
> What does this mean? If it's answered in the great link (
> http://www.mcs.anl.gov/dbpp/ ) you gave yesterday, ignore this question, as
> I'm working through that site. :-)

This is, I think, not covered in that book.  Simplifying the
problem vastly, there are three things you have to take care
of:

* Memory access should largely be consecutive, i.e., if you
  process element a[i] of an array, try to process a[i+1]
  (or a[i-1]) in the next step.

  This is because caches are maintained in chunks (called
  cache lines).  So, if you access a[i], the processor loads
  the whole cache line containing a[i].  Accesses to
  neighboring elements will, therefore, hit the cache (i.e.,
  access the same cache line), which is about an order of
  magnitude faster than a main memory access.

* If you have pseudo-random access to some data structure,
  adapt the boundaries of your inner loops such that the
  processed data fits at least into the 2nd-level (or,
  better, the 1st-level) cache.  (This may require
  introducing an extra loop level only to keep the cache
  happy.)

* If there are multiple traversals over the same arrays and
  the arrays don't fit into the cache, try to introduce an
  extra loop nest that processes the arrays in slices
  that fit into the cache (first do all traversals for the
  first slice, then for the second, and so forth).

There are other issues, such as whether your loop body fits
into the instruction cache, and the differing replacement
and write strategies of various cache architectures.

Another nice book is ``Parallel Computer Architecture: A
Hardware/Software Approach'':

  http://http.cs.berkeley.edu/~culler/book.alpha/index.html

Unfortunately, the contents are not online.  Chapter 3
(called ``Programming For Performance'') deals with the
memory hierarchy in general.  IMHO, the rest of the book is
also worth reading if you are thinking about doing parallel
computing.

> >* don't use two-processor boards.
> >
> >The two processor will only end up competing for bus access,
> >instead of doing your float calculations.
> 
> I also read somewhere that even if the data set is small enough to fit in
> the processor cache, on a dual processor machine they have to spend time
> comparing the two caches to make sure the processors are not altering the
> same memory. Single processors in a cluster won't have this problem.

Yes, cache consistency is another problem in multi-processor
machines.  Modern processors try to minimize the overhead
with sophisticated cache-coherence protocols, but I am not
sure how successful this is.

Cheers,

Manuel
---------------------------------------------------------------
Next Nomikai: 20 November, 19:30 Tengu TokyoEkiMae 03-3275-3691
Next Meeting: 12 December, 12:30 Tokyo Station Yaesu central gate
---------------------------------------------------------------
Sponsor: PHT, makers of TurboLinux http://www.pht.co.jp

