Mailing List Archive


Re: [tlug] OT:dual core CPUs

On Mon, 16 Apr 2007, Darren Cook wrote:

> The Athlon X2 3600 is 1.9Ghz, the Intel T7200 is 2Ghz, and the Intel
> E6300 is 1.86Ghz. This may be the dumb question of the day, but doesn't
> that make them slower than my Celeron 2.8Ghz (assuming I'm running a
> single-threaded application)?

No, for many reasons. One of the big ones is that CPUs do varying amounts of work in one clock cycle.

Beyond the simple matter of a dual-core CPU executing two instructions
simultaneously, even single-core CPUs may do simultaneous execution of
some instructions, such as an integer and floating point operation at
the same time (these typically using separate circuitry on the chip),
or having two integer units and thus being able to execute two integer
instructions simultaneously.
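
As a rough back-of-the-envelope sketch of why clock speed alone doesn't
settle the question: throughput is roughly clock rate times instructions
completed per cycle (IPC). The IPC figures below are made-up assumptions
for illustration, not measurements of any of the chips mentioned.

```python
# Illustrative only: the IPC (instructions per cycle) values are
# hypothetical, chosen just to show how the arithmetic works out.

def instructions_per_second(clock_ghz: float, ipc: float) -> float:
    """Sustained single-core throughput in instructions per second."""
    return clock_ghz * 1e9 * ipc

# A high-clock design that completes few instructions per cycle versus
# a lower-clock design that completes more.
high_clock = instructions_per_second(clock_ghz=2.8, ipc=0.8)
low_clock = instructions_per_second(clock_ghz=1.86, ipc=1.6)

print(f"2.8 GHz at IPC 0.8:  {high_clock / 1e9:.2f} billion instr/s")
print(f"1.86 GHz at IPC 1.6: {low_clock / 1e9:.2f} billion instr/s")
print(f"speed ratio: {low_clock / high_clock:.2f}")
```

Under these assumed numbers the 1.86 GHz part comes out ahead despite
the lower clock; real IPC varies a lot with the workload.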

Modern CPUs have pipelines that allow them to be reading in new
instructions while current ones are executing, to make the new ones
more quickly available when the time comes to execute them. Keeping
the pipeline filled is tricky, especially when it comes to branches;
CPUs do branch prediction to try to determine which instruction is most
likely to be executed next (when this depends on the results of previous
instructions) and have it available immediately when the previous
instruction finishes. CPUs may look at the past history of instruction
execution for a particular program in order to make these decisions.
Many CPUs these days do out-of-order execution, executing later
instructions before earlier ones when their inputs happen to be
available first, and speculative execution, using otherwise spare
capacity to execute instructions whose results are not yet confirmed to
be needed.
Pipeline management can get extremely complex and sophisticated. Smart
compilers can manipulate instruction order and types of branches to make
a huge difference in execution speed.
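
To make the branch-prediction idea concrete, here is a toy model of a
2-bit saturating counter, one of the simplest history-based schemes.
This is only a sketch; real CPUs use far more elaborate predictors, and
the function name and test patterns are my own invention.

```python
# Toy 2-bit saturating-counter branch predictor (illustrative only).

def count_mispredictions(outcomes):
    """Guess each branch outcome (True = taken) and count wrong guesses."""
    counter = 2          # 0-1 predict "not taken", 2-3 predict "taken"
    mispredictions = 0
    for taken in outcomes:
        predicted_taken = counter >= 2
        if predicted_taken != taken:
            mispredictions += 1
        # Nudge the counter toward the actual outcome, saturating at 0 and 3.
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return mispredictions

# A loop branch: taken 99 times, then falls through once.
loop_branch = [True] * 99 + [False]

# A branch that flips on every execution, which this scheme handles badly.
alternating = [i % 2 == 0 for i in range(100)]

print("loop branch mispredictions:", count_mispredictions(loop_branch))
print("alternating mispredictions:", count_mispredictions(alternating))
```

The loop branch is guessed wrong only at the loop exit, while the
alternating branch is guessed wrong half the time; each wrong guess on
real hardware means throwing away work already in the pipeline.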

Cache management is another issue; fetching an instruction or data from
internal registers is usually several times faster than fetching it from
L1 cache, and L1 cache is often an order of magnitude faster than L2
cache, which is in turn that much faster again than memory. Waiting to
get an instruction or data from RAM can waste tens or even hundreds of
clock cycles, with the machine effectively sitting idle for that time.
Applications that
access a lot of data have to be very careful with their memory access
patterns in order to take maximum advantage of the cache; bad cache
access patterns can easily make an order-of-magnitude difference in
execution time for some applications.
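
The effect of access patterns can be sketched with a toy direct-mapped
cache model. The cache geometry and array size below are assumptions
picked for illustration, not those of any particular CPU, and the model
counts misses only rather than measuring time.

```python
# Toy direct-mapped cache model. All sizes are illustrative assumptions.
LINE_BYTES = 64      # bytes per cache line
NUM_LINES = 512      # 512 lines * 64 bytes = 32 KiB of cache
ELEM_BYTES = 8       # size of one array element

def miss_count(addresses):
    """Count cache misses for a sequence of byte addresses."""
    slots = [None] * NUM_LINES      # which memory block each slot holds
    misses = 0
    for addr in addresses:
        block = addr // LINE_BYTES  # memory block containing this address
        slot = block % NUM_LINES    # the one slot that block may occupy
        if slots[slot] != block:
            misses += 1
            slots[slot] = block
    return misses

ROWS, COLS = 256, 256    # a 256x256 array stored row by row (512 KiB)

def address(r, c):
    """Byte address of element (r, c) in the row-by-row layout."""
    return (r * COLS + c) * ELEM_BYTES

# Same elements, visited in two different orders.
row_major = [address(r, c) for r in range(ROWS) for c in range(COLS)]
col_major = [address(r, c) for c in range(COLS) for r in range(ROWS)]

print("row-by-row misses:      ", miss_count(row_major))
print("column-by-column misses:", miss_count(col_major))
```

In this model the row-by-row walk misses once per cache line (one access
in eight), while the column-by-column walk misses on every access,
because rows 16 apart keep evicting each other from the same slot.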

These factors lead to situations such as my 1.7 GHz single-core Pentium
M 735 being faster, in general, than a similar generation 3 GHz
dual-core Pentium 4.

Curt Sampson       <>        +81 90 7737 2974
