Dual Core technology
After the massive acceleration of Intel's dual-core processor plans, dual core has arrived. And it's set to make a huge impact. The processor previously codenamed Smithfield has become two parts: the Pentium D and the higher-end Pentium Extreme Edition (EE). One of the first dual-core PCs in the country was hand-delivered to us by Intel.
Inside the machine was a Pentium EE processor, with two independent execution cores each running at 3.2GHz. Note that the name is 'Pentium Extreme' and not 'Pentium 4 Extreme'. Intel has dropped the Pentium 4 name for its dual-core CPUs, although it seems likely it will be retained for its future single-core products. (see The roadmap, below).
What's the difference?
Whereas the original EE processors were essentially repackaged Xeons, the new EE is a direct variant of the Pentium D. Both have 1MB of Level 2 cache per core, an 800MHz front side bus, plus EM64T and Execute Disable Bit support. The only significant difference is that the Extreme has Hyper-Threading on both its cores, which the D lacks. The EE also has an unlocked clock multiplier to allow for overclocking.
The new parts come in the same LGA775 package as current Pentium 4s, but dreams of being able to pop one into an existing motherboard are misplaced. Dual-core designs won't work in current 915 and 925X Express chipset motherboards; the system will immediately shut itself down. You'll need a new motherboard with one of the two new chipsets: 945 and 955X. Our test system was supplied with a 955X-equipped motherboard in the form of an Intel D955XBK. Technical details were still sketchy at the time of going to press, but it seems clear that they're relatively minor tweaks of the existing 915 and 925 chipsets.
So is this just two Pentium 4s in one package?
For the time being, yes. Intel itself makes little secret of the fact that the basis of the first dual-core processors is the standard Prescott core, with some extra arbitration logic to allow for such things as power management. With two cores to cool via one heatsink, they can't run too fast.
Despite Intel's enthusiasm, Hyper-Threading - where a single-core processor allows two threads to share execution resources simultaneously - has been only a qualified success. The problem for developers has been the relatively opaque low-level mechanisms by which the CPU handles threaded code. It's difficult to determine, ahead of time, if the performance boost will be worth the effort or if resource contention will hamstring the process. Dual core changes the landscape. Developers can look at their code at the algorithmic level, determine which operations can usefully be executed in parallel and get a very good idea of the benefits before they start coding. Since dual-core processors are basically two completely separate CPUs in one physical package, you can be practically guaranteed that two independent operations taking ten seconds each will take ten seconds on a dual-core design but 20 seconds on a single core. True dual core is a performance no-brainer, so long as the application in question is algorithmically suited to multithreading.
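That "ten seconds versus 20 seconds" arithmetic can be sketched with a toy example. Here, in Python, short sleeps stand in for the two independent operations (this is an illustration of the principle, not of Intel's hardware; `time.sleep` releases Python's interpreter lock, so the two threads genuinely overlap in time, whereas real CPU-bound Python work would need separate processes):

```python
import threading
import time

def task(seconds):
    # Stand-in for one independent unit of work. sleep() releases
    # CPython's GIL, so the two threads really do overlap in time.
    time.sleep(seconds)

# Run two independent tasks in parallel, one per thread.
start = time.perf_counter()
threads = [threading.Thread(target=task, args=(0.2,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
parallel = time.perf_counter() - start

# Run the same two tasks one after the other.
start = time.perf_counter()
task(0.2)
task(0.2)
serial = time.perf_counter() - start

print(f"parallel: {parallel:.2f}s, serial: {serial:.2f}s")
# The parallel run takes roughly as long as ONE task;
# the serial run takes roughly the sum of both.
```

With two cores available, the parallel wall-clock time is governed by the longest single task, while the serial time is the sum of all of them, which is exactly the guarantee the article describes.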
Don't forget your thermals
Contrary to various rumours circulating before their release, the new dual-core parts aren't especially power efficient: they consume less power per core than current top-end Pentium 4s, but only by dint of their relatively low clock speeds. The TDP (thermal design power) of both the Pentium D and Pentium EE is 130W - higher than any previous desktop part. That said, the new parts do feature EIST (Enhanced Intel SpeedStep Technology), reducing frequency and voltage when possible under operating system control. This means that for desktop applications, the power consumption should on average be much lower than TDP. The heatsink supplied with our EE processor was the same design supplied with current single-core parts and encouragingly didn't get noticeably hot in operation.
How does dual core relate to multithreading?
Up until now, mainstream CPUs have been serial-processing devices. Increases in performance have come through upping the speed at which the processor completes a single operation; it carries out these operations one at a time.
Dual core marks a sea change in that approach; as of this very month, mainstream computing has gone parallel. Rather than completing a given operation faster to enable it to get through more operations one after the other, parallel processors do two things at once.
In fact, at the very low level of CPU architecture, parallelism has existed for years, but this involves breaking code down piecemeal at runtime for faster execution while it's going through the processor. This approach can only go so far - the CPU isn't intelligent and can't make parallel optimisations that look ahead further than the next few blocks of instructions.
With multithreaded applications executed on dual-core processors, everything is different. Instead of the software handing the CPU one task at a time and hoping it can make do, the onus is on the human programmer. The CPU expects to be handed two clearly defined tasks at a time; if it's not, then half of the processing power on offer is wasted as one of the two cores sits idle.
This makes dual- and multi-core processing a double-edged sword. It can deliver huge performance gains, but only if two conditions are satisfied: first, that the problem in hand is suitable for splitting in two at the conceptual level and, second, that the programmer has sufficient skill to handle that task, writing software to take advantage of the two cores on offer. Writing multithreaded applications is a far more difficult task than it would first appear, primarily because humans naturally think in a consecutive way, doing one thing after another. This major hurdle is the reason Intel is spending millions of dollars to re-educate developers, placing literally thousands of its own engineers in the development workshops of major software houses such as Oracle and SAP to shepherd their developers and ease the intellectual burden of developing multithreaded applications.
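The first of those two conditions, splitting a problem in two at the conceptual level, can be shown with a deliberately simple sketch: summing a list by handing each half to its own thread. (The function name and structure here are our own illustration; note too that CPython's GIL means threads won't actually speed up pure-Python arithmetic, so the point is the shape of the decomposition, not a real benchmark.)

```python
import threading

def parallel_sum(data):
    # Conceptual split: the two halves can be summed completely
    # independently, then combined -- the kind of division of labour
    # a dual-core CPU rewards.
    mid = len(data) // 2
    results = [0, 0]

    def worker(idx, chunk):
        results[idx] = sum(chunk)  # each half depends on nothing else

    threads = [
        threading.Thread(target=worker, args=(0, data[:mid])),
        threading.Thread(target=worker, args=(1, data[mid:])),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results[0] + results[1]

print(parallel_sum(list(range(1, 101))))  # → 5050
```

The conceptual split is trivial here; the article's point is that for most real applications, finding such a clean division is the hard part.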
The perfect demonstration of both the benefits and disadvantages of multithreading and dual core comes from a test render using discreet's 3ds max 7. Remember that rendering is purely a CPU and memory-bandwidth test, on which 3D graphics performance has no bearing. Now look at the results compared to Intel's fastest single-core processor, the 3.73GHz Pentium 4 EE, directly swapped into the test setup. You can see that with 3ds max's multithreaded rendering enabled, the dual-core processor obliterates the single core with performance nearly twice as fast. But turn off the multithreading and the render is performed on just one core of the dual-core part. Its time is more than doubled and the single-core 3.73GHz CPU beats it. If the developers of 3ds max hadn't bothered to make their application properly multithreaded, even a four- or eight-core processor would be beaten by the faster-clocked single core.
Note that the single-core part gets some benefit from Hyper-Threading when 3ds max's multithreading is enabled, but nowhere near the boost that a true second core gives you.
The reason Intel will continue to produce new high clock speed, single-core parts is shown by the results from our application benchmarks (see left). These are based on real-world applications, which haven't been designed from the ground up with parallel execution in mind or perform tasks that simply don't lend themselves to threading. As a result, the scores overall are well below those of the higher-clocked single-core EE part.
So is dual core worth it?
Absolutely, unequivocally, yes. We don't want to get too evangelical about it, but the advent of true multiprocessing on the desktop - and Intel's explicit acknowledgement that it has processors in the works with more than two cores (its engineers have mentioned 16- and 32-way multi-core at press conferences) - is a massive leap forward for computing. The reason it hasn't happened before is down to understandable reticence in the face of the budget sheets: the massive cost and problems of changing the way developers work. But Intel has seen the writing on the wall and realised it must bite the bullet and bear the cost in order that computing power can increase in the long term. Once the big technical challenges to software engineering have been overcome, the door opens to almost unlimited increases in computing performance.
The Intel processor roadmap is almost entirely, but not completely, devoted to dual-core and multi-core processors. Variants of the same cores will be used in new processors, with single-, dual- and possibly multi-core versions. Thus, the EE and first Pentium D processor cores are basically 5xx-series Pentium 4s, based on a 90nm fabrication process. The next generation of parts will be based on the core now known as Cedar Mill. This will be the first based on a 65nm fabrication process, with 2MB of Level 2 cache. A pair of these basic cores will form the next generation of desktop dual-core processors known as Presler, while a single-core variant of Cedar Mill will be released, probably under the Pentium 4 name and almost certainly clocked higher than the dual-core part.
The pitfalls of multithreading
Some computing tasks lend themselves extremely well to multithreading, and it's easy to speed them up with this approach. A good example is the common need to generate thumbnail images from a folder full of JPEG photos.
The application programmer can multithread this process so that, for each image in the folder, a separate independent thread of execution launches to load, decode and eventually display the image in the preview pane.
Up to a point, the more of these threads you have running in parallel, the quicker all the images will be displayed. Multithreading works well here, because the loading, decoding and display of each image aren't dependent on the loading, decoding and display of any other. But now think about what happens if our programmer wants to go to town with their multithreading. If they want to multithread the subprocess of reading the image file from disk and doing the fairly complex calculations needed to display that image onscreen, they run into problems. You can't simply start two threads, one to read an image from disk and one to decode it. The thread dedicated to decoding calculations will sit around and not be able to do anything until the thread dedicated to reading the image has made the data ready and can pass that data along. There's no benefit to parallelising the operation in this way, and in fact it may be slower than doing everything serially because of thread-tracking overheads.
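Both halves of the thumbnail example can be simulated in a few lines of Python. Sleeps stand in for reading and decoding (no real JPEGs are involved, and the stage names and timings are our own illustration). The per-image threads overlap happily; the split read/decode threads for a single image do not, because the decode thread just sits waiting:

```python
import threading
import time

READ, DECODE = 0.05, 0.05  # simulated per-image stage times

def process_image(_):
    time.sleep(READ)    # read from disk (simulated)
    time.sleep(DECODE)  # decode for display (simulated)

# Good case: one thread per image -- the images are independent,
# so all four overlap and finish in roughly one image's time.
start = time.perf_counter()
threads = [threading.Thread(target=process_image, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
per_image = time.perf_counter() - start

# Bad case: split ONE image's read and decode across two threads.
# The decode thread can do nothing until the read thread signals
# that the data is ready, so no time is saved.
start = time.perf_counter()
data_ready = threading.Event()

def read_stage():
    time.sleep(READ)
    data_ready.set()      # hand the data over

def decode_stage():
    data_ready.wait()     # sits idle until the read completes
    time.sleep(DECODE)

t1 = threading.Thread(target=read_stage)
t2 = threading.Thread(target=decode_stage)
t1.start(); t2.start()
t1.join(); t2.join()
pipelined = time.perf_counter() - start

print(f"per-image threads: {per_image:.2f}s, split stages: {pipelined:.2f}s")
# per_image is far less than 4 x (READ + DECODE);
# pipelined is still about READ + DECODE -- no gain at all.
```

The `Event` wait in the second half is the dependency the article describes: the decode thread consumes a core's worth of scheduling without contributing anything until the read finishes.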
This essential problem of inherent 'serialness' pervades many, many computing tasks and it's the reason that the benefits of multithreading aren't a simple plug-and-play answer to computing performance.
David Fearon PC Pro