Sat 22 Nov 2008

Edited by Paul Hales

Published by Incisive Media Investments Ltd.

Twenty-four-core Dunnington system benchmarked in Windows

Exclusive Benchmarketing Lady Sandra goes down to Dunnington town

INTEL'S Jones Farm facility in Oregon hosts the performance benchmarking labs for the cream of the crop of Intel system platforms. Some of them, like the brand new six-core, four-socket Dunnington Xeon 7400 series, were lurking in a humongous black box for us to play with during IDF.

As we already know, each six-core, near-2B-trannie Dunnington CPU contains 3 x 3MB of L2 cache (one set per two cores) plus 16MB of shared L3 cache that also handles all the talk between the core pairs, including the substantial cache coherency overhead. That explains some of the benchmark numbers further down the page.

The FSB is still only 1066 QDR, and each CPU has its own link to the Caneland chipset. By the way, IBM's Dunnington chipset variety is supposedly faster due to twice the memory bandwidth and lower latency by using 8 channels of DDR2-533 standard memory instead of four FB-DIMM channels.

Architecturally, it can be an interesting test exercise due to the high number of cores and oh so many cache levels on top of a huge 64 MB snoop filter in the chipset, not to mention the memory. Aimed at the enterprise suits rather than the HPC crowd, Dunnington cares more about the overall throughput than the latencies.

We ran some tests and, with the kind help of our Jones Farm friends, another round on the final stuff. How do Windoze PC benchmark results look on a humongous 24-core enterprise server behemoth? Put aside the notion that no form of OS with a name starting with 'Win-' should ever be run on such a serious system.

Well, Sandra 2009 looks good for that: plenty of graphically rich test runs with a choice of display options. So, here are some exclusive numbers for our esteemed readers to drool over:

CPU int & fp: 207394 MIPS, 289955 MFLOPS

Multimedia int & fp kpixels/s: 737141 int, 495045 float

Multi Core efficiency

Memory bandwidth: 3519 MB/s int, 3520 MB/s fp

Memory latency

Cache and memory overall

Interesting massive rise in L3 cache latency compared to the L2? 104 cycles vs, what, 16? One would say it's darn slow, as if it's, say, eDRAM cache rather than SRAM. We asked one of Intel's top server guys about this and the answer was that it's still the usual SRAM cache cell, but the cache coherency overhead for all these cores, handled by that L3 cache, imposes its unavoidable penalty.

Well, OK, still better than going through FSB or memory, as you can see from the further four-fold latency jump when doing that. And that memory bandwidth number looks a bit low compared to the Opterons, but then the humongous caches cover quite a bit of that data traffic here.
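To put that latency ladder in more familiar units, here is a rough sketch that converts the cycle counts into nanoseconds. The 16 and 104 cycle figures and the four-fold jump to memory come from the text above; the 2.66 GHz clock is purely an assumption for illustration, since the clock speed of the test box is not stated here.

```python
# Latency ladder from the article, in CPU cycles.
# L2 is the article's approximate figure ("what, 16?"); memory is taken
# as four times L3, following the "four-fold latency jump" remark.
L2_CYCLES = 16
L3_CYCLES = 104
MEMORY_CYCLES = 4 * L3_CYCLES

# ASSUMPTION: the clock speed is not given in the article.
ASSUMED_CLOCK_GHZ = 2.66


def cycles_to_ns(cycles, clock_ghz):
    """Convert a cycle count to nanoseconds at the given clock in GHz."""
    return cycles / clock_ghz


ladder = {"L2": L2_CYCLES, "L3": L3_CYCLES, "memory": MEMORY_CYCLES}

for level, cycles in ladder.items():
    ns = cycles_to_ns(cycles, ASSUMED_CLOCK_GHZ)
    print(f"{level:>6}: {cycles:4d} cycles, about {ns:.1f} ns")

print(f"L3 is {L3_CYCLES / L2_CYCLES:.1f}x slower than L2")
```

Swap in the real clock of whatever part you are looking at; the ratios between levels stay the same regardless.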

For some reason, the multi-core traffic as measured by Sandra 2009 is somewhat slower than on the old 65nm Clovertown Xeon with its four-core, dual-die shared FSB. The graph curve keeps a similar shape though, which is notable.

Aside from that, it was interesting to see the first Windows platform reaching near 200 BIPS and 300 GFLOPS of raw power in a single system. And that "multimedia" FP Mandelbrot test? Well, this is the first system to come close to top GPGPUs on this, if you recall our earlier Sandra GPGPU test.
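For a sense of what each core contributes, the Sandra CPU totals quoted earlier can simply be divided by the 24 cores. This is only back-of-the-envelope arithmetic and assumes the work is spread evenly across all cores, which a real run will not quite manage.

```python
# Aggregate Sandra 2009 CPU scores quoted in the article.
TOTAL_MIPS = 207394
TOTAL_MFLOPS = 289955

# System layout from the article: four sockets, six cores each.
SOCKETS = 4
CORES_PER_SOCKET = 6
TOTAL_CORES = SOCKETS * CORES_PER_SOCKET


def per_core(total, cores=TOTAL_CORES):
    """Naive per-core share, assuming perfectly even scaling."""
    return total / cores


mips_per_core = per_core(TOTAL_MIPS)
mflops_per_core = per_core(TOTAL_MFLOPS)

print(f"{TOTAL_CORES} cores")
print(f"integer: {mips_per_core:.0f} MIPS per core")
print(f"float:   {mflops_per_core:.0f} MFLOPS per core")
```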

What's next? We won't see such a multi-core system again on the Intel front until two things happen: the possibly six-core Westmere, the 32nm Nehalem shrink, arriving late next year in UP and DP systems and, at the same time, the appearance of the eight-core "Beckton" MP version of Nehalem for large servers. This one, with all eight cores aided by between 16 and 24MB of L3 cache on a single die, fed by four channels of memory per CPU, will be the true successor to Dunnington.

But then, these should perform better per-core, shouldn't they? µ

Comments

Bench it against the real stuff.

Hi,

Better start benching against this Tyan board + M4985 module:

http://www.tyan.com/product_board_detail.aspx?pid=271

That will allow you to put 8 Quad-core AMD's in the system.
Bundled with NUMA and see how the Intel does now.
It's silly to bench such against entry-level systems.
Shame on you!
posted by : Bas, 01 October 2008

Nebojsa is Happy Faced.

Only comment seemed skewed was last. isn't your description & graphics presentation showing far, up to 10X, better processor? For slower memory, it Kicks.

Dunnington is becoming Sales item right now. Creative writing gives smile for use of Trannies. All 1.9 Billion of them in Dunnington. Das lot o' Potential.

Next: Halloween Candies, Your Choice. Choice Uno: 7 on Thanksgiving'9.Retailers Prayer.Full Factory Warrentee. Intels Prayer: NT6 is Dunnington Wiz. My Prayer:Naj Hand Falls Off. I personally type with my Nose.My Very Long Nose.My Growing, Very Long, Nose. <:--)
drashek
posted by : Ultee', 01 October 2008

Crysis

What no Crysis benchmarks? What a waste! :)
posted by : Jim, 01 October 2008

What about some real-world benchmarks ?

Like, how many FPS in Crysis at 1600x1200 with full details ?

That's the only benchmark that matters.
posted by : Pascal Monett, 03 October 2008

*the sun*

Do the tests as compared to ultraSPARC IIIs and RISC-optimised OSs.
posted by : integrates like spider shoes, 05 October 2008