Quad-Core
posted by Sudip Chahal on December 07, 2006
Hello! My name is Sudip Chahal and I am the Compute and Storage architect in Intel IT and this is my first blog entry.
Recently Intel launched the industry’s first high-volume quad-core processor based on the new Core 2 micro-architecture, code named Clovertown. Our team has been busy evaluating the performance of this new product in a virtualized environment – we are probably one of the fortunate few to have gotten our hands on an “advance copy” of the top-of-the-line 2.67/1333 system. Frankly, the entire team is very impressed by the scaling we are seeing relative to the Woodcrest-based servers (in the 1.5-1.7x range on average – more about our methodology later) for a variety of typical business computing workloads (OLTP, DSS, reporting services and the like). Over the last several weeks, it has become blindingly obvious to us that these processors pack a tremendous amount of computing power into a very cost-effective “footprint”.
As mentioned above, the scaling of this “Blackford” platform with the Clovertown processors is superior for virtualizing general purpose Business computing workloads. Combine that with socket based licensing from VMware for their best-selling ESX server and you have the foundation to enable a whole new level of virtualization in the enterprise. While virtualizing test and development systems and other “low-risk” systems is all well and fine, the price-performance of the new Clovertown quad-core systems should enable IT to take virtualization to an entirely new level. Many applications that may not have been good candidates for virtualization in the past due to performance concerns or concerns about high costs now all of a sudden become prime candidates for virtualization and materially help move IT towards the end-goal of an automated, modular and virtualized compute utility. While previously it may not have been cost-effective to “dedicate” a host and expensive virtualization software licenses to host a modest number of more demanding applications, that is now in the past. The new very cost-effective quad-core processor based servers (pricing is comparable to dual-core servers) combined with attractive socket based licensing policies means that virtualization for the masses is now an affordable reality.
If you have not already done so, ask your server supplier to loan you one of these systems to take for a test drive and let us know what you think. We in Intel IT are confident that you will be pleasantly surprised and will very possibly be making the jump to quad-core servers before the year is out!
tagged: enterprise, quad core, server, virtualization, Xeon


Comments
Dec 08 | steven e. streight aka vaspers the grate said:
As a usability analyst, primarily for web apps, I’m curious how one might usability test a Quad Core processor…???
Dec 08 | Sudip Chahal said:
From a usability perspective, the primary difference you would see would be in better responsiveness under high-load “stress” conditions - one could certainly argue that that is not usability testing at all.
So, from an application functionality perspective, you will not see anything different. While these quad-core systems are awesome for what they do, they cannot improve the application usability other than the previously mentioned significantly improved responsiveness under high loads. Assuming that your web-based application is a runaway hit - these servers are exactly what you want in your corner!
Dec 08 | Chang Li said:
We just released Ladybug Player Vista 2 with quad channel video mixing and effects. Our Player has been targeted on the Digital Signage Applications. We have tested quad HD videos on Pentium D HyperThread processor; unfortunately the bottleneck is on the CPU side rather than on the GPU side. CPU usage soon reached 100%.
It would be interesting to know whether your Quad Core can achieve real-time quad channel video playback. Our software is well suited to testing multi-core processors. Here is the demo download:
http://www.neatware.com/player/
Hope to hear good news from you.
Dec 09 | steven e. streight aka vaspers the grate said:
Okay, help me understand. I am mainly a writer, with a strong technical orientation. I beta test various software, like custom search engines, blog platforms, RSS feed scrapers, audio editors, podcast studios, etc.
So, from my web usability and desktop app perspective, would a quad core processor enable me to like:
…simultaneously?
How does Quad Core Processing help a PC Windows user multi-task?
I’m less interested in enterprise operations, at this particular time, and more interested in home users on PCs.
So, could I usability test a Quad Core Processor from that angle, run user observation tests in a lab environment, for such activity?
Dec 10 | Michael J. Freeman said:
How does this compare to Sun Microsystem’s cool threads technology? They have 32 cores on one chip and Intel only has 4?
Dec 10 | steven e. streight aka vaspers the grate said:
Or how does the Quad Core Processor enable users to do a Norton AV scan whilst doing anything else without computer freeze-ups and slow responses?
Dec 11 | Sudip Chahal said:
re: >> Okay, help me understand. I am mainly a writer, with a strong technical orientation. I beta test various software, like custom search engines, blog platforms, RSS feed scrapers, audio editors, podcast studios, etc.
Sorry for the delay in responding - I am still learning this system.
What you have described is a power-user workstation scenario that should - in theory - benefit greatly from the multi-core processors (disclaimer: my own focus is server-centric). The scenario you describe of running many intensive desktop applications concurrently is an ideal one for PCs or workstations based on these quad-core processors. For your intended usage model I would suggest evaluating either a Clovertown (2-socket/8-core) workstation or a PC based on the quad-core desktop processor codenamed Kentsfield (1-socket/4-core system) from your favorite supplier. Having your system powered by 4 or 8 cores should let all of your concurrently executing applications run much more quickly and make the system feel noticeably more responsive and “productive” as compared with a system based on dual-core chips.
Go ahead and take one such system out for a test-drive and let us know what you found.
Dec 11 | Sudip Chahal said:
re: >> We just released Ladybug Player Vista 2 with quad channel video mixing and effects. Our Player has been targeted on the Digital Signage Applications. We have tested quad HD videos on Pentium
My experience is primarily with server-based applications so I don’t have a good feel for your scenario. All I can say is that a dual-Clovertown based workstation likely has 4-6x the multimedia compute power of your Pentium-D based system, so I would expect a Clovertown based workstation to have a decent shot at real-time quad channel video playback. Such a (not inexpensive) system might even be overkill - these are just guesses, as you can tell.
May I suggest that you try out your quad-channel video playback application on a good 2 socket/8-core Clovertown based workstation? If that turns out to be overkill, you may want to test drive your application on a quad-core Kentsfield based PC instead as that may very well suffice for your targeted use case. Please let us know what you find.
Thanks,
Dec 11 | Sudip Chahal said:
re: >> How does this compare to Sun Microsystem’s cool threads technology? They have 32 cores on one chip and Intel only has 4?
Good question. While there are a few key differences, first a clarification regarding the Sun processors - they have 8 cores per chip with each core capable of running 4 threads (a bit like Intel’s hyper-threading) for a grand total of 32 threads per chip i.e., 8 cores/32 threads per chip.
The main differences are as follows:
a. The Sun cores are much simpler in design and have much lower (like 50% or so) single-threaded performance than the Intel cores. This can be very crucial in situations where software is licensed per thread (very common in EDA applications). In such situations, having lots of slow threads can drive up software license costs substantially as compared with fewer much faster cores.
b. Additionally, if the application is not very multi-threaded e.g., some batch jobs or optimizer solvers, then the Sun solution is simply not competitive as its cores are very simple and much lower performance on an individual core basis as compared with the Intel Clovertown Core 2 micro-architecture based cores.
c. While the Sun solution can deliver good performance for “throughput” applications (heavily multi-threaded applications) - the point remains that the applications have to be SPARC applications and not x86 applications which is where the vast majority of the software market is.
d. Finally and most importantly, the Clovertown servers deliver much higher performance overall - thanks to the 8 very high-performance Core 2 micro-architecture based cores.
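The arithmetic behind the thread count and point (a) can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the 0.5 relative per-thread speed is taken from the rough "50% or so" figure above and treated as a per-thread number purely for the sake of the example, and the throughput target of 8 units is made up.

```python
import math

def hardware_threads(cores_per_chip, threads_per_core):
    """Total hardware threads on one chip."""
    return cores_per_chip * threads_per_core

def licenses_needed(target_throughput, per_thread_speed):
    """Per-thread licenses needed to reach a throughput target.

    Assumes (hypothetically) that throughput scales linearly with the
    number of licensed threads, which real workloads rarely do.
    """
    return math.ceil(target_throughput / per_thread_speed)

# 8 cores x 4 threads per core, as described for the Sun chip.
sun_threads = hardware_threads(8, 4)

# Made-up target: the work of 8 "fast" threads (relative speed 1.0).
fast_licenses = licenses_needed(8.0, 1.0)
slow_licenses = licenses_needed(8.0, 0.5)  # threads at half the speed

print(sun_threads, fast_licenses, slow_licenses)
```

Under these assumptions, halving per-thread speed doubles the number of per-thread licenses needed for the same amount of work, which is the cost effect described in (a).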
Dec 11 | Sudip Chahal said:
re: >>Or how does the Quad Core Processor enable users to do a Norton AV scan whilst doing anything else without computer freeze-ups and slow responses?
The system continues to be responsive as it can assign a core or two to do the A/V scan while the other cores are available to interact with the user or to execute other applications. Of course, this assumes a balanced platform design such that there is adequate memory and I/O bandwidth to support multiple applications executing concurrently.
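As a rough illustration of the idea, here is a minimal Python sketch in which a CPU-heavy "scan" runs in its own process so that the operating system is free to schedule it on a different core from the foreground work. Everything here is a stand-in: `scan_files`, `scan_worker` and `run_scan_in_background` are hypothetical names, hashing byte strings merely simulates scanning, and the code does not pin anything to a particular core (the scheduler decides).

```python
import hashlib
from multiprocessing import Process, Queue

def scan_files(blobs):
    """Stand-in for an A/V scan: hash every blob (CPU-bound work)."""
    return [hashlib.sha256(blob).hexdigest() for blob in blobs]

def scan_worker(blobs, results):
    """Entry point of the background process: scan, then report back."""
    results.put(scan_files(blobs))

def run_scan_in_background(blobs):
    """Start the scan in a separate process and return immediately."""
    results = Queue()
    worker = Process(target=scan_worker, args=(blobs, results))
    worker.start()
    return worker, results

def demo():
    """Foreground work continues while the scan runs elsewhere."""
    worker, results = run_scan_in_background([b"file one", b"file two"])
    foreground = sum(range(1000))  # placeholder for interactive work
    digests = results.get()        # fetch the result before joining
    worker.join()
    return foreground, digests
```

To try it, call `demo()` from under an `if __name__ == "__main__":` guard. On a single-core machine the two pieces of work would compete for the same core; with several cores they can proceed at the same time, subject to the memory and I/O caveat above.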
Dec 12 | Piotr said:
I Think this Quad-core processors are superb, but they are too expensive, and for a normal user the Celeron processors are better and cheaper. I have a Celeron 2800+ now and I don’t know of any way to use 100% of my processor. I like to play many games, but those programs and games use a lot of graphics and little of the processor, so I think quad-core is only for very rich people or big companies, not for normal users. That’s my opinion…!
Dec 13 | steven e. streight aka vaspers the grate said:
Sudip: thank you for taking the time to understand my situation, my orientation, and my workstation needs.
I am so sick of my computer freezing up or slowing down, and I simply have to multi-task, to get things done quickly. I have a geek neck, like carpal tunnel, cracking and achey, so I must work faster, and reduce the time I spend on the computer.
Your explanation is very good, and I will do as you say at my first opportunity, and will report back to you.
Dec 13 | Sudip Chahal said:
re: >>I Think this Quad-core processors are superb, but they are too expensive
I agree with you that systems based on quad-core processors are for high-end power users like Steven or perhaps gamers. They are too expensive for performing routine tasks.
Thanks,
Dec 14 | Derek said:
The bottom line, like Sudip stated, is that Quad core is the best that intel has to offer. In the vast majority of situations one will hardly utilize dual core to its full potential. Even in the case of many games, the applications are simply not written to take FULL advantage of the hardware. Even in intensive programs like AV, it’s often your hard drives that are more of a bottleneck than anything. As one of the last (if not the last) mechanical devices in your system, the hard drive is definitely the slowest.
These processors are meant for a niche market: Servers, Media Professionals (Adobe Premiere, 3d Studio Max, Etc.), and Enthusiast Gamers. Each of them has software specifically designed to take advantage of multiple cores (the latter of the three being the least developed in this area), and even so often doesn’t take full advantage of the hardware.
The bottom line, to me, is that this sort of explosive power is not for the casual user in the least. Blogging, surfing the web, even intensive photo editing won’t use this to the fullest. Instead, opt for something that offers the beauty of dual core at a less stratospheric price such as the E6400 (Core 2 Duo) or (if you are really on a budget) an AMD X2 processor.
Dec 14 | Cranford said:
In replying to Sudip Chahal you appear to have left out the very important part “e” in your reply.
e. The 8 core Sun system shares a single low performance floating point unit among all 8 cores. Heavily multi-threaded applications requiring significant floating point operations would show a *HUGE* advantage for Core 2 since 2 Clovertown sockets provide you with 8 high performance floating point units compared to the single low performance Sun Coolthreads floating point unit.
Dec 15 | Sudip Chahal said:
re: >>The bottom line, like Sudip stated, is that Quad core is the best that intel has to offer. In the vast majority of situations one will hardly utilize dual core to its full potential.
The only comment I have is that some of the markets you mention - servers, workstations - are not niche markets. They are actually very large markets in their own right, though perhaps not when compared with the desktop market on a unit volume basis. Given that Intel’s quad-core pricing is comparable to dual-core, I would expect uptake in these processing-intensive segments to be very, very rapid. I am actually having a very difficult time coming up with reasons why - in these and similar segments - one would not transition over to quad-core for new deployments as quickly as possible. I agree that the typical desktop user with less demanding workloads will likely do very well with a Core 2 Duo processor based (dual-core) system.
Dec 15 | Sudip Chahal said:
re: >>In replying to Sudip Chahal you appear to have left out the very important part “e” in your reply.
Here is a firsthand example of the power of the web - knowledge and expertise from potentially anywhere in the world. This post reminds me that I should have made it clear up-front that my knowledge of the SPARC solution is very limited.
Mr Cranford makes an excellent observation when he points out that the SPARC solution features only a single shared floating-point unit while on the Clovertown system there are 8 such high performance floating point units, so for floating point intensive applications this is essentially a no-contest. Thanks,
Dec 19 | Alex B said:
I am curious if cache coherence can be an issue for accessing shared memory by 4 CPUs at the same time (using MESI protocol).
Dec 21 | Sudip Chahal said:
re: >> am curious if cache coherence can be an issue for accessing shared memory by 4 CPUs at the same time (using MESI protocol).
Cache coherency is one of the challenges that must be overcome by the processor and chipset/platform architects. There is no viable platform without cache coherency. The cache coherency design problem is among the many that were addressed by the Intel Clovertown processor and Blackford chipset architects when they designed the platform as a whole (platform codename is Bensley - for what it’s worth).
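For readers unfamiliar with the MESI protocol mentioned in the question, here is a toy Python model of the textbook state transitions for a single cache line shared by several cores. It is a sketch of the generic protocol only and makes no claim about how Clovertown or the Blackford chipset actually implement coherency; the class and method names are hypothetical.

```python
from enum import Enum

class State(Enum):
    MODIFIED = "M"   # only copy, differs from memory
    EXCLUSIVE = "E"  # only copy, matches memory
    SHARED = "S"     # one of several clean copies
    INVALID = "I"    # no usable copy

class CacheLine:
    """Tracks the MESI state of one memory line in each core's cache."""

    def __init__(self, num_cores):
        self.states = [State.INVALID] * num_cores
        self.writebacks = 0  # times a modified copy was flushed

    def read(self, core):
        if self.states[core] is not State.INVALID:
            return  # read hit: state is unchanged
        holders = [c for c, s in enumerate(self.states)
                   if s is not State.INVALID]
        for c in holders:
            if self.states[c] is State.MODIFIED:
                self.writebacks += 1  # dirty data is written back first
            self.states[c] = State.SHARED
        # Alone -> EXCLUSIVE; otherwise everyone shares the line.
        self.states[core] = State.SHARED if holders else State.EXCLUSIVE

    def write(self, core):
        for c, s in enumerate(self.states):
            if c != core and s is not State.INVALID:
                if s is State.MODIFIED:
                    self.writebacks += 1
                self.states[c] = State.INVALID  # invalidate other copies
        self.states[core] = State.MODIFIED

    def snapshot(self):
        """Per-core states as a compact string, e.g. 'SSII'."""
        return "".join(s.value for s in self.states)

line = CacheLine(4)
line.read(0)    # core 0 is the only reader
line.read(1)    # core 1 joins, both now share the line
line.write(2)   # core 2 writes, other copies are invalidated
print(line.snapshot())
```

The cost that the question hints at shows up in `write`: every write to a shared line invalidates the other cores' copies, so four cores repeatedly writing the same line spend much of their time passing it back and forth.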
Dec 30 | David said:
Seems the deep and highly specialized science of processor technology may be overlooking some basic shifts which could be made in the overall interests of the company and its products. Why can’t the circuitry of a motherboard be included on the processor chip? I am tired of matching various brands of MBs to processors; often the real problems come from incompatibility of BIOS to CPU, etc. Both AMD and Intel have branched into onboard graphics, and Intel even has their own brand of mainboards, too. With shrinking form factors and the physics fact that proximity of components allows for faster overall speed, the time has come to create a singular product that leaves room for memory and IDE connects, but not the gigantic fiber plate we are accustomed to. Instead of separate northbridge, BIOS, graphics, etc., have them all in one unit.
Dec 30 | Sudip Chahal said:
re: >> Seems the deep and highly specialized science of processor technology may be overlooking some basic shifts which could be made in the overall interests of the
What you are describing is in fact what has been happening over the years. As transistor budgets have grown (and continue to grow - thanks to Moore’s law) various functions that previously used to require separate chips are now integrated into the processor. For example, consider caches, cache controllers, floating-point units. All these used to be discrete chips in very old PCs but over time have gotten integrated into the processor. Some processors on the market include an integrated memory controller as well. While the timing and specifics of a decision to integrate a particular functionality (like an integrated memory controller or graphics) into the processor may vary depending on the target segment, cost, time-to-market, performance and other tradeoffs - the trend you mention is very much there and over the next 3-6 years what you describe is quite likely to happen - at least in some market segments.
To get a visual feel for this trend, just compare the motherboard of a recent PC with one from 10-15 years ago - the new ones have far fewer chips and correspondingly much simpler circuitry.
Thanks,
Jan 03 | Jim G said:
Years ago, when the demand for speed outran PC speed, multi-processor systems of linked PCs entered the picture. I think we have hit the PC speed (CPU speed) wall once again. What is the driving problem? 2-D design, heat, 8cm silicon wafers not being enough, semiconductor interface speed,…?
Jan 06 | Rob said:
This may be a dumb question but why is each core operating at the same frequency? It would seem to me that they should be staggered such that at each half-cycle one of the cores would be completing a cycle. I would also expect this to be more stable not less as we are talking two separate frequencies not just one for all cores.
Jan 20 | Jon said:
In reply to David’s comments on integrating all parts into a single chip…
In short, it’s not economically possible. As ICs get smaller and smaller, there are problems interfacing such small ICs to comparatively large human interfaces (such as keyboards etc.). Not only that, but say they have an IC with everything built in: what does a company do when the southbridge doesn’t work on the IC but the rest does? This is where the cost starts increasing if you don’t produce perfect ICs every time (which isn’t possible either). I am taking a university class on the topic and there is a lot I am sure most tech-savvy people don’t even know.
In short, a motherboard is needed to interface such a small device to the human world. There are some engineering constraints too that make it have to be so large. Capacitors, voltage regulators etc. Then still you have to realize there is a standard known as ATX. Please do some research before arguing something that isn’t going to happen.
Feb 05 | j2xs said:
With the advent of quad-core, we really need to provide tools and frameworks to the developers in large IT organizations, so that the applications they build can utilize 90% of the 4 cores to complete an operation — rather than 99% of one core. In the SOA and web application realm we have frameworks like .NET and J2EE, but these are tailored to transaction processing and not bulk data processing.
A new framework for data-intensive application development, Pervasive DataRush, hopes to bring the same level of efficiency and programmer productivity to the masses.
I hope Intel’s developer community will take advantage of this framework to maximize the return on their Xeon/Itanium investments!
http://www.pervasivedatarush.com
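To make the "4 cores rather than one" point concrete, here is a small Python sketch that splits a bulk computation into chunks and hands them to a pool of workers using the standard library's `concurrent.futures`. It is not the DataRush API or any other framework mentioned above, just a generic data-parallel pattern; `split_range`, `sum_of_squares` and `parallel_sum_of_squares` are hypothetical names, and the sum of squares stands in for real bulk data processing.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def split_range(n, parts):
    """Split range(n) into `parts` contiguous (start, stop) chunks."""
    step, extra = divmod(n, parts)
    bounds, start = [], 0
    for i in range(parts):
        # The first `extra` chunks take one additional element each.
        stop = start + step + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds

def sum_of_squares(bounds):
    """Work done by one worker on its own chunk."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))

def parallel_sum_of_squares(n, workers=4, executor_cls=ProcessPoolExecutor):
    """Fan the chunks out to `workers` workers and combine the results."""
    with executor_cls(max_workers=workers) as pool:
        return sum(pool.map(sum_of_squares, split_range(n, workers)))
```

Calling `parallel_sum_of_squares(10_000_000)` from under an `if __name__ == "__main__":` guard runs one chunk per worker process, so on a quad-core machine all four cores can be busy. Passing `executor_cls=ThreadPoolExecutor` gives the same answer but, for CPU-bound pure-Python work, usually without the speedup.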
Sep 03 | steve said:
Hi, my name is Steve Kim and I am a Korean server/storage engineer. As far as I know, Intel quad-core CPUs come in two types, the L and E series. In my testing the L series quad-core CPUs have had no problems at all, but the E series CPUs show a critical problem: they make the system hang with IERR CPU errors. If you have any idea about this, please let me know.
Oct 05 | Custom said:
Hi. It would be great to see test results comparing quad-core Intel processors with quad-core AMD processors. If there are tests you trust, please let us know where we can find them. Thanks.
Nov 24 | Duckfoot said:
Jon Said: “There are some engineering constraints too that make it have to be so large. Capacitors, voltage regulators etc. Then still you have to realize there is a standard known as ATX. Please do some research before arguing something that isn’t going to happen.”
First off, consider that when the ATX standard was too big they created, guess what, the MicroATX standard. When the ATX standard changed it became the ATX 2.2 standard. Micro ATX = maximum size of 244 mm × 244 mm
My micro ATX board has things most average ATX boards don’t: HD audio out native to the board, onboard graphics, and a network card.
10 years ago you had to buy a sound card, a graphics card and a mobo just to get off the ground. Now that’s all compacted onto one board.
In another decade it’s very likely all non moving parts will be mounted on the same board as one unit.
Jon needs to do research into nanotechnology and nanocircuitry, including nanotubes. Single atoms of gold also work as wires instead of carbon nanotubes. As the circuits shrink even closer to the atomic level, a computer as powerful as today’s may fit on something the size of your credit card by the time most of us reach 80 years old.