
CUDA-Enabled Apps: Measuring Mainstream GPU Performance

I like an eye-popping benchmark as well as the next guy. But at the end of the day, I’m a user. I use computers to do useful things. And on days when I have to give back the Tom’s Hardware Lear jet, and all the ski bunnies go back to their warrens, I have a modest computer with modest components and not much budget to spare for $500 upgrades. I need technology that’s going to help me do what I want more efficiently, whether it’s play games, edit video, or help model genetic sequences.

Some applications are linear in nature and merely want to crank as quickly as possible on a single processing thread until the cows come home. Others are built to leverage parallelism. Everything from Unreal Engine 3 to Adobe Premiere has shown us the benefits of CPU-based multi-threading, but what if 4, 8, or even 16 threads were just the beginning?

This is the promise behind Nvidia’s CUDA computing architecture, which, according to the company’s definition, can run thousands of threads simultaneously.
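
For readers who haven't written CUDA code before, the sketch below is a minimal illustration of what "thousands of threads" means in practice; it is our own example, not taken from Nvidia's materials or any of the applications tested here. A single kernel launch spawns roughly a million lightweight GPU threads, each touching one array element.

// Minimal illustrative sketch (ours, not from Nvidia's documentation):
// one kernel launch spawning ~1M lightweight GPU threads.
#include <cuda_runtime.h>

// Each thread scales exactly one element of the array.
__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

int main()
{
    const int n = 1 << 20;                     // ~1 million elements
    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));

    // 4,096 blocks of 256 threads apiece: far more threads than any CPU
    // could schedule at once, but routine for a CUDA-capable GPU.
    scale<<<(n + 255) / 256, 256>>>(d_data, 2.0f, n);
    cudaDeviceSynchronize();

    cudaFree(d_data);
    return 0;
}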

We've written about CUDA in the past, so hopefully you're no stranger to the technology (if you did miss our coverage, check out Nvidia's CUDA: The End of the CPU?). For better or worse, though, most CUDA coverage in the press has focused on high-end hardware, even though the supporting logic has been present in Nvidia GPUs since the dawn of the GeForce 8. When you consider the huge enterprise dollars wrapped up in the high-performance computing (HPC) and professional graphics workstation markets—targeted by Nvidia's Tesla and Quadro lines, respectively—it's no wonder this is where so much of Nvidia's marketing attention has gone.

But in 2009, we finally see a change. CUDA has come to the masses. There's a huge installed base of compatible desktop graphics cards, and mainstream applications able to exploit that built-in CUDA support are arriving one after another.

From Nothing To Now

The first consumer-friendly CUDA app was Folding@Home, a university distributed computing project out of Stanford in which each user can crunch a chunk of raw data about protein behavior so as to better understand (and hopefully cure) several of humanity’s worst diseases. The application transitioned to CUDA compatibility in the second half of 2008. Very shortly afterward came Badaboom, the video transcoder from Elemental Technologies that, according to Elemental, can transcode up to 18 times faster than a CPU-only implementation.

Then came a whole slew of media applications for CUDA: Adobe Creative Suite 4, TMPGEnc 4.0 XPress, CyberLink PowerDirector 7, MotionDSP vReveal, Loilo LoiLoScope, Nero Move it, and more. Mirror’s Edge looks to be the first AAA game title to fully leverage CUDA-based PhysX technology for increasing visual complexity, allegedly by 10x to 20x. Expect to see more titles emerge in this vein—a lot more. While AMD and its ATI Stream technology have been mired in setbacks, Nvidia has been hyping its finished and proven CUDA to everyone who will listen...and developers now seem to be taking the message to heart.

That’s all well and good, but CUDA’s incendiary capabilities have largely been proven on high-end GPUs. I’m on a tight budget. Friends are getting mowed down around me by lay-offs and wage cuts like bubonic plague victims. You bet, I’d love to drop ten or twelve Benjamins on a 3-way graphics overhaul, but the reality is that, like many of you, I’ve only got one or two C-notes to spare. On a good day. So the question all of us who can’t afford the graphics equivalent of a five-star ménage à trois should be asking is, “Does CUDA mean anything to me when all I can afford is a budget-friendly card for my existing system?”

Let’s find out. Today, we'll be looking at some of the most promising titles and measuring the speed-up garnered from a pair of mid-range GPUs.

Our Retro Test Platform

Unlike most of the benchmarked platforms you see reviewed here, we intentionally avoided the latest and greatest components for our testing environment. Instead, we dusted off some gear from two or three years ago: an Intel DG965WH motherboard, Core 2 Duo E6700 processor (65nm) with stock heatsink-fan, two sticks of Kingston 512 MB DDR2-533 ValueRAM, and a 250GB Maxtor MaXLine III hard drive. We threw Windows Vista SP1 on this "beast" and called it good.

Again, the idea was to approximate the sort of system a true mainstream Joe might have on his desk, especially one bought at retail. He’s been getting by with it for a while and wants to perk things up. Obviously, stepping into a Core i7 or late-model Phenom would necessitate a drastic overhaul costing several hundred dollars. We figured Joe, a self-admitted MP3 and mobile video junkie, might have about $150 to spare and be pretty interested in all of these wild claims of 10x or 100x performance gains bestowed by CUDA.

We picked two cards in Joe’s price range. The first was a GeForce 9600 GT with 1 GB of GDDR3, currently selling for roughly $120 online. The second was a GeForce 9800 GTX, now largely displaced by the GeForce 9800 GTX+/GeForce GTS 250 (about $30 more, if you want 1 GB of memory). Sure, you can get CUDA on a $75 8600 GT board, but we’d rather recommend the generally smaller fab process, higher clock rates, and larger shader processor counts in the newer generation.

Specifically, whereas the 8600 GT had 32 stream processors (unified, programmable shaders), the 9600 GT has 64 and the 9800 GTX has 128. These stream processors can all crunch on CUDA tasks in parallel, each handling many CUDA operations at a time. It’ll be interesting to see how much of a difference $20 and 64 extra stream processors make in the real world.
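
As an aside, the multiprocessor count a card exposes (and, on this G9x generation, eight stream processors per multiprocessor) can be read programmatically through the standard CUDA runtime API. The short sketch below is ours and was not part of the actual test procedure:

// Illustrative sketch (not part of the original test): query how many
// multiprocessors, and hence stream processors, the installed card exposes.
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    // On this generation, each multiprocessor contains 8 stream processors,
    // so a 9600 GT reports 8 MPs (64 SPs) and a 9800 GTX reports 16 (128 SPs).
    printf("Device: %s\n", prop.name);
    printf("Multiprocessors: %d\n", prop.multiProcessorCount);
    printf("Stream processors (x8 on this generation): %d\n",
           prop.multiProcessorCount * 8);
    return 0;
}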

Aliens? Look Faster

If you’ve seen the movie Contact, you know the gist of the SETI program. The Search for Extraterrestrial Intelligence program uses radio astronomy to search the skies for radio signals that, by their nature, must have come from intelligent life beyond the Earth. Raw data is gathered in a 2.5 MHz-wide band and streamed back to SETI@home’s main location at UC Berkeley. As shown in the movie, most if not all such radio data is simply random noise, like static against the cosmic background. The SETI@home software performs signal analysis on this data, scouring the bits for non-random patterns, such as pulsing signals and power spikes. The more floating point computing power available to process the data, the wider the spectrum and the more sensitive the analysis can be. This is where the parallelism of multi-threading and CUDA pay off.
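
To make that parallelism concrete, here is a deliberately simplified, hypothetical kernel of our own devising; the real SETI@home client does far more (FFTs, chirp analysis, Gaussian fitting), but the basic idea of assigning one GPU thread to each frequency bin and flagging power spikes maps naturally onto CUDA:

// Hypothetical illustration only -- not SETI@home code. One GPU thread
// examines one frequency bin and flags any power spike above a threshold.
#include <cuda_runtime.h>

__global__ void find_spikes(const float *power, int *flags,
                            float threshold, int bins)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < bins)
        flags[i] = (power[i] > threshold) ? 1 : 0;
}

int main()
{
    const int bins = 1 << 18;                  // frequency bins to scan
    float *d_power;
    int *d_flags;
    cudaMalloc(&d_power, bins * sizeof(float));
    cudaMalloc(&d_flags, bins * sizeof(int));
    cudaMemset(d_power, 0, bins * sizeof(float));   // stand-in for real spectra

    // One thread per bin: every bin is checked in parallel.
    find_spikes<<<(bins + 255) / 256, 256>>>(d_power, d_flags, 10.0f, bins);
    cudaDeviceSynchronize();

    cudaFree(d_power);
    cudaFree(d_flags);
    return 0;
}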

Berkeley workers divide the raw data into single-frequency work units of about 0.35 MB, corresponding to roughly 107 seconds of recorded signal. The SETI@home server then doles out work units to home computers, which typically run the SETI@home client as a screen saver application. When SETI@home went live in May of 1999, the goal was to combine the collective power of 100,000 PCs. Today, the project boasts over 300,000 active computers across 210 countries.

In benchmarking SETI@home, one needs a consistent work unit in order to get reliable results. We only found this out after hours of receiving nonsensical results. It turns out that the Nvidia performance lab had been prepping special script and batch files for testing SETI@home. These run from a command line rather than the usual graphics-rich, eye-candy interface. Nvidia sent us the needed files, and a much clearer performance picture emerged.

http://www.tomshardware.com/reviews/nvidia-cuda-gpgpu,2299.html
