Date: April 3, 2018
Author(s): Rob Williams
As we discovered a couple of weeks ago, Chaos Group has been hard at work on its next-gen V-Ray renderer, and fortunately for users, a performance-boosting beta has already been released. We’re taking a preliminary look at performance from the new renderer, both on the CPU and GPU side (spoiler: the chips also join forces!).
Ahead of NVIDIA’s annual GPU Technology Conference in San Jose last week, Chaos Group teased its next-gen V-Ray GPU rendering architecture, aptly named ‘V-Ray GPU’. Ultimately, V-Ray 3.6 will become 4.0, and with that release, both GPU and CPU V-Ray rendering see an immediate performance uplift.
V-Ray GPU is going to replace V-Ray RT, a renderer that Chaos Group introduced in 2009. Notably, RT was the first renderer for 3ds Max to offer interactive rendering, letting users monitor scene updates on-the-fly. At the time, GPU rendering wasn’t entirely common, so RT, like Adv, began life as a CPU-based renderer. That quickly changed, though, as Chaos released a GPU-utilizing beta within a year of RT’s release.
A lot has changed since then, and companies like Chaos continue to optimize their products for the new age of computing. While much of the focus in 4.0 has been on the GPU renderer, the CPU renderer has also seen an update, and as alluded to above, it too brings performance gains. Even if you don’t use the CPU renderer explicitly, you can still add the CPU as an OpenCL or CUDA device and enjoy the performance gains that CPU+GPU rendering can provide.
Since both 3ds Max 2019 and the V-Ray 4.0 beta came out at around the same time, I decided to do some performance testing with the two together. Chaos wasted no time in supporting the new 3ds Max version, and V-Ray 4.0 supports every release going back to 3ds Max 2013.
In a nutshell, Chaos highlights a number of V-Ray GPU improvements and new features.
The AI denoiser is a major new feature here, utilizing NVIDIA’s OptiX engine. We originally thought it was restricted to the latest-generation Volta-based cards with Tensor cores; however, it turns out that’s not the case. Any NVIDIA GPU can take advantage of the denoiser, but Volta cards, with their Tensor cores, will run it much faster.
While testing AI denoising would be fun, it’s only one part of the performance story here. As already mentioned, simply upgrading from 3.6 to 4.0 can deliver a substantial performance uplift. Whether your project renders identically between versions is hit-or-miss, just as it is when rendering between Adv and RT. Expect to tweak some things when upgrading.
For testing, I used a handful of officially supplied (by Chaos Group) scenes, tweaked the default render settings slightly, and then directly compared those settings between 3.6 and 4.0. Because the CPU-based renderer has also seen performance improvements here, CPU+GPU tests seemed like a good idea, so those are included in addition to CPU standalone. Here’s a quick look at the test platform:
|SmartKevin Workstation Test System||
|---|---|
|Processor|Intel Core i9-7980XE (18-core; 3.3GHz)|
|Motherboard|ASUS ROG STRIX X299-E GAMING|
|Memory|Kingston HyperX FURY (4x16GB; DDR4-2666 16-18-18)|
|Graphics|NVIDIA TITAN Xp 12GB (GeForce 391.35)|
|Storage|Kingston KC1000 960GB M.2 SSD|
|Power Supply|Corsair AX1200 (80 Plus Gold)|
|Chassis|Corsair Carbide 600C Inverted Full-Tower|
|Cooling|Corsair Hydro H100i V2 AIO Liquid Cooler|
|Et cetera|Windows 10 Pro (64-bit; build 16299)|

For an in-depth pictorial look at this build, head here.
By default, V-Ray GPU will utilize every OpenCL or CUDA device in the system – except the CPU. Chances are good that you’ll want to enable it; the computer is going to be bogged down during a render anyway, so you may as well put the CPU power that’s there to work. But more on that in a second.
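As a rough way to think about what adding the CPU buys you, here’s a minimal sketch of hybrid rendering throughput. It assumes ideal scaling, where each device contributes work at its standalone rate, so the combined time is the harmonic combination of the standalone times; the device times below are hypothetical, not measured results, and real-world scaling falls somewhat short of this.

```python
# Simplified model of hybrid CPU+GPU rendering (ideal-scaling assumption).
# Each device is characterized by the time it takes to render the frame alone.

def combined_render_time(standalone_times):
    """Estimate render time when all devices share one frame.

    standalone_times: per-device times in seconds to render the frame alone.
    Combined rate is the sum of the individual rates (frames per second).
    """
    total_rate = sum(1.0 / t for t in standalone_times)
    return 1.0 / total_rate

# Hypothetical devices: two GPUs at 300 s each, one CPU at 900 s alone.
print(round(combined_render_time([300, 300, 900]), 1))  # ~128.6 s
```

Even a CPU three times slower than each GPU still shaves a meaningful slice off the total here, which is why leaving it idle during a render is wasted capacity.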
First up, here’s a look at GPU (only) rendering performance when merely moving from V-Ray 3.6 to 4.0:
In three of the four cases here, the 4.0 renderer cut the render time almost in half. Even with the Tea-set scene, the most complex of the four, render times drop significantly. One thing to bear in mind is that these renders were done with two high-end GPUs and a high-end CPU, so a ~600 second gap (e.g. the Flower scene) would grow considerably as you move down through the respective CPU and GPU product stacks.
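To put “almost in half” into concrete terms, here’s the quick speedup arithmetic; the before/after times are hypothetical placeholders, not figures from these runs.

```python
# Speedup and time-saved arithmetic for a version-to-version comparison.

def speedup(old_seconds, new_seconds):
    """How many times faster the new render is (e.g. 2.0 = twice as fast)."""
    return old_seconds / new_seconds

def time_saved(old_seconds, new_seconds):
    """Absolute seconds shaved off the render."""
    return old_seconds - new_seconds

old, new = 1200, 640  # hypothetical 3.6 vs 4.0 times for one scene
print(f"{speedup(old, new):.2f}x faster, {time_saved(old, new)} s saved")
# -> 1.88x faster, 560 s saved
```

The absolute gap is what scales with slower hardware: halving a render that takes four times as long saves four times as many seconds.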
Tying into that, I am currently testing V-Ray 4.0 on more GPUs than just the TITAN Xp, for inclusion in a future performance look. Stay tuned if you want to see how current-gen cards beneath the TITAN Xp fare in the ‘Tea-set’ scene – on NVIDIA hardware, at least, because for the time being, no scene I have here renders without some issue on AMD (Radeon or Radeon Pro).
The GPU renderer received the bulk of the optimization work in 4.0, but it wouldn’t be fair to discount the CPU’s gains. Yet again, differences can be seen pretty easily with the CPU renderer:
Interestingly, the Flower scene showed no difference in render time across multiple runs, but the others saw a decent chunk of time knocked off. Given the gains in both areas, wouldn’t it be great to combine their forces and establish the Cult of Ray Tracing? Good news: that’s easily done already.
The shot above shows two GPUs and one CPU being used for a single render, and to great effect (all devices are essentially pegged to 100% usage). Both the CPU and GPU complement each other quite well, something the next chart highlights:
CPUs seem slow for rendering unless you’re looking at one like the i9-7980XE. In some cases, that chip contributes as much as one of the TITAN Xps – and at the 30 minute iteration, the single CPU actually contributed more than either of the two GPUs individually. I’m really impressed by that. I guess that’s why a chip like that retails for $2,000, and the TITAN Xp, $1,200.
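To make per-device contribution comparisons like this, you can convert each device’s work done over a fixed window into a percentage share. A minimal sketch follows; the device names and work counts are hypothetical, chosen only to mirror the “CPU rivals a single GPU” situation described above.

```python
# Convert per-device work counts (samples, paths, buckets, etc.) rendered
# during a fixed time window into fractional contribution shares.

def shares(work_by_device):
    """Map each device name to its fraction of the total work done."""
    total = sum(work_by_device.values())
    return {name: n / total for name, n in work_by_device.items()}

# Hypothetical counts for a 30-minute window.
result = shares({"TITAN Xp #1": 450, "TITAN Xp #2": 450, "i9-7980XE": 400})
for name, share in result.items():
    print(f"{name}: {share:.1%}")
```

With counts like these, the CPU lands around 31% of the frame – within striking distance of each GPU’s ~35%.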
Here are the same results visually represented:
It’s hard to see the differences in the later stages, especially with such small thumbnails, so you can download the entire image set here to better compare in your image viewer of choice. Even without image quality examples, however, the numbers themselves speak volumes. GPU alone is fine… CPU alone is OK, but together, they can get real work done.
I spent most of this article talking about the performance gains V-Ray 4.0 can deliver, but as things tend to go in rendering, few people will judge this new V-Ray version by render time saved. Instead, many will look at it from the perspective of, “How can I cram more cool stuff into this scene?!” For both software, like a renderer, and hardware, like a GPU, performance gains mean time saved on present projects, and more capability for future ones.
While huge performance gains were seen in the GPU rendering tests, I suspect the introduction of a Volta GPU could dramatically improve AI denoising performance, based on multiple examples of the tech I’ve seen before (e.g. an NVIDIA tech demo). Those Tensor cores are probably the big reason NVIDIA started its Volta Quadro line with the top-end GV100 instead of a V6000 – Tensor cores are not quite ready for mainstream wallets.
As mentioned before, I’m in the process of testing GPUs other than the TITAN Xp with V-Ray 4.0, and will publish those results in an “updated” look at workstation GPU performance across the board. Unfortunately, AMD hardware won’t be included in this, as no project I’ve rendered through OpenCL has given a suitable result. If I find a project that bucks that trend, I’ll add it in for testing.
Copyright © 2017 SmartKevin