.NET Struct Performance
On this page I attempt to measure how .NET deals with simple data structures that one might employ in numerical computing and computational geometry, and compare the results to various other languages (C++, Java, JavaScript).
Note: This benchmark focuses on method inlining in the presence of user-defined value types, a test case that is notoriously problematic for the .NET CLR. The article "Head-to-head benchmark: C++ vs .NET" by “Qwertie” on Code Project offers a much broader comparison of computational performance on both platforms, including cases that are more favorable for the CLR and sometimes allow C# to approximate C++ speed. Those interested in the relative performance of JavaScript and Java should also check out "Box2D as a Measure of Runtime Performance" by Joel Webber.
Classes and Structs
The .NET Common Language Runtime (CLR) offers two kinds of user-defined objects: reference types (declared as class in C#) and value types (declared as struct in C#). CLR classes are always allocated on the heap and accessed by reference, just like all user-defined objects in Java and JavaScript (unless the optimizer puts them on the stack), whereas CLR structs are allocated directly within their surrounding memory context. This means stack allocation when structs are not embedded within other objects.
Struct member access thereby avoids the extra dereferencing step that classes require, and struct allocation does not increase the garbage collector’s workload. Another consequence is that struct contents are copied wholesale on variable assignments (including to method parameters), whereas assignments of class variables only copy the object references. From a performance viewpoint, the extra time spent on copying large structs eventually erases the benefits of embedded allocation – hence the general recommendation to use structs only for small amounts of data.
Copying contents versus references also constitutes an important semantic distinction, but here we’ll focus on runtime performance. Structs should perform better than classes when objects are frequently created and accessed, provided content copying is inexpensive or can be optimized away. The applications that should benefit the most are numerical computing and computational geometry, as they require efficient types for small tuples of floating-point values: complex numbers, two- or three-dimensional coordinates, etc.
Primitive Types and Structs
So much for the basics. The following benchmark does not compare structs to classes, but rather user-defined structs (where available) or classes (where not) to equivalent tuples of built-in primitive types.
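To make the comparison concrete, here is a minimal C++ sketch of the two representations: a small user-defined point type and the equivalent "naked" pair of doubles. It is not taken from the test suite; the names `Point`, `add_struct`, `add_naked` and `checked_add` are assumptions chosen for illustration.

```cpp
#include <cassert>

// Hypothetical value type standing in for a two-dimensional point.
struct Point {
    double x;
    double y;
};

// Struct variant: both points are passed by value (copied) and a new
// point holding the cross-wise sums is returned.
inline Point add_struct(Point a, Point b) {
    return Point{a.x + b.y, a.y + b.x};
}

// "Naked" variant: the same computation on individual double variables.
// The first pair (ax, ay) is updated in place through references.
inline void add_naked(double& ax, double& ay, double bx, double by) {
    const double new_x = ax + by;
    const double new_y = ay + bx;
    ax = new_x;
    ay = new_y;
}

// Runs both variants on the same inputs, checks that they agree,
// and returns the struct result.
inline Point checked_add(Point a, Point b) {
    const Point p = add_struct(a, b);
    double ax = a.x;
    double ay = a.y;
    add_naked(ax, ay, b.x, b.y);
    assert(p.x == ax && p.y == ay);
    return p;
}
```

Both functions compute the same values, so an optimizer that inlines them is free to generate the same machine code for either form. Whether a given compiler or runtime actually does so is the question the benchmark asks.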
Passing a struct to a method (by value) is semantically equivalent to passing its individual fields, and accessing its fields is equivalent to accessing individual variables of the same type within the same storage context. A good optimizer should be able to exploit this equivalence and produce struct handling code that is indistinguishable from using the “naked” field-equivalent variables directly. This is what we’re going to examine here, in the specific case of methods that don’t change their parameters and are small enough to be inlined.
Struct Test Programs
All results shown below were obtained with a suite of small test programs. The download package StructTest.zip (83.4 KB) comprises the precompiled executables and their complete source code. This is a standard ZIP archive containing long file names. Please refer to the file
All tests perform 1,000,000,000 loop iterations over two pairs of double-precision values, representing a point’s x- and y-coordinate. We initialize all coordinates to 1, then in each iteration assign the cross-wise sum of all coordinates to the first pair: a := (ax + by, ay + bx). The final coordinates are printed before each result to ensure the calculations were performed correctly (and not optimized away entirely, which Visual C++ can actually do!).
Most tests represent points as a simple user-defined type: a struct in C#, a class with stack-allocated instances in C++, and reference-type objects in Java and JavaScript which allow no other option. All tests use property accessors to read a point’s coordinates.
We run 2–4 tests for each language and runtime, each calling a different static method to add the coordinates. All methods are short enough to be inlined. The test methods are identified as follows:
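As a rough illustration of this setup, the following C++ sketch shows how such method variants and the test loop might look. It is not the code from StructTest.zip. Only the name AddByOut appears in the text; `add_by_val`, `add_by_ref`, `run_by_val`, `run_by_out` and `checked_run` are hypothetical names, and timing code is left out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical point type with read-only accessors, mirroring the
// property accessors the tests use to read coordinates.
class Point {
public:
    Point(double x, double y) : x_(x), y_(y) {}
    double x() const { return x_; }
    double y() const { return y_; }
private:
    double x_;
    double y_;
};

// Both points passed by value, result returned by value.
inline Point add_by_val(Point a, Point b) {
    return Point(a.x() + b.y(), a.y() + b.x());
}

// Both points passed by const reference, result returned by value.
inline Point add_by_ref(const Point& a, const Point& b) {
    return Point(a.x() + b.y(), a.y() + b.x());
}

// Inputs passed by const reference, result written to an output parameter.
// The temporary is fully computed before assignment, so result may alias a.
inline void add_by_out(const Point& a, const Point& b, Point& result) {
    result = Point(a.x() + b.y(), a.y() + b.x());
}

// Test loop using the by-value variant: a := (ax + by, ay + bx).
inline Point run_by_val(std::uint64_t iterations) {
    Point a(1.0, 1.0);
    Point b(1.0, 1.0);
    for (std::uint64_t i = 0; i < iterations; ++i) {
        a = add_by_val(a, b);
    }
    return a;
}

// Same loop using the output-parameter variant.
inline Point run_by_out(std::uint64_t iterations) {
    Point a(1.0, 1.0);
    Point b(1.0, 1.0);
    for (std::uint64_t i = 0; i < iterations; ++i) {
        add_by_out(a, b, a);
    }
    return a;
}

// Runs both loops and checks that they produce the same final point.
inline Point checked_run(std::uint64_t iterations) {
    const Point v = run_by_val(iterations);
    const Point o = run_by_out(iterations);
    assert(v.x() == o.x() && v.y() == o.y());
    return v;
}
```

Because b stays at (1, 1), each iteration adds 1 to both coordinates of a, so n iterations leave a at (n + 1, n + 1). Returning the final point, as the tests do by printing it, keeps the compiler from discarding the loop as dead code.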
Technical Note: Java and JavaScript always use pass-by-value for their method arguments, but what is actually being passed by value in the AddByOut case is itself a reference to an object, and therefore equivalent to C++ and C# passing that object by reference.
Sample Test Results
The following table shows sample test results on my system, comprising Windows 7 SP1 (64 bit) on an Intel DX58SO motherboard with an Intel Core i7 920 CPU (2.67 GHz) and 6 GB RAM (DDR3-1333). The tests were not conducted with any kind of scientific rigor; I simply ran each test several times and picked a nice round median value. All times are in milliseconds.
The tested versions of Visual C++ and Visual C# are from Microsoft Visual Studio 2010 Professional SP1 with .NET Framework 4.0. The C++ test was also run on MinGW gcc 4.5.2, and the C# test on the Mono 2.10.8 runtime, both of which support only 32-bit execution. All compilers use full optimization (/Ox, -O3, /o) with unchecked arithmetic and no debug information.
The tested version of Java is from the Oracle Java Development Kit 7u3, in both 32-bit and 64-bit versions. Java currently provides both a “client” and a “server” JVM on 32-bit Windows but only the server variant on 64-bit Windows, so these were our three test cases.
The JavaScript test was run on the web browsers Google Chrome 16, Mozilla Firefox 10.0.2, and Microsoft Internet Explorer 9.0.4 in 32-bit mode.
Test Conclusions
Visual C++: Microsoft has exactly one compiler with a working optimizer for mathematical structures, and that’s the 32-bit version of Visual C++. All other compilers exhibit embarrassing optimization failures when attempting to use simple types in a straightforward way. Astonishingly, this includes the 64-bit version of Visual C++ which achieved worse times than either Visual C# or Java! The cause is most likely an optimizer bug affecting this particular scenario, but I was unable to find any optimizer settings that produced better results. I’ve filed a bug report with Microsoft Connect that may help resolve this issue.
gcc: Another good optimizer at work: the GNU C/C++ compiler performs almost exactly as well as the 32-bit version of Visual C++. Sadly, 64-bit code generation was not enabled in the MinGW port, so I could not test that option.
Visual C#: Counterintuitively, the optimizer of the 32-bit CLR works correctly only when structs are passed by reference rather than by value – not a great alternative due to the changed semantics. While the 64-bit CLR also profits from this trick, it is slower to begin with, and struct handling never reaches the speed of naked
The likely cause for the call-by-reference speedup is the fact that our small test methods can be inlined. My guess is the following:
1. The optimizer identifies call-by-reference structs with objects in the calling code, and so wastes no time creating references.
2. The optimizer realizes that naked
Mono: The major third-party CLR is slower than Microsoft’s implementation by a factor of around 2–4, and we again note the counterintuitive result that passing a small struct by reference is faster than passing it by value. This result is not a scathing criticism of Mono – merely keeping up with new .NET features while porting the CLR to many more platforms is quite an achievement! However, it does demonstrate that Mono is not currently an option if you’re looking for better performance.
Java: The 32-bit client JVM performs as expected, i.e. somewhat slower than the CLR. The big surprise comes with the server JVM whose 64-bit optimizer is so excellent that user-defined types outperform pass-by-value structs on the 32-bit CLR, and all struct tests on the 64-bit CLR. The 32-bit server JVM is not much slower. Even better, naked
JavaScript: For the longest time, JavaScript was unusable for anything requiring computational performance. This worst case is still represented by Internet Explorer 9 whose JS engine is 7–30x slower than VC# and Java, and an amazing 12–100x slower than C++ (or rather 25–100x if we discount the VC++ 64-bit optimizer bug).
However, other browsers have recently made big strides. Chrome’s V8 JavaScript engine shows much better object handling (
Winners: C++ and Java
These results are fairly depressing. The CLR should be the fastest managed platform, not least thanks to its user-defined value types. Instead, it can barely keep up with primitive old Java! I applaud the designers of the server JVM for what they managed to tweak out of that language. Meanwhile, the CLR’s greater complexity provides no clear performance benefits, despite the greater burden on the developer – its speed is unimpressive even for hand-optimized code.
Performance complaints about .NET are nothing new. Structs specifically were always problematic. My Tektosyne library uses fields instead of property getters where possible, due to frequent inlining failures in older .NET versions. Two important .NET APIs, Windows Presentation Foundation and LINQ to Objects, are notorious for their sluggishness; so is the C# iterator mechanism.
To be sure, most developers won’t care. The chief purpose of .NET was to replace various older Microsoft technologies, including Visual Basic, ASP, and Office scripting, which were mostly used for business in-house projects. .NET continues to fill these roles very well, and the CLR (even Mono’s!) is certainly fast enough for them. Other applications are rarely based on .NET and this is unlikely to change, given Microsoft’s current shift back to C++ and onward to HTML 5 and JavaScript. If a platform already dominates its appointed niche, why bother improving it?
But one can’t help feeling disappointed that the platform’s great potential goes to waste in this way. Couldn’t Microsoft spend one percent of one percent of its annual monopoly rent to write a decent optimizer for the CLR? Instead, we must choose between abstruse C++ and backward Java if we want good performance. What a shame.