A quick overview of the different ways to call unmanaged APIs from managed code, on both .Net and Mono.
The inspiration for this post came from reading a couple of articles. The first is about SharpDX:
A new managed .NET/C# Direct3D 11 API generated from DirectX SDK headers
The second:
Techniques of calling unmanaged code from .NET and their speed
which covers essentially the same topic as this post but doesn't provide enough sample code, in particular for the final benchmark.
Native library
In the following examples we will use a function exported from a hypothetical library called Native.dll (Native.so for Mono on Unix/Linux), written in C and compiled with the Cdecl calling convention.
//
// Native.h
//
void DoWithIntPointer(int a, int b, int* r);
//
// Native.c
//
#include "Native.h"
void DoWithIntPointer(int a, int b, int* r) {
    *r = a + b;
}
DoWithIntPointer simply calculates the sum of two integers, but because it takes parameters both by value and by reference, it will let us see some peculiarities.
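Before moving to managed code, it may help to see how a native caller uses this function. The following is only a sketch: the function body is copied from Native.c with a NULL check added for illustration, and sum_with_out_parameter is a made-up helper that is not part of the library.

```c
#include <assert.h>
#include <stddef.h>

/* Copied from Native.c, plus an illustrative NULL check that the
   original does not have. a and b arrive as copies, r is an address. */
void DoWithIntPointer(int a, int b, int *r) {
    assert(r != NULL);
    *r = a + b;
}

/* Hypothetical helper: turns the "result through a pointer" style
   into an ordinary return value. */
int sum_with_out_parameter(int a, int b) {
    int result = 0;                   /* storage owned by the caller */
    DoWithIntPointer(a, b, &result);  /* the callee writes through the address */
    return result;
}
```

The caller owns the storage for the result and passes only its address, which is exactly what the managed wrappers in the rest of this post have to reproduce.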
Explicit P/Invoke
Let's start with a classic P/Invoke example:
//
// TestPInvoke.cs
//
using System.Runtime.InteropServices;
class TestPInvoke {
[DllImport(
"Native.dll",
CallingConvention = CallingConvention.Cdecl
)]
private static extern void DoWithIntPointer(
int a,
int b,
out int r
);
public static void Main() {
int result = 0;
DoWithIntPointer(1, 2, out result);
}
}
The extern keyword tells the compiler that DoWithIntPointer is defined elsewhere, while the DllImport attribute tells the runtime where to find it.
Implicit P/Invoke - C++/Cli
With C++/Cli we can write wrappers for native libraries with relative ease but it cannot be used with Mono. Here we have the C++/Cli wrapper for our Native library:
//
// NativeCppCliWrapper.cpp
//
#include "Native.h"
namespace NativeCppCliWrapper
{
    using namespace System;
    using namespace System::Runtime::InteropServices;
    public ref class Wrapper {
    public:
        static void CallDoWithIntPointer(
            Int32 a,
            Int32 b,
            [Out] Int32% r
        ) {
            int tmp;
            DoWithIntPointer(a, b, &tmp);
            r = tmp;
        }
    };
}
After compiling, the wrapper can be used like any other assembly:
//
// TestCppCli.cs
//
using NativeCppCliWrapper;
class TestCppCli {
public static void Main() {
int result = 0;
Wrapper.CallDoWithIntPointer(1, 2, out result);
}
}
If we dig through the IL code generated by C++/Cli we can see that CallDoWithIntPointer invokes:
IL_0004: call void modopt(
[mscorlib]System.Runtime.CompilerServices.CallConvCdecl
) '<module>'::DoWithIntPointer(int32, int32, int32*)
And DoWithIntPointer is described by the following metadata:
.method assembly static pinvokeimpl("" lasterr cdecl)
void modopt(
[mscorlib]System.Runtime.CompilerServices.CallConvCdecl
) DoWithIntPointer (
int32 '',
int32 '',
int32* ''
) native unmanaged preservesig
{
.custom instance void
[mscorlib]System.Security.SuppressUnmanagedCodeSecurityAttribute::.ctor()
= ( 01 00 00 00 )
}
Converted to C# (with ILSpy):
[SuppressUnmanagedCodeSecurity]
[DllImport("",
CallingConvention = CallingConvention.Cdecl,
SetLastError = true
)]
[MethodImpl(MethodImplOptions.Unmanaged)]
internal unsafe static extern void DoWithIntPointer(
int,
int,
int*
);
Does this remind us of anything? Yes, it is very similar to the extern declaration we saw earlier, but among the differences we can note an attribute called SuppressUnmanagedCodeSecurity. MSDN tells us that:
This attribute is primarily used to increase performance; however, the performance gains come with significant security risks.
Security risks aside, it can also be used with explicit P/Invoke; it is not exclusive to C++/Cli.
In other situations the native code is called in a more sophisticated way. For example, if we poke inside the IL code of SlimDX we can find things like this:
.method public hidebysig
instance valuetype SlimDX.Result Optimize () cil managed
{
// Method begins at RVA 0xd0824
// Code size 25 (0x19)
.maxstack 3
IL_0000: ldarg.0
IL_0001: call instance valuetype
IUnknown* SlimDX.ComObject::get_UnknownPointer()
IL_0006: dup
IL_0007: ldind.i4
IL_0008: ldc.i4.s 68
IL_000a: add
IL_000b: ldind.i4
IL_000c: calli System.Int32 modopt(
System.Runtime.CompilerServices.IsLong
) modopt(
System.Runtime.CompilerServices.CallConvStdcall
)(System.IntPtr)
IL_0011: ldnull
IL_0012: ldnull
IL_0013: call valuetype SlimDX.Result
SlimDX.Result::Record
<class SlimDX.Direct3D11.Direct3D11Exception>(
int32, object, object
)
IL_0018: ret
}
The calli instruction is used to invoke a native method given the address of the method itself. We will see how to take advantage of calli without C++/Cli in the last example.
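As far as we can read it, the IL above performs a COM-style virtual call: it loads the object pointer, dereferences it to get the vtable, adds a byte offset (68, which is slot 17 when pointers are 4 bytes wide), loads the function address stored there and calls it with the object pointer as argument. The same sequence can be sketched in C with plain function pointers. All the names below (method_fn, fake_object, call_slot, demo_call_slot_17) are invented for this illustration and have nothing to do with SlimDX itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical method signature: every slot receives the object itself. */
typedef int (*method_fn)(void *self);

/* Hypothetical COM-like object: the first field points to the vtable. */
struct fake_object {
    const method_fn *vtable;
    int state;
};

/* Mirrors the IL sequence: load vtable, pick a slot, indirect call. */
int call_slot(void *object, size_t slot) {
    assert(object != NULL);
    const method_fn *vtable = *(const method_fn *const *)object; /* first ldind */
    method_fn target = vtable[slot];                             /* add + second ldind */
    return target(object);                                       /* calli */
}

/* A method to place in the table: returns the object's state. */
int return_state(void *self) {
    return ((struct fake_object *)self)->state;
}

/* Builds an object whose slot 17 is return_state and calls it indirectly. */
int demo_call_slot_17(int state) {
    method_fn table[18] = {0};
    table[17] = return_state;
    struct fake_object object = { table, state };
    return call_slot(&object, 17);
}
```

Nothing here needs the name of the target function at compile time, only its address, which is the property that makes calli attractive.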
Dynamic P/Invoke
To use the previous techniques, the native library must be known at compile time and must be located in a certain path at runtime. With dynamic P/Invoke we can obtain a greater degree of flexibility.
In the next example we will use an assembly called CSLoadLibrary, slightly modified, in particular to run on Unix/Linux via Mono (see the download link at the end of this post for the modified version). CSLoadLibrary contains an UnmanagedLibrary class that provides access to native libraries through the standard Windows APIs (LoadLibrary, GetProcAddress, FreeLibrary) or their Unix/Linux counterparts (dlopen, dlsym, dlclose).
//
// TestDelegate.cs
//
using System.Runtime.InteropServices;
using CSLoadLibrary;
class TestDelegate {
[UnmanagedFunctionPointer(CallingConvention.Cdecl)]
delegate void DelegateWithIntPointer(
int a,
int b,
out int r
);
public static void Main() {
UnmanagedLibrary nativeLib =
new UnmanagedLibrary(
"Native"
);
DelegateWithIntPointer doWithIntPointer =
nativeLib.GetUnmanagedFunction
<DelegateWithIntPointer>(
"DoWithIntPointer"
);
int result = 0;
doWithIntPointer(1, 2, out result);
}
}
In practice, given the name of the native library to load, an UnmanagedLibrary object is instantiated. Then, with GetUnmanagedFunction we obtain a delegate pointing to our native function, DoWithIntPointer. Naturally the signature of the delegate must match the signature of the native function.
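For reference, the Unix/Linux calls mentioned earlier (dlopen, dlsym, dlclose) can be used directly from C as in the sketch below. It is not the code of UnmanagedLibrary, just a minimal illustration of the same load, look up, call, unload sequence; call_do_with_int_pointer and the function pointer typedef are names made up here, and the library path is whatever the caller passes in.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Must match the native signature, like the delegate in TestDelegate.cs. */
typedef void (*do_with_int_pointer_fn)(int a, int b, int *r);

/* Hypothetical helper: loads the library, resolves DoWithIntPointer,
   calls it and unloads the library again.
   Returns 0 on success, -1 if the library or the symbol is missing. */
int call_do_with_int_pointer(const char *library_path, int a, int b, int *sum) {
    assert(library_path != NULL && sum != NULL);

    void *handle = dlopen(library_path, RTLD_NOW);
    if (handle == NULL) {
        return -1;  /* library not found or not loadable */
    }

    /* POSIX-style conversion from the void* returned by dlsym
       to a function pointer. */
    do_with_int_pointer_fn fn;
    *(void **)(&fn) = dlsym(handle, "DoWithIntPointer");

    int status = -1;
    if (fn != NULL) {
        fn(a, b, sum);
        status = 0;
    }

    dlclose(handle);
    return status;
}
```

On some systems the program has to be linked with -ldl. A call such as call_do_with_int_pointer("./Native.so", 1, 2, &result) assumes that the library has been built and sits in the current directory.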
Dynamic P/Invoke – Explicit P/Invoke
This time, instead of using CSLoadLibrary, the delegate is created via Reflection, replicating the extern declaration shown in the Explicit P/Invoke example.
//
// TestDynamicS.cs
//
using System;
using System.Reflection;
using System.Reflection.Emit;
using System.Runtime.InteropServices;
using System.Security;
class TestDynamicS {
delegate void DelegateWithIntPointer(
int a,
int b,
out int r
);
public static void Main() {
DelegateWithIntPointer doWithIntPointer =
GetDynamicSDelegate
<DelegateWithIntPointer>(
"Native",
"DoWithIntPointer",
CallingConvention.Cdecl
);
int result = 0;
doWithIntPointer(1, 2, out result);
}
private static TDelegate GetDynamicSDelegate
<TDelegate>(
string libraryName,
string entryPoint,
CallingConvention callingConvention
) where TDelegate : class
{
Type delegateType = typeof(TDelegate);
MethodInfo invokeInfo = delegateType.GetMethod("Invoke");
// Gets the return type for the P/Invoke method.
Type invokeReturnType = invokeInfo.ReturnType;
// Gets the parameter types for the P/Invoke method.
ParameterInfo[] invokeParameters =
invokeInfo.GetParameters();
Type[] invokeParameterTypes =
new Type[
invokeParameters.Length
];
for (int i = 0; i < invokeParameters.Length; i++) {
invokeParameterTypes[i] =
invokeParameters[i].ParameterType;
}
// Defines an assembly with a module and a type.
AssemblyName assemblyName =
new AssemblyName(
"TestAssembly"
);
AssemblyBuilder assemblyBuilder =
AppDomain.CurrentDomain.DefineDynamicAssembly(
assemblyName,
AssemblyBuilderAccess.Run
);
ModuleBuilder moduleBuilder =
assemblyBuilder.DefineDynamicModule(
"TestModule"
);
TypeBuilder typeBuilder =
moduleBuilder.DefineType(
"TestDynamicS"
);
//Defines a P/Invoke method called Invoke.
MethodBuilder methodBuilder =
typeBuilder.DefinePInvokeMethod(
"Invoke",
libraryName + ".dll",
entryPoint,
MethodAttributes.Public |
MethodAttributes.Static |
MethodAttributes.PinvokeImpl,
CallingConventions.Standard,
invokeReturnType,
invokeParameterTypes,
callingConvention,
CharSet.Ansi
);
methodBuilder.SetImplementationFlags(
methodBuilder.GetMethodImplementationFlags() |
MethodImplAttributes.PreserveSig
);
// Adds SuppressUnmanagedCodeSecurityAttribute to
// the method.
Type attributeType =
typeof(
SuppressUnmanagedCodeSecurityAttribute
);
ConstructorInfo attributeConstructorInfo =
attributeType.GetConstructor(
new Type[] {}
);
CustomAttributeBuilder attributeBuilder =
new CustomAttributeBuilder(
attributeConstructorInfo,
new object[] {}
);
methodBuilder.SetCustomAttribute(attributeBuilder);
// Finishes the type.
Type newType = typeBuilder.CreateType();
object tmp =
(object)Delegate.CreateDelegate(
delegateType,
newType.GetMethod("Invoke")
);
return (TDelegate)tmp;
}
}
Though we are adding the SuppressUnmanagedCodeSecurity attribute, it is not essential.
Dynamic P/Invoke - Emit Calli
The last example is more complicated. Here again, we make use of UnmanagedLibrary to get the native function's address. Then, through Reflection, we create a dynamic method which internally passes the address of the native function to calli, the instruction seen previously.
//
// TestCalli.cs
//
using System;
using System.Reflection;
using System.Reflection.Emit;
using System.Runtime.InteropServices;
using CSLoadLibrary;
class TestCalli {
[UnmanagedFunctionPointer(CallingConvention.Cdecl)]
delegate void DelegateWithIntPointer(
int a,
int b,
out int r
);
public static void Main() {
UnmanagedLibrary nativeLib =
new UnmanagedLibrary(
"Native"
);
IntPtr nativeMethodAddress =
nativeLib.GetUnmanagedFunctionAddress(
"DoWithIntPointer"
);
DelegateWithIntPointer doWithIntPointer =
GetCalliDelegate
<DelegateWithIntPointer>(
nativeMethodAddress
);
int result = 0;
doWithIntPointer(1, 2, out result);
}
private static TDelegate GetCalliDelegate
<TDelegate>(
IntPtr methodAddress
) where TDelegate : class
{
Type delegateType = typeof(TDelegate);
MethodInfo invokeInfo = delegateType.GetMethod("Invoke");
// Gets the return type for the dynamic method and calli.
// Note: for calli, a type such as System.Int32& must be
// converted to System.Int32* otherwise the execution
// will be slower.
Type invokeReturnType = invokeInfo.ReturnType;
Type calliReturnType =
GetPointerTypeIfReference(
invokeInfo.ReturnType
);
// Gets the parameter types for the dynamic method
// and calli.
ParameterInfo[] invokeParameters =
invokeInfo.GetParameters();
Type[] invokeParameterTypes =
new Type[
invokeParameters.Length
];
Type[] calliParameterTypes =
new Type[
invokeParameters.Length
];
for (int i = 0; i < invokeParameters.Length; i++) {
invokeParameterTypes[i] =
invokeParameters[i].ParameterType;
calliParameterTypes[i] =
GetPointerTypeIfReference(
invokeParameters[i].ParameterType
);
}
// Defines the dynamic method.
DynamicMethod calliMethod =
new DynamicMethod(
"CalliInvoke",
invokeReturnType,
invokeParameterTypes,
typeof(TestCalli),
true
);
// Gets an ILGenerator.
ILGenerator generator = calliMethod.GetILGenerator();
// Emits instructions for loading the parameters into
// the stack.
for (int i = 0; i < calliParameterTypes.Length; i++) {
if (i == 0) {
generator.Emit(OpCodes.Ldarg_0);
} else if (i == 1) {
generator.Emit(OpCodes.Ldarg_1);
} else if (i == 2) {
generator.Emit(OpCodes.Ldarg_2);
} else if (i == 3) {
generator.Emit(OpCodes.Ldarg_3);
} else {
// ldarg takes a 16-bit operand, so the index must be emitted as a short.
generator.Emit(OpCodes.Ldarg, (short)i);
}
}
// Emits instruction for loading the address of the
//native function into the stack.
switch (IntPtr.Size) {
case 4:
generator.Emit(
OpCodes.Ldc_I4,
methodAddress.ToInt32()
);
break;
case 8:
generator.Emit(
OpCodes.Ldc_I8,
methodAddress.ToInt64()
);
break;
default:
throw new PlatformNotSupportedException();
}
// Emits calli opcode.
generator.EmitCalli(
OpCodes.Calli,
CallingConvention.Cdecl,
calliReturnType,
calliParameterTypes
);
// Emits instruction for returning a value.
generator.Emit(OpCodes.Ret);
object tmp =
(object)calliMethod.CreateDelegate(
delegateType
);
return (TDelegate)tmp;
}
private static Type GetPointerTypeIfReference(Type type) {
if (type.IsByRef) {
return type.GetElementType().MakePointerType();
}
return type;
}
}
The method GetPointerTypeIfReference converts a parameter type like Int32& to Int32*; without this conversion calli still executes correctly but runs slower.
Benchmark
Hardware: CPU Intel Core i3-2310M 2.1 GHz, RAM 4 GB.
Software: VMware Player 3.1.4 on Windows 7 x64.
Times are in ms for 100,000,000 iterations.
SUC means that the test has been executed with the SuppressUnmanagedCodeSecurity attribute.
Test | .Net 4, Windows XP x32 | Mono 2.10.6, Windows XP x32 | Mono 2.10.5, Ubuntu Oneiric amd64 | Mono 2.6.7, Debian squeeze i386 |
---|---|---|---|---|
Expl. P/I | 8117 | 12001 | 2346 | 3657 |
Expl. P/I SUC | 3681 | 3485 | 2344 | 3708 |
C++/Cli | 4760 | . | . | . |
Dyn. P/I | 19309 | 68303 | 2603 | 4431 |
Dyn. P/I SUC | 7361 | 59718 | 2615 | 4514 |
Dyn. P/I Expl. SUC | 4497 | 4419 | 2480 | 4398 |
Dyn. P/I Calli | 4136 | 4249 | 1885 | 4353 |
Test | .Net 4, Windows XP x32 | Mono 2.10.6, Windows XP x32 | Mono 2.10.5, Ubuntu Oneiric amd64 | Mono 2.6.7, Debian squeeze i386 |
---|---|---|---|---|
Expl. P/I | 8215 | 14328 | 2194 | 4819 |
Expl. P/I SUC | 4203 | 6658 | 2207 | 4826 |
C++/Cli | 5576 | . | . | . |
Dyn. P/I | 22881 | 68789 | 3658 | 7260 |
Dyn. P/I SUC | 7478 | 60425 | 3780 | 7251 |
Dyn. P/I Expl. SUC | 4247 | 6627 | 3655 | 7294 |
Dyn. P/I Calli | 3925 | 3766 | 3050 | 6766 |
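Since every figure is a total over 100,000,000 calls, it can be easier to reason about the cost of a single call or about the ratio between two rows. A tiny sketch of that arithmetic in C follows; ns_per_call and speedup are helper names invented here, not part of the benchmark source.

```c
#include <assert.h>

/* Converts a total time in milliseconds into nanoseconds per call. */
double ns_per_call(double total_ms, double iterations) {
    assert(iterations > 0);
    return total_ms * 1e6 / iterations;  /* 1 ms = 1,000,000 ns */
}

/* How many times faster the variant is compared to the baseline. */
double speedup(double baseline_ms, double variant_ms) {
    assert(variant_ms > 0);
    return baseline_ms / variant_ms;
}
```

For the .Net 4 column of the first table, ns_per_call(8117, 1e8) gives about 81 ns per explicit P/Invoke call, and speedup(8117, 3681) gives about 2.2 for the same call with SUC.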
Conclusion
The above results seem a bit odd, and the only consistent finding appears to be that, among the dynamic P/Invoke variants, calli is always the fastest. Anyway, we have seen different ways to invoke unmanaged code from managed code, each with its own pros and cons, and if necessary we can use them or conduct further tests.
Download
Benchmark source code.
An experiment about nine open source polygon clipping libraries written in C/C++ that have been benchmarked after being wrapped for use with .Net.
** LAST UPDATE - MAR 04, 2013 **
Note that there could be errors and that no consideration is given to the robustness or optimization of the libraries, so this benchmark must be taken with a grain of salt.
The nine libraries
Library | Version | License |
---|---|---|
Boost.Geometry | Boost 1.53 | Boost v1.0 |
Boost.Polygon | Boost 1.53 | Boost v1.0 |
Bop | 1.2 (from the site and zip file) | Public domain |
Cgal | 4.1 | Various (mainly GPL and LGPL) / Commercial |
Clipper | 5.1.2 | Boost v1.0 |
Geos | 3.3.7 | LGPL v2.1 |
Gpc | 2.32 | Free for non-commercial use / Commercial |
KBool | 2.1 | GPL v3 / Commercial |
TerraLib | svn 10190 | LGPL v2.1 |
The guests
For comparative purposes other libraries have been used during the benchmark.
Library | Version | License |
---|---|---|
PolyBoolean.NET | 2.0.0 (demo) | Commercial |
SQL Server System CLR Types 2012 | 2011.110.2100.60 | Proprietary |
Wrappers
The open source libraries compared in this benchmark are written in C or C++ and are wrapped with a light C++/Cli wrapper that exposes only the polygon set operations (union, difference, intersection, disjoint-union).
C++/Cli provides a relatively easy route to interoperability with native libraries but, at present, ties the wrappers to the Windows platform.
Hardware and software
CPU | Intel Core i5 3570k |
---|---|
RAM | 8 GB |
OS | Windows 8 x64 |
SDK | Windows 8 Sdk + Visual Studio 2012 Express |
Settings | x64 release build, /O2 |
Notes
- Boost.Geometry, also known as Ggl, can be used with Mpir (a Gmp fork) or with TTMath, for more precision but with a significant slowdown. The vanilla version is used.
- Boost.Polygon, also known as Gtl, can be used with Mpir with no speed penalty. The vanilla version is used.
- Bop has two implementations, one depends on Cgal. The version with no dependencies is used.
- Cgal is built with Boost (1.53), Mpir (2.6.0), Mpfr (svn 8450). A Simple_cartesian<double> kernel is used given the fact that it seems faster than other kernel types.
- KBool source has been patched to avoid creating the file keygraphfile.key.
- PolyBoolean.c20.NET is used instead of PolyBoolean.c30.NET because it is faster, but coordinate values are restricted to a 20 bit range, so all tests comply with this constraint. Furthermore, the demo version of PolyBoolean throws an exception when the number of vertices returned is a multiple of 7.
- Boost.Geometry, Geos, TerraLib and SQL Server System Types require closed polygons. The polygons are closed for these libraries only.
- An intersection operation is executed in all tests.
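The note about closed polygons means that, for those libraries, the first vertex is repeated at the end of each ring. A minimal sketch of that preparation step in C is shown below; the point type and close_ring are illustrative names and this is not the code used in the wrappers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical vertex type. */
typedef struct {
    double x;
    double y;
} point;

/* Copies ring into out and repeats the first vertex at the end
   if the ring is not already closed.
   out must have room for count + 1 vertices.
   Returns the number of vertices written. */
size_t close_ring(const point *ring, size_t count, point *out) {
    assert(ring != NULL && out != NULL);

    for (size_t i = 0; i < count; i++) {
        out[i] = ring[i];
    }

    if (count > 0 &&
        (ring[0].x != ring[count - 1].x || ring[0].y != ring[count - 1].y)) {
        out[count] = ring[0];  /* append the closing vertex */
        return count + 1;
    }
    return count;              /* already closed (or empty) */
}
```

Calling it on a ring that is already closed leaves the vertex count unchanged, so it can be applied unconditionally before handing the data to a library that expects closed polygons.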
Benchmark - Classic

The polygons used in this test are the same ones used to benchmark PolyBoolean (C++ version) and Clipper on their websites (in particular see the test with 174239 vertices) and are extracted from TrueType font contours.
Clipper is the fastest followed by Boost.Geometry, Sql Server ST and TerraLib.
Clipper is faster than PolyBoolean and PolyBoolean is faster than Gpc. This result seems coherent with the same benchmark on the PolyBoolean and Clipper websites.
Benchmark - Known

In this test one operand is a polygon that resembles a gear while the other operand is formed by a set of concentric rings. During the test the gear teeth are increased in number and length as well as the number of rings. The polygons are composed of lines with various slopes and at the same time the theoretical number of intersections is known.
In the long run, Boost.Geometry is the fastest, followed by Sql Server ST and Boost.Polygon. However, Clipper starts out better.
Note that from a certain point onward Bop always crashes. Moreover Cgal was stopped before the end because it was taking too long to finish.
Benchmark - Random

In this test the operands are polygons obtained subtracting random triangles from a square and Boost.Polygon is used to prepare those operands.
Bop is the fastest followed by Sql Server ST, Boost.Geometry and TerraLib.
Cgal is excluded from this test because it is too sensitive to the input; in particular, it requires simple polygons. During other preparatory tests, Bop, PolyBoolean and Sql Server ST crashed towards the end.
Benchmark - Grid

In this test the operands are squares containing a grid of holes. During the test the total number of input polygons and vertices is constant but the holes are positioned in order to obtain an increasing number of intersections.
The interesting thing to note here is that some libraries, such as Boost.Geometry and Geos, give their best performance when the number of intersections increases (Boost.Geometry is one of the slowest at the start and one of the fastest at the end). This is a factor to consider in the results of the previous tests, in particular the benchmark with random polygons: in that test, if the size of the outer square is increased and the other parameters are maintained, we can see a very different progression for certain libraries.
x32 vs x64
Benchmark - Classic, a comparison with the same libraries built for an x32 release. Certain libraries seem to be faster when built for an x32 system, in particular Cgal gains over 100 seconds. Previous versions of this benchmark, on different hardware and operating system (and with different operands), did not show this aspect.
Gpc - C# wrapper vs C++/Cli wrapper
Gpc has a C# wrapper and here is a comparison with the C++/Cli wrapper. The input polygons are obtained as in Benchmark - Known.
There is not a big difference between a P/Invoke and a C++/Cli wrapper, but P/Invoke is a portable solution that can be used with Mono.
Clipper - C# vs C++/Cli wrapper
Clipper has a C# implementation and here is a comparison with the wrapped C++ implementation. The input polygons are obtained as in Benchmark - Known.
The C# implementation is slower but can be used directly with .Net and Mono.
Downloads
Source code (Wrappers - C++/Cli + Benchmark - VB.Net)
Source code for the libraries must be downloaded from the respective sites.
IMPORTANT UPDATES
MARCH 04, 2013
- Added: Bop, Cgal, TerraLib.
AUGUST 14, 2011
- Added: Geos, SQL Server System Types.
JULY 23, 2011
- Added: Boost.Geometry.