Anatomy of a driver bug

X-Plane's plugin systems allows authors to load models in two ways: Asynchronously and Synchronously. Most plugins tend to use synchronous loading, but since all plugins run on the main thread there is a need and desire for some to use asynchronous loading. Up until now, the latter was broken however, and plugin authors complained about invisible models. From what I understand Pilot Edge were the first to complain, but they want to use async loading to dynamically load in models of planes that flew in via multiplayer, which makes it non-trivial to debug. Luckily, the author of the fantastic Better Pushback also ran into the issue and his plugin is completely local, so it's no problem to stop at a debugger for any amount of time without being worried a multiplayer socket closes and the whole setup has to be recreated.

Symptoms

As mentioned, authors would complain about their models being invisible when using async loading. However, the object was there in X-Plane, and for example attached lights would function as expected. In the case of Better Pushback this looked like this:

Better Pushback truck with lights but no model

Note the lights of the pushback truck being there, but the model being invisible.

Now, at that point we had no idea wether this was a potential plugin problem or a bug in X-Plane, so Ben went out and looked at the code of Better Pushback while I dug into X-Plane's source code.

The first issue

What I found was a problem with the way the multithreaded OpenGL was handled. Basically it would load the model, create the VBO and IBO and then queue a continuation on the main thread that would call the plugins callback. However, what it didn't do was any synchronization between the background OpenGL context and the main threads context. So basically we ended up in a race where if the VBO or IBO upload wasn't finished uploading by the time we told the plugin it was okay to use the model, potentially we would render with a non existent VBO or IBO. The thing with OpenGL is though that it takes just about any abuse from us programmers and just does the least dumb thing, so rendering with an invalid VBO/IBO is like rendering with no VBO/IBO: It simply doesn't do anything. Which fits with the description of no visible model. What is weird though is that after a frame or two it should start working because the VBO/IBO upload completes and the resources are available. From what we gather this was a bug exclusive to Nvidia though, and potentially their driver doesn't automatically propagate side effects unless explicitly asked or the side effects already happened. Nvidia does interesting things in the name of performance (it's like an optimizing compiler that deduces that you will never see the side effects of your code therefore it can elide it). That would fit our bug since sometimes the async load would just work, but more often it didn't (ie. sometimes we won the race and sometimes we lost).

The solution to all of that is simple, of course: OpenGL provides sync fences that can be inserted into the command stream and that get signalled when the GPU gets to the point in the stream where they are. So, the solution is to put the fence in, flush our work and then on the main thread periodically check if the fence is signalled and then tell the callback. Easy solution.

The second issue

This is where things got interesting. 99% of the time it was working now, the model was there. This was in contrast to before where it only worked every once in a while. So the sync fence definitely fixed some of the problem, just not all of it. Now, X-Plane does a lot of caching of OpenGL state just to avoid driver overhead. One of the implications of that is that VBO and IBOs are never unbound and instead code binds the VBO and IBO that they need indirectly through the cache: Ie, code requests a VBO to be bound, the cache checks what VBO is currently bound and then either binds the new VBO or just does nothing. So potentially somewhere else in the lifetime of the background thread that loads the model, it also does some weird operation on the currently active buffer that results in its data becoming corrupt and that's why the model is gone that 1% of the time. Luckily that's easy to check, just bind the 0 VBO/IBO after we are done creating the VBO/IBO and see if the problem persists. It turns out, the problem went away. Our bug was fixed!

Now, this is a very unsatisfying fix and it points to a deeper bug in the X-Plane codebase. We should never manipulate objects we don't own. To figure out what went wrong I hooked all calls that manipulate VBOs and their data: glBufferSubData, glBufferData, glMapBuffer and glDeleteBuffers paired with glGenBuffers. Nothing touched our buffers. This was the point where things started to slide towards WTF territory, because if nothing touches our buffer, why isn't our object rendering? The background thread that creates the buffer MUST be the one that also corrupts it, because binding the 0 VBO on that thread fixes the problem. But that thread never touched it afterwards. To proof that, I wired the thread to spin until the sync fence signals the main thread (at which point the buffer was already corrupt). And it turns out, even with the spinning, the buffer was still corrupt. The only other thing that could touch our buffer was the Nvidia OpenGL driver now, all the evidence pointed at it.

So what do you do when it's the drivers fault? You go for the Whiskey, of course! In all seriousness though, we are now using the workaround of binding the 0 VBO in addition to the proper fix for the first bug. I have no idea what is going wrong there, but I have some additional observations: glIsBuffer returns GL_FALSE for the corrupt buffer. And since we are in the compatibility profile, which allows arbitrary binding of non-existing resources (and then actually creates them), it would explain why we only see a size of 0 but not an OpenGL error. In fact, any glIsXXX call fails on that object, essentially in the eyes of OpenGL it doesn't exist. Yet, it's never deleted and its creation succeeds.

Conclusion

Everybody always assumes the bugs they see are their fault and try the hardest proving that. In this case this was made worse because there was an actual bug in the X-Plane code base. It just happened to also hit a driver bug at the same time. So a lot of time was spend trying to find the second smoking gun in the codebase, except it never existed in the first place. This was a very interesting coincidence a fun bug to debug, even though the end result is very unsatisfying because god knows what the Nvidia driver does.

Investigating a X-Plane plugin's crash to desktop

This post is about a crash to desktop that I investigated in a popular plugin for X-Plane, X-Assign. This happened in my free time, although I had the advantage of having the X-Plane source code at hand.

Git bisect

My new favourite tool on earth is git bisect, which I used to find the offending commit rather quickly. At this point I wasn't sure who was at fault for the crash, X-Plane or X-Assign, since the issue only showed up with the update to X-Plane 11.02 and it was working fine in previous versions. The offending commit however turned out to be rather boring, it simply changed the capacity of a couple of datarefs from 100 to 250. Two things about that were interesting though, first of all, the capacity of the underlying variable was already 250, a change introduced in X-Plane 11. Second of all, those datarefs were input related, namely sim/joystick/joystick_axis_values, sim/joystick/joystick_axis_assignments and sim/joystick/joystick_axis_reverse. So not unreasonable that they would be used by X-Assign. The change however, shouldn't have really broke X-Assign in any way. To figure out what happened, I did was every reasonable person would do: I opened up the disassembler!

Disassembling X-Assign

Because I already declared git bisect to be my favourite tool, Hopper has to be my second favourite tool. I don't often try to take binaries apart, but every time I do, Hopper makes it super easy and pleasant. Ah, who am I kidding, x86-64 assembly is still awful.

Sadly MSVC isn't super keen on leaving useful debug symbols in binaries, so they are way too hard to read. I knew where to look, but I had no idea about the context of the procedure I was looking at. Luckily Clang doesn't strip the binary by default of every symbol, so I tried opening the macOS version of the plugin and who would have thought, the debug symbols were there! I used those debug symbols to manually rename some addresses and references in the Windows version and armed with that I set out to figure out what was happening.

As it turns out, X-Assign almost does it right! And by doing so, it fails hard. It reads the number of assignments for joystick axes by calling XPLMGetDatavi("sim/joystick/joystick_axis_assignments", 0, 0, 0) which returns the number of entries in the array. In previous versions this would return 100 but now in X-Plane 11.02 it returns 250.

The way X-Assign does the parsing of its config files is a bit unconventional, it basically builds a super long format string on the stack and then passes that to fscanf() which reads the whole file at once. In pseudo code we end up with something like:

char buffer[...];

int count = XPLMGetDatavi("sim/joystick/joystick_axis_assignments", 0, 0, 0);

for(int i = 0; i < count; i ++)
	strcat(buffer, "%%i:%%i");
	
fscanf(file, buffer, &arg_01, &arg_02, &arg_03, ...);

Now, the problem with that approach is that you need to have a buffer on the stack that is long enough AND you need to supply enough arguments to fscanf(). One way or another, if your buffer is not long enough or you don't supply enough arguments, you'll end up smashing your stack and in the best case a crash (which is exactly what happened). X-Assign assumes 100 axis assignments, so the buffer and arguments are appropriately sized, however, now that X-Plane returns 250 it all starts to go wrong. Obviously the author tried to do the right thing by query-ing X-Plane for what the count, but then it didn't do the appropriate steps to verify it can cope with that.

Monkeypatching X-Assign

Without source code provided by the X-Assign developer, the only real solution to a temporary fix is to monkeypatch the binary itself! After figuring out why it crashes, the solution becomes overriding the call to XPLMGetDatavi(). Here is the assembly for the whole sequence:

000000006cc81472 488B1D274E0100         mov        rbx, qword [imp_XPLMGetDatavi]
000000006cc81479 488B0DB02B0100         mov        rcx, qword [0x6cc94030]
000000006cc81480 FFD3                   call       rbx
000000006cc81482 4531C9                 xor        r9d, r9d
000000006cc81485 4531C0                 xor        r8d, r8d
000000006cc81488 31D2                   xor        edx, edx
000000006cc8148a 488B0DA72B0100         mov        rcx, qword [0x6cc94038]
000000006cc81491 4189C4                 mov        r12d, eax
000000006cc81494 FFD3                   call       rbx
000000006cc81496 488D1563BB0000         lea        rdx, qword [0x6cc8d000]	000000006cc8149d 4889F1                 mov        rcx, rsi
000000006cc814a0 4189C5                 mov        r13d, eax

The first two lines load the pointer to the XPLMGetDatavi() function and the dataref pointer for the call ("sim/joystick/joystick_axis_reverse"), afterwards executing the call. A bit further down the line is mov r12d, eax in which the return value of the call is assigned to r12. The same is repeated with a different dataref ("sim/joystick/joystick_axis_assignments") and the result is assigned to r13.

The thing we want to achieve is to load 100 into the RAX register and then nop out the calls. The opcode to load 100 into RAX is:

48 C7 C0 64 00 00 00

So we'll need space for 7 bytes. Luckily, right before the first call there is a 7 byte instruction already that loads the pointer to the dataref, but that is only needed since it's an argument to the function call and we can safely patch it with our own code. The next thing is to patch the call instruction, which is 2 bytes in the original. Luckily, nop only takes one byte (90 is the opcode), so we can easily patch it. What we'll end up with is:

000000006cc81472 488B1D274E0100         mov        rbx, qword [imp_XPLMGetDatavi]
000000006cc81479 48C7C064000000         mov        rax, 0x64
000000006cc81480 90                     nop
000000006cc81481 90                     nop
000000006cc81482 4531C9                 xor        r9d, r9d
000000006cc81485 4531C0                 xor        r8d, r8d
000000006cc81488 31D2                   xor        edx, edx
000000006cc8148a 488B0DA72B0100         mov        rcx, qword [0x6cc94038]
000000006cc81491 4189C4                 mov        r12d, eax
000000006cc81494 90                     nop
000000006cc81495 90                     nop
000000006cc81496 488D1563BB0000         lea        rdx, qword [0x6cc8d000]	000000006cc8149d 4889F1                 mov        rcx, rsi
000000006cc814a0 4189C5                 mov        r13d, eax

And there you have it, after those changes, X-Assign will work again.

Moving servers & domains

For years I have hosted all of my servers at uberspace, which are a fantastic company that makes hosting websites incredibly easy. At this point in time, I have 4 uberspaces running.

As you may or may not have noticed, this blog now sports a new look and also a new domain name! Under the hood, it also moved away from Uberspace and over to Linode. This change has been a long time coming, feresignum.com has been up and running as a staging server for almost 120 days now! Together with the blog I moved a bunch of other services that were running on my server as well as all of my private Github repos, which are now hosted on a private gitlab installation. This has overall decreased my server/services bills and also allows me to have much more freedom in terms of what I'm doing with my servers. Sometimes, super user rights are useful. Also, all domains are now served as HTTPS, which I'm really excited about.

For now, the old widerwille.com and the new feresignum.com are absolutely equivalent. Potentially I'll keep both of them, the main reason for the change is that I've been now living in an english speaking country for a while and widerwille is just hard to pronounce for most and just sounds weird. On the other hand, I have put my email address (ending in @widerwille.com) into a lot of email fields now and it wouldn't make sense to just discontinue that domain.

On the software side this no longer is a CentOS 5 installation, but instead it's a Ubuntu server. Still running Apache2, but it's now serving Ghost as blogsoftware instead of the discontinued #pants. Sorry Hendrik!

Sooo, does that mean more frequent blog posts? Maybe, I have a couple of posts that I'm eager to write.

Ultimaker modding

Long time no blog post, so I figured, I might as well write a blog post about something super boring: Modding my 3D Printer!

For quite a while now I’ve been running OctoPrint on a Raspberry Pi as my print server, as it’s much more convenient than shoving G-Code onto the SD card and then printing from there. However, this has one downside: I had to have the Pi close by the printer since they communicate via USB with each other.

Then, I found a mount on thingiverse that allows mounting the Raspberry Pi under the Ultimaker itself, and also optionally a relay that allows turning the printer on and off via OctoPrint. At first I started out without the relay, so I could at least have a nice spot for the Pi itself (which had been shoved rather awkwardly next to the printer so far). However, I ended up getting a relay shortly after and after modifying the mount a little bit to fit my relay I could turn my printer on and off remotely! What a time to be alive.

However, one tiny little issue remained, the Pi was powered via a USB cable which ran next to the power cable of the printer, and which was quite a bit shorter as well. This ultimately had the result of me ripping out the micro USB connector out of the Pi after pulling a bit too hard on the cable. However, the printer already has power coming to it, surely there is enough to power a Pi? The Ultimaker 2 was originally designed to power two extruders and two hot ends instead of just one, so they must have left at least some buffer there. And as it turns out, that is indeed the case!

I modified the Pi mount some more in order to fit a voltage regulator that would convert the 24V input voltage that the printer uses into the 5V required by the Pi and now I have the printer and Pi running off the same cable! No more hassle with a stray USB cable getting in the way, and if I unplug the printer, OctoPrint turns of as well!

The completed project

This the end result, it fits rather neatly under the Ultimaker.

Under the hood

This is the logic board with the 4 extra cables soldered onto it. It’s a bit messy, but not really because of any cables that I added but because of the way Ultimaker runs their cables.

Vero - True Social released

I talked about how I'm an evil person and part of the growing effort of destroying the web @hmans.io likes before (as if my recent Coffee machine post wasn't terrible enough). I haven't said what I was working on, or really, what the company I work for is working on, but now it's released and I can finally talk about it: It being a social network. Or better put, yet another social network.

Hold on, don't go away just yet, I believe it's actually a really cool social network (and I actually do, I've been drinking that kool-aid heavily). First of all, it's called Vero and it's primarily an iPhone app for now. The twist is that it's privacy oriented AND explicitly not monetized through advertising the selling of user data! This is really important to us, what's shared there stays there and is only visible to the people it's shared it with, no third party!

Also, it centers around privacy: You have three so called loops, Close Friends, Friends and Acquaintances in which you can categorize people and each of these loops can have a different avatar. So not only can you decide what to share with whom, but also how these people see you within the network. And then of course you can share what you are up to, give recommendations for places, movies, music, books etc, watch and listen to previews of shared content (eg. if someone shared a song you can listen to a preview of it, or watch the trailer of a shared movie) and easily access shared content through collections.

Yadda yadda yadda. But seriously, I really like it a lot.