It Is Never a Compiler Bug Until It Is

nullc | 256 points

My first compiler bug was in my first year at Google. I'd just introduced a new system for the animation that updates your position while driving in Google Maps. It was perfectly buttery smooth as planned, except on my manager's commute the next day, where it constantly lurched back and forth. The others on the team were convinced that it had to be something with my code, but I didn't think it could be, because my code had no conditional statements and should either always be right or always be wrong.

It kind of looked like it was being fed nonsense speed values, so I got the GPS log from my manager and checked - but no weird speed values, actually a remarkably clean GPS log. Replayed his GPS on my phone - worked perfectly fine, buttery smooth. Eventually it came out that it only happened on my manager's phone. Borrowing said phone and narrowing things down with printf, I showed that my core animation function was being called with the correct values (a, b, c, d) but was being run with the wrong ones (a, a, c, d). This is when my manager thought to mention that he was running the latest internal alpha preview of Android.

Searching Android's bug tracker for JIT bugs, I found that they had a known register aliasing bug. Honestly I have no idea how it ran well enough to get to my code in the first place. But I tagged my weird animation bug as related to that (they didn't really believe me) and ignored it until they fixed their thing, at which point it went away.

jtolmar | 3 years ago

Compiler bugs are indeed pretty frightening. A few years ago I bumped into one in some code that had potential to have a big impact. Unfortunately I am not at liberty to give details about the business setting except to say that we had processes in place that prevented any danger.

In the end I whittled it down to the following tiny C# program:

  namespace UhOh
  {
    internal class Program
    {
      private static void Main()
      {
        System.Console.WriteLine(Test(0, 0));
      }
      private static bool Test(uint a, uint b)
      {
        var b_gte_a = b >= a;
        var b_gt_a = b > a;
        System.Console.WriteLine(b_gte_a);
        return b_gte_a && b_gt_a;
      }
    }
  }

Compiling and running this with Microsoft's .NET stack, versions 4.7.0 and below, the output was incorrectly "True, True" instead of "True, False". (IIRC, it also had to be a 64-bit Release build.)

The intermediate language was correct; it was a bug in RyuJIT.

ocfnash | 3 years ago

A couple of decades ago I worked with GSM handsets, and because you certify the GSM stack along with the compiler, you're pretty much tied to a specific compiler version.

At the time we were working on a new "feature phone" with a 160x120 pixel display in 4 shades of grey, which was a huge upgrade compared to our previous models. Another feature was full screen images for the various applications, and we'd been implementing it into the software and testing it for weeks without problem. After the development cycle came to an end, our release team created a new software release and sent it to our test department, which almost instantly reported graphical errors back to us. We tested the software image on our own handsets, and half the screen was "garbage".

We spent weeks looking over the merged code and debugging the pointer code, and found nothing. It wasn't until we were stepping through the paint code with a Lauterbach debugger that we noticed something was "off" with a pointer.

The platform was 16-bit, and memory was addressed using a page pointer pointing to a 64 KB memory page. As fate would have it, this particular bitmap, in this particular build, was split between two pages. When the paint code incremented a pointer beyond the page limit, instead of incrementing the page pointer, the pointer simply overflowed and started again from the beginning of the current page.
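
As a rough illustration of that failure mode, here is a minimal C sketch. The paged_ptr type and both function names are made up for this example and do not come from the actual toolchain; the sketch only models a page number plus a 16-bit offset and contrasts an increment that wraps with one that carries into the page.

```c
#include <stdint.h>

/* Hypothetical model of a paged address: a page number plus a 16-bit
   offset into a 64 KB page. Names are illustrative only. */
typedef struct {
    uint16_t page;
    uint16_t offset;
} paged_ptr;

/* What the generated code effectively did: only the offset advances,
   so it wraps around to the start of the same page. */
paged_ptr advance_wrapping(paged_ptr p, uint16_t n) {
    p.offset = (uint16_t)(p.offset + n);
    return p;
}

/* What was intended: carry any overflow of the offset into the page. */
paged_ptr advance_carrying(paged_ptr p, uint16_t n) {
    uint32_t sum = (uint32_t)p.offset + (uint32_t)n;
    p.offset = (uint16_t)(sum & 0xFFFFu);
    p.page = (uint16_t)(p.page + (uint16_t)(sum >> 16));
    return p;
}
```

Starting from page 2, offset 0xFFFE and advancing by 4, the wrapping version ends up at page 2, offset 2 (back at the start of the same page), while the carrying version ends up at page 3, offset 2.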

Another interesting bug we chased in the compiler was its inability to add more than 2 variables in an initial assignment.

i.e.

  int a = 1+2;        // a = 3
  int b = 1+2+3;      // b = 3
  int c = 10 + a + b; // c = 13

That took a while to figure out.

8fingerlouie | 3 years ago

I've encountered one genuine compiler bug in my (now 14+ year) career.

I was working on a defense contract, on a government system, where I was constrained by local IA policy to specific versions of various tools, including a relatively ancient version of gcc.

I can't recall exactly what the problem was, but I do remember figuring out after doing some research that the bug that was biting me had been identified and fixed in a later version of gcc. Which I was not allowed to install. So I had to implement a hack-tastic workaround anyway.

One of the best parts about that job - I was integrating the tool I was writing (in Python with Tk - it was the only installed and approved GUI library I could use) with a really old signal analysis library that had originally been written for VMS back in the day - then ported to SPARC/Solaris - then ported again to x86 (yes, VMS heritage was evident in places). Through many years of flowing through many different maintenance contractors, the library had become a giant amalgamation of Ye Olde FORTRAN, C, C++, and Python. To build it I needed a specific version of the Intel FORTRAN compiler, which my employer would not purchase, and the client IA policy would not allow on their system anyway. With much hackery, I managed to coax the damn thing into building using the "approved" gfortran that was already on the network.

Egad, what a horrible job that was.

jcadam | 3 years ago

When something doesn't work as expected, I'll often check disassembly. That can massively cut troubleshooting time when something "smells" like a compiler bug.

This is why, preferably, everyone should learn to read assembler output. This is not limited to native code from C, C++, Rust, etc.; the same output is typically also available for JVM and JavaScript JITs, for example.

Haven't found any miscompilations so far (unless you count braindead codegen), but quite a few hardware bugs. Including one CPU bug.

vardump | 3 years ago

Had a very junior engineer who had just started report a compiler bug to our all-technical-staff mail list. He was testing the not-yet-released next version (of tcl), so it was possible but we had appropriate skepticism and someone on list asked for the smallest reproduction case.

A few hours later, he verified and produced on-list a reproduction case where a variable could not be incremented by 1 but could by 2 or any other number.

Turns out he’d been taught in typing class that l (lowercase L) could be used for 1 and carried that into computing.

WONT_FIX

sokoloff | 3 years ago

Oh, compilers are fun. Just recently I was reading through Rust's bug tracker, as one does, and learned that comparing function pointers is not deterministic. Compiling this code [0] in Debug mode yields different results than in Release mode. You can read the whole discussion about whether it's an LLVM bug, a rustc bug, undefined behavior, intended behavior, a pretty serious bug, or nothing to worry about over at [1].

[0] https://play.rust-lang.org/?version=stable&mode=release&edit...

[1] https://github.com/rust-lang/rust/issues/54685

jiripospisil | 3 years ago

Since we are sharing stories about bugs we ran into in compilers...

I once ran into a bug where "bash" would run commands out of order. It wasn't hard to trigger the bug, but it wasn't deterministic either.

When I first noticed the bug on my production systems it drove me insane, since the logs being generated were impossible. It took me weeks to figure out that bash was running commands out of order.

Then, when I tried to report this bug, I ran into a lot of resistance. First over IRC, nobody believed this could possibly be happening -- and I was eventually directed to the mailing list [0], where the maintainers were initially not able to replicate it, but eventually more required elements were identified and the bug was fixed.

[0] https://lists.gnu.org/archive/html/bug-bash/2015-06/msg00010...

rkeene2 | 3 years ago

Some years ago I worked on a part of a specialized steering system for a car. This was done with certified everything (certified compiler, processor, a lot of paperwork, etc.).

This was a 16-bit processor and the C compiler had a "funny" bug. If you had a struct with three 8-bit values in a row and a 16-bit value afterwards, it would overlap the last 8-bit value with the 16-bit value:

  struct {
    int8 a;
    int8 b;
    int8 c;
    int16 d;
  }

In this case the variables c and d would have the same address. This was on a CPU where we didn't have a debugger (not enough memory left for it); we only had a serial port for debugging.
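
A layout bug like this can be caught with a small self-check. The following is only a sketch, assuming a hosted C99 toolchain with stdint.h and stddef.h (which the original target may not have had); struct sample and both function names are invented for the example. One check compares field offsets, the other verifies that writing d leaves c untouched.

```c
#include <stddef.h>
#include <stdint.h>

/* Same shape as the struct above, using the standard fixed-width types. */
struct sample {
    int8_t a;
    int8_t b;
    int8_t c;
    int16_t d;
};

/* Returns 1 if d starts at or after the end of c, i.e. no overlap. */
int fields_do_not_overlap(void) {
    return offsetof(struct sample, d) >=
           offsetof(struct sample, c) + sizeof(int8_t);
}

/* Runtime cross-check: storing to d must leave c untouched. */
int write_d_preserves_c(void) {
    struct sample s = {1, 2, 3, 0};
    s.d = 0x7FFF;
    return s.c == 3;
}
```

On a correct compiler both functions return 1. With only a serial port available, the two results could be printed at startup.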

nuriaion | 3 years ago

My first was at my first job out of school. It was a bit of an adventure telling my manager. It was in C, but with an old GCC version on an architecture like MIPS. My code would seemingly never run through a switch statement correctly, but it worked fine with if statements. Luckily and unluckily, the company was large, ran a custom GCC build from a third party and had a support contract. When I filed the bug, they said "there's a known issue with large jump tables on that GCC version, disable some optimization with this flag."

I think that made me just a little paranoid. I generally trust things, but depending on their popularity and how likely it is that my code path is run by lots of users, I realize library (and compiler!) bugs happen.

dehrmann | 3 years ago

I have found bugs in GCC, and reported them. I check on them once every few years to see if anything at all has happened on any of them. It seems worth distinguishing code-generation bugs from other compiler bugs. Most of my GCC bugs are not code-generation bugs.

Back in the '80s, the C++ compiler was `cfront`. We spent half of every day bisecting source files to identify the line that would crash the compiler, and doctor it to step around the bug.

People who used to use the Lucid compilers said they were happy when Lucid flopped, because from then on their compiler only had known bugs, instead of a new crop every few months.

Things are better, nowadays, with compilers.

ncmncm | 3 years ago

The problem with compiler and standard lib bugs is it's the last thing you suspect. You're always going to look at your own code first, because 99/100 it's you and not them. You're never going to immediately think "compiler bug", your first port of call is gonna be "I must be using the API wrong".

I discovered a bug in the Swift standard lib once, and it took ages before I got to the point where I decided to strip out my own code, just to make sure it was me. And it wasn't, there was genuinely something wrong in the lib that other people on SO were also able to reproduce.

Good on him for finding a bug in secp256 too. When it comes to cryptography code, it can be very hard to know what the right answer is. I always find some examples on the internet and put them in a unit test to make sure I'm not misusing the API, because if you do your answer looks the same: bunch of numbers in a byte array. To know that your numbers are wrong, you need to be sure you are testing them correctly. Which you can't be if you don't know if you're using the API correctly.

lordnacho | 3 years ago

I got a fun compiler ignorance bug.

Me: I have memory corruption when I call your API.

IBM: trust us, our API DLL is perfectly compatible with your old Windows 32 bit client program! We changed nothing!

Me: I have stack overruns. 4 bytes of return value from you overwrite 4 bytes of variables, whatever I declare last in my function.

IBM: look at the source of our API façade! It's unchanged! (it was, except for harmless additions).

Me: your compiled code is fairly similar, but the return value is bigger. (At this point, I was already on very friendly terms with Ghidra and with the Visual Studio remote debugger.)

IBM: we just recompiled our code!

But they recompiled it with a newer compiler: time_t had changed from 32 to 64 bits, changing the size of the returned unions in their DLL but not in my client.
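
To make the size mismatch concrete, here is a hedged C sketch. The union layouts and names below are invented for illustration (the real IBM types are not shown in this story), and fixed-width integers stand in for the 32-bit and 64-bit versions of time_t.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical return type as the old 32-bit client was compiled:
   the time field is 32 bits wide. */
union result_old {
    int32_t timestamp;
    int32_t status;
};

/* The same declaration as a newer compiler sees it once time_t
   has grown to 64 bits. */
union result_new {
    int64_t timestamp;
    int32_t status;
};

/* Bytes the library writes beyond what the caller reserved. */
size_t overrun_bytes(void) {
    return sizeof(union result_new) - sizeof(union result_old);
}

/* A guard the client could run at startup, assuming the library
   reported the size it was built with (a hypothetical facility). */
int sizes_agree(size_t library_size) {
    return library_size == sizeof(union result_old);
}
```

Here overrun_bytes() comes out as 4, which matches the 4 bytes of clobbered stack variables described above.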

HelloNurse | 3 years ago

> As I rushed to recompile my computer system using GCC 8, I contemplated the vast consequences of such a bug could be, and pondered how it was possible that computers could function at all.

This hits home.

read_if_gay_ | 3 years ago

I maintain embedded development C and C++ toolchains for a living. I have seen my share of compiler bugs. For example, some optimization pass in a popular open-source compiler that would lose track of dereferences of pointer variables if they were more than 12 bytes deep in the stack, meaning that a reference capture in a C++ lambda would get converted to a value capture if it was the third or later capture in order and changes to the referenced capture would be lost....

Anyway, my experience is that compiler bugs do exist, but maybe 99% or so of "compiler bugs" reported by my users turn out to be undefined behaviour in their code.

bregma | 3 years ago

Note also that "it's never a compiler bug" applies more to things like GCC and so on.

If you're working with a new or quickly changing language, e.g. Nim, Crystal, etc., or even something as old as Rust, then it can much more easily just be a compiler bug...

coldtea | 3 years ago

Maintaining an application that still has Symbian users (some people are really conservative and like their Nokia E52s, plus there isn't as much malware for this dead OS), bugs in old GCCE are rather annoying.

Sometimes, for no reason, GCCE just crashes compiling totally innocent code. Usually, a minor rewrite of the logic helps, or even weird edits such as adding a new (useless) parameter to a method.

The last GCCE toolchain for Symbian was released by CodeSourcery in March 2012. It contains GCC version 4.6.3. It is theoretically possible to adapt and compile a newer version, but the sources need so many edits that I gave up after a few days.

inglor_cz | 3 years ago

A couple of years back I ran into a JDK JIT bug during a project. The code ran fine until I ran it through benchmarks, which triggered JIT on a method causing it to return incorrect results.

Took a long time to find, because there were no errors, just wrong results (a specific if statement taking the wrong branch).

Attempts to get assistance from others were mostly met with responses along the lines of "It's probably a race condition" (in single-threaded code) / "very unlikely to be a bug in the JIT". I did end up finding a way to disable JIT for the specific method, which solved the issue, and never got around to finding the root cause. I do believe it has been fixed in the meantime at least.

I haven't run into major compiler bugs since then, but often have to dive deep into libraries to find obscure bugs (database drivers and web servers most often).

matharmin | 3 years ago

Ah kids these days. New compilers used to be written every year or so, and had the most horrific bugs. For instance, the one that reordered complex 'if' conditions to evaluate in chunks, ignoring precedence. Or the compiler that stored parameter context while compiling with a different lifetime than the actual one - resulting in references to deleted memory during compilation. And on and on.

Used to be, a compiler bug was right up there with a memory issue in your list of 'what might be wrong'.

JoeAltmaier | 3 years ago

My immediate tangential thought is about Ken Thompson's paper, "Reflections on Trusting Trust".

The gist is that our security issues could come from layers away from where we may expect them, all the way up to the compiler. It's a great paper, but who would have expected anything less from Ken Thompson?

https://dl.acm.org/doi/10.1145/358198.358210

jjice | 3 years ago

I ran into a compiler bug in Clang by accident about 10 years ago.

The story is that I checked in a code change that passed unit tests locally, that then broke an automated build. This was bizarre because this was at Google and our check-in process guaranteed that we had run the test suites successfully, which required the very compile that broke.

It turns out that the local compile was with GCC, and the automated one was Clang. The construct that they treated differently went like this.

There was a class A with a protected property bar, and a subclass B. I also had unit test code in a class TestB which was a friend class to B. And in that unit test code I accessed foo.bar where foo was of class B.

GCC looked at that access, decided that bar is protected, foo is of class B, and TestB is a friend, so TestB has access.

Clang looked at that access, found that bar is protected and from class A, TestB is NOT a friend, so TestB had no access.

The problem went to a local expert who read the spec and decided that GCC was per the spec, Clang was not, and submitted a bug report to Clang.

As for me I figured that if the very first thing that I thought to try with protected, friend and subclasses was an edge case that nobody could agree on, perhaps C++ wasn't the right language for me...

btilly | 3 years ago

The most difficult bug of my career turned out to be a compiler bug. This was in the late 1980s. We were building autonomous robot navigation code using T, a dialect of Lisp. We were running the same code on Sparc workstations and an embedded system running vxWorks. The vxWorks code would intermittently crash with massive heap corruption, but only when certain devices were being used. The problem turned out to be two out-of-order instructions emitted by the compiler that decremented the stack pointer while a value in the current frame was still live. That value was read by the very next instruction, but if an interrupt happened to occur in between those two instructions, on vxWorks the interrupt handler used the stack of the current task, and so would clobber that live value. Add a few milliseconds more of run time and -- kablooey!

It took several months to figure out what was happening even though in retrospect it should have been pretty obvious.

lisper | 3 years ago

In the JDK 1.0.1, the GridBagLayout would make absolute hash of your layout. I found this out when attempting to write an application in early Java. I read and reread the API spec, I tried everything I could think of, but every time I got a scrambled mess on the screen instead of a nice layout. I was convinced that I had done something wrong because the first rule about bugs in the runtime is, it's probably not a bug in the runtime.

It was a bug in the runtime.

JDK 1.0.3 came out, and ran that code just fine.

bitwize | 3 years ago

Reminds me of an incident in 2015 where a colleague and I stayed late trying to figure out a strange case of our physics engine producing NaN values out of nowhere. Even when we found the culprit it took us so long to believe our findings. There was a bug in Visual Studio's intrinsics for certain geometric functions (I believe it was sin). There was even a bug report for it with a reply saying they were aware of the issue and had no immediate plan to fix it.

Agentlien | 3 years ago

In my personal experience you haven't really completely tested your software until you've hit a couple compiler bugs. :) Though they've gotten better in recent years.

The hits in the implementation of draft-brezak-win2k-krb-rc4-hmac-04.txt seem ... interesting.

nullc | 3 years ago

There is a compiler bug in GCC 10 that causes VLC to crash with some files but not others:

https://bugs.debian.org/971027#10

pabs3 | 3 years ago

Wow, this is like finding a glitch in one of Knuth's books.

m463 | 3 years ago

Long, long ago I hit what presumably was a compiler bug in Visual Basic. Ran perfectly in the IDE, data corruption when compiled to disk. (I didn't have the experience to pin it down well back then.) Forget this, stay with the previous version.

I've also had a library + debugger bug that really left me pulling my hair out. Delphi, protected mode, the library dealing with real mode data. Running the code it would normally segment fault but occasionally work correctly. Single-stepping in the debugger would work correctly 100% of the time.

The library turned out to be riddled with the bug, they were using pointers to point to the real mode addresses. The mere act of loading an invalid pointer is a segment fault and the code emitted by the compiler would copy the pointers by loading them. Note that they were not being followed, if they pointed to nonsense it didn't matter, but if it wasn't a valid address, boom.

Somehow the debugger was successfully executing the invalid load when single stepping. I never investigated exactly what it was up to, I figure it might have been simulating the command to avoid having to write a breakpoint into memory that conceivably could have been read only.

LorenPechtel | 3 years ago

I found and reported a nasty compiler bug with basic arithmetic just a few weeks into learning C++. My previous programming experience was in BASIC and 6502 assembly, so it was my first experience with a compiled language. My bug report was accepted and the vendor issued a quick patch.

After this formative experience, it took me years to stop instinctually assuming that I was less error prone than the compiler.

sarah180 | 3 years ago

I once fought a Java compiler which did not produce the proper bytecode when I used the "@Loggable" annotation. (https://aspects.jcabi.com/annotation-loggable.html) Worse, the incorrect bytecode was on the exception path. I spent at least 2 days making sense of it.

anticristi | 3 years ago

It is scary that gcc or glibc manage to break memcmp or memmove on a regular basis.

I begin to understand the people who write their own libc for security reasons.

rkmark | 3 years ago

With a non-mainstream language, it is pretty much always a compiler bug

I use Pascal, because it is very fast and has automated reference counting (ARC) for most types, so it is almost memory safe. It was the only way to get C-like speed, fast compilation and no memory issues. With ARC you never get uninitialized values, you never get a use-after-free, and you never get a double-free.

A few days ago, I ran my program in valgrind's memcheck: double-free detected

That was hard to debug. Valgrind told me where the value was created, but not where it was freed the first time.

Turns out, FreePascal just put an unmatched refcount decrement after a string comparison: https://github.com/benibela/internettools/commit/4c510e8c977...

Time to update FreePascal. I use "nightly" builds of FreePascal because the stable version did not have Android aarch64 support (the newest stable does support it, but there were other issues with the standard library). Last time I tried to update it, it stopped working on Android x86 and some floating point computations failed. But those issues have been fixed now.

--

Even worse than compiler bugs are CPU bugs. Or emulator bugs if you run it on an emulated CPU. I just had the problem that my app did not start on the aarch64 Android emulator. JNI ExceptionCheck always returned 1. This function here: https://github.com/benibela/internettools/commit/d9fafc9274a... Two instructions added to the assembly and it is fixed and returns 0, but those instructions should not have changed anything

benibela | 3 years ago

I hit a tricky compiler bug a few years back, only after the bad code had been released to the public. Fortunately the bug manifested as some wasted performance instead of anything too damaging.

The issue was that extern C functions didn’t have proper type checking of the parameters. The header said one thing, the cpp file something else, and they were very subtly different. The compiler didn’t complain. At runtime the caller would pass some values and the callee would get complete garbage, but only on 64-bit architectures. I tested on 32-bit at the time and never saw it myself until it was too late.

To be fair this one is mostly my own bug. I call it a compiler bug too because I expected the compiler to have my back and it didn’t.

cshokie | 3 years ago

This reminds me of a brilliant coder I worked with while in grad school. We were doing a lot of low-level kernel hacking for the SnowFlock project[0] although his knowledge and skills were (and I'm sure still are) significantly deeper than mine in this area. He had spent a significant amount of time on a particular bug and eventually started digging through the assembly code and found the cause was a compiler bug. Such a thing blew my mind at the time.

[0] http://sysweb.cs.toronto.edu/projects/1

michaelmior | 3 years ago

About every 6 months something impossible happens in my Visual Studio. For example, the program enters an if it shouldn't: the condition evaluates to false, but the branch is still entered.

This happens because it somehow uses old cached intermediate files that are no longer valid. No matter what I do, it refuses to build correctly. Even clean and rebuild does not work; it forces me to delete the project files and recreate them.

It is probably an error on our side, especially since clean doesn't work. But it is quite annoying when it happens, and it gets hard to figure out what is going on.

shultays | 3 years ago

"It's not the compiler" is always a good first-approximation. And then bizarre things happen..

I ran into this with MSVC just a few months back. After an update of the compiler, a bizarre display issue emerged with lines criss-crossing the screen. Turning off optimizations to debug made the problem go away. Eventually tracked down the issue to the following line of code:

  if (abs(delta.y) > abs(delta.x)) { …

changing this to:

  bool yDominant = abs(delta.y) > abs(delta.x);
  if (yDominant) { …

fixed the problem. Yikes!

I don't know if they fixed the optimization in question by now.

Remnant44 | 3 years ago

We may have found a Java compiler bug in high school CS class. The program was all of 2 lines doing some very basic arithmetic and comparison. Half the class and the teacher were huddled around the computer trying to figure out what the hell was going on. I was new to programming so it's hard to say with confidence. But there was nothing 'tricky' in the code, just wrong output. This was over 10 years ago but Java wasn't exactly new.

devthrowawy | 3 years ago

Back in 2015, GCC was already[0] over 14 million lines of code. It's surely much bigger now (although Google didn't immediately provide a number). No one should be surprised when bugs crop up.

[0] https://www.phoronix.com/scan.php?page=news_item&px=MTg3OTQ

beervirus | 3 years ago

The only time I ever experienced a legit compiler bug was back in the heady days of Adobe Flex, where compilation would fail for no reason. After probably 2 days, I decided to just start making random changes. I added a bunch of white space to one of the many files and it started working again. Removed the white space - wouldn't compile. Re-added it, it compiled.

jugg1es | 3 years ago

When I used to do firmware, we used to see tons of bugs in the commercial compilers. We would try out a new version: one bug was fixed, and a new one inevitably popped up. My first boss used to do a diff between the assembly output of both versions for the same source code to identify the differences and determine if any new bugs were added.

pkaye | 3 years ago

I've only encountered a compiler bug once. During undergrad in a data structures course my Scheme code wasn't working correctly. Took a long time to convince the TA to even acknowledge the possibility of the problem not being my code. Felt such vindication at the time.

liminal | 3 years ago

I've encountered lots of compiler bugs. But I aggressively search for them, by various kinds of random testing. Before this became popular compilers were frightfully full of low hanging bugs, waiting for someone with a high volume test generator to flush them out.
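
For a flavor of the idea, here is a toy C sketch of differential random testing. It is nowhere near a real compiler fuzzer, which generates whole programs; it only compares one function against a trivially simple oracle on random inputs. The function under test reuses the comparison from the C# example earlier in this thread, and every name here (under_test, reference, xorshift32, count_mismatches) is invented for the example.

```c
#include <stdint.h>

/* Function under test: logically equivalent to b > a. */
int under_test(uint32_t a, uint32_t b) {
    int b_gte_a = b >= a;
    int b_gt_a = b > a;
    return b_gte_a && b_gt_a;
}

/* Reference oracle written in the simplest possible form. */
int reference(uint32_t a, uint32_t b) {
    return b > a;
}

/* Small xorshift PRNG; the state must be nonzero. */
uint32_t xorshift32(uint32_t *state) {
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Counts disagreements over the given number of random input pairs.
   Every other trial uses tiny values so that a == b occurs often. */
unsigned count_mismatches(unsigned trials, uint32_t seed) {
    uint32_t state = seed ? seed : 1u;
    unsigned mismatches = 0;
    for (unsigned i = 0; i < trials; i++) {
        uint32_t a = xorshift32(&state);
        uint32_t b = xorshift32(&state);
        if (i % 2 == 0) {
            a %= 4u;
            b %= 4u;
        }
        if (under_test(a, b) != reference(a, b)) {
            mismatches++;
        }
    }
    return mismatches;
}
```

Building the same file at different optimization levels and checking that count_mismatches stays at 0 is the kind of cheap check that can flush out a code-generation bug, assuming the oracle itself is compiled correctly.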

pfdietz | 3 years ago

my only compiler bug was a doozy: pre-egcs gcc, on a strange MIPS chipset running an odd mix of at&t and bsd. the machine had been retired, and was being used (with blessing) as a host for a MUD.

it turns out that gcc was compiling with an opcode that was invalid on this particular architecture, causing the application to crash in weird ways. being pre-egcs, it was easy to track down and fix by changing the opcode to two instructions instead, fixing the problem.

it was then I understood why this very cool (to me) machine with a fast multi-cpu and high memory had been retired: it's hard to reliably run binaries that you compile that crash randomly, and I seemed to have been the only one to take the time to figure out why.

jerrysievert | 3 years ago

All of the compiler bugs I have run into were related to recently added features. Things like intrinsics for new CPU instructions, or new optimization options (e.g. ThinLTO).

I have a healthy skepticism of new compiler features as a result.

FartyMcFarter | 3 years ago

I found a compiler bug in Julia within my first 5 minutes of using it, and it took some courage to report it as a bug since I was so convinced it was my fault.

yuppiemephisto | 3 years ago

That's why I blacklisted gcc 9 and 10 for a while. Not just this one which I also encountered, but also several similar bugs.

rurban | 3 years ago

If prints don't work... Christ, I don't know what I'd do. Great post.

ASlave2Gravity | 3 years ago

Try writing in Nim ... at least one compiler bug per day.

jibal | 3 years ago

Two thoughts:

1. If anyone has significant experience or even just interest in this, they should collaborate on understanding how compiler bugs could be taken advantage of for unintended purposes. Otherwise, there will continue to be a lot at risk.

2. You think compiler bugs are bad? How about hardware hacking, quantum effects, etc.

Similar to Schrödinger's cat, security is relative, never absolutely exists, but always exists. It's a Platonic ideal.

"Maturity" of understanding the nature of security is variable and multi-state, like a spiritual journey; greater understanding of security may lead to a great loss of faith or you may go beyond into the light of true awakening. Most will only get a firewall, though.

studius | 3 years ago
