Conflating pointers with arrays: C's biggest mistake? (2009)

etrevino | 260 points

This is IMHO by far NOT C's biggest mistake. Not even close. A typical compiler will even warn you when you do something stupid with arrays in function definitions (-Wsizeof-array-argument is enabled by default nowadays).

On the other hand, UBE (undefined or unspecified behavior) is probably the nastiest thing that can bite you in C.

I have been programming in C for a very, very long time, and I still get hit by UBE from time to time, because, eh, you tend to forget "this case".

The last time, it took me a while to spot the bug in the following code snippet from a colleague (not the actual code, but the idea is there):

    struct ip_weight {
        in_addr_t ip;
        uint64_t weight;
    };

    const struct ip_weight ipw1 = {0x7F000001, 1};
    const struct ip_weight ipw2 = {0x7F000001, 1};

    const uint32_t hash1 = hash_function(&ipw1, sizeof(ipw1));
    const uint32_t hash2 = hash_function(&ipw2, sizeof(ipw2));

The bug: hash1 and hash2 are not the same. For those who are fluent in C UBE, this is obvious, and you'll probably smile. But even for veterans, you tend to miss that after a long day of work.

This, my friends, is one of the real mistakes in C: leaving too much UBE. The result is coding in a minefield.

[You probably found the bug, right? If not: the obvious issue is that 'struct ip_weight' needs padding before the second field to align the uint64_t. And while the standard initializes all omitted fields to 0 when you declare a structure on the stack, the padding value is unspecified, and gcc typically leaves whatever dirty content was already on the stack in the padding.]
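A minimal sketch of two common fixes, assuming `uint32_t` in place of `in_addr_t` and a hypothetical FNV-1a `hash_bytes` in place of the unnamed `hash_function`: zero the whole object (padding included) before assigning the fields, or hash only the named fields so padding never enters the hash.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as the struct above; uint32_t stands in for in_addr_t,
 * which POSIX defines as equivalent to uint32_t. */
struct ip_weight {
    uint32_t ip;
    uint64_t weight; /* on 64-bit ABIs, 4 padding bytes usually precede this */
};

/* Hypothetical stand-in for hash_function(): 32-bit FNV-1a over raw bytes. */
static uint32_t hash_bytes(const void *data, size_t len) {
    const unsigned char *p = data;
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Fix 1: zero every byte, padding included, then set the fields.
 * Works with mainstream compilers, although the standard allows later
 * member stores to leave padding bytes with unspecified values. */
static void ip_weight_init(struct ip_weight *w, uint32_t ip, uint64_t weight) {
    memset(w, 0, sizeof *w);
    w->ip = ip;
    w->weight = weight;
}

/* Fix 2 (strictly portable): hash only the named fields, never the padding. */
static uint32_t ip_weight_hash(const struct ip_weight *w) {
    uint32_t h = hash_bytes(&w->ip, sizeof w->ip);
    return h * 31u + hash_bytes(&w->weight, sizeof w->weight);
}
```

Fix 2 is the safer habit for hash tables: it does not depend on how the object was created or copied.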

xroche | 6 years ago

The proposal here is way too vague. And if you flesh it out, things start to fall apart: If nul-termination of strings is gone, does that mean that the fat pointers need to be three words long, so they have a "capacity" as well as a "current length"? If not, how do you manage to get a string variable on the stack if its length might change? Or in a struct? How does concatenation work such that you can avoid horrible performance (think Java's String vs. StringBuffer)? On the other hand, if the fat pointers have a length and capacity, how do I get a fat pointer to a substring that's in the middle of a given string?

Similar questions apply to general arrays, as well. Also: Am I able to take the address of an element of an array? Will that be a fat pointer too? How about a pointer to a sequence of elements? Can I do arithmetic on these pointers? If not, am I forced to pass around fat array pointers as well as index values when I want to call functions to operate on pieces of the array? How would you write Quicksort? Heapsort? And this doesn't even start to address questions like "how can I write an arena-allocation scheme when I need one"?

In short, the reason that this sort of thing hasn't appeared in C is not because nobody has thought about it, nor because the C folks are too hidebound to accept a good idea, but rather because it's not clear that there's a real, workable, limited, concise solution that doesn't warp the language far off into Java/C#-land. It would be great if there were, but this isn't it.
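One possible answer to the substring question, sketched in C with hypothetical names (`str_view`, `sv_sub`): a two-word view (pointer plus length) can refer to the middle of an existing string without copying. It owns nothing and has no capacity, so a growable string would still need a separate, three-word owning type.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical two-word "fat pointer" for strings: no terminator, no capacity. */
typedef struct {
    const char *ptr;
    size_t len;
} str_view;

static str_view sv_from_cstr(const char *s) {
    str_view v = { s, strlen(s) };
    return v;
}

/* Substring [start, start + count), clamped to the view; shares storage. */
static str_view sv_sub(str_view v, size_t start, size_t count) {
    if (start > v.len)
        start = v.len;
    if (count > v.len - start)
        count = v.len - start;
    str_view r = { v.ptr + start, count };
    return r;
}

/* Compare a view with a NUL-terminated string. */
static int sv_eq_cstr(str_view v, const char *s) {
    return strlen(s) == v.len && memcmp(v.ptr, s, v.len) == 0;
}
```

Since a view has no terminator, printing one needs an explicit length, e.g. `printf("%.*s", (int)v.len, v.ptr)`.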

drfuchs | 6 years ago

Yes, that's C's biggest mistake. (But remember, they had to cram the compiler into a 16-bit machine.) No, "fat pointers" are not a backwards-compatible solution. They've been tried. They were a feature of GCC at one time, used by almost nobody.

I once had a proposal on this. See [1]. Enough people looked it over to find errors; this is version 3. The consensus is that it would work technically but not politically.

The basic idea is that the programmer knows how big the array is; they just don't have a way to tell the compiler what expression defines the length of the array. Instead of

    int read(int fd, char buf[], size_t n);

you write

    int read(size_t n; int fd, char (&buf)[n], size_t n);

It generates the same calling sequence. Arrays are still passed as plain pointers. But the compiler now knows how big "buf" is, both on the caller and callee side, and can check.
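Standard C99 can already express part of this, sketched here assuming VLA support (optional from C11 on, but provided by GCC and Clang) and a hypothetical `fill_buffer` function: if the length parameter comes first, a pointer-to-array parameter `char (*buf)[n]` carries `n` in its type, so `sizeof *buf` works in the callee while the call still passes just a pointer and a length. Unlike the proposal, nothing checks that the caller's `n` matches the real array; a mismatch is undefined behavior. GCC's parameter forward declarations (a `size_t n;` prefix in the parameter list) relax the ordering requirement as an extension.

```c
#include <stddef.h>
#include <string.h>

/* n comes first so the parameter type can mention it; *buf has type char[n]. */
static size_t fill_buffer(size_t n, char (*buf)[n], char c) {
    memset(*buf, c, sizeof *buf); /* sizeof *buf == n, evaluated at run time */
    return sizeof *buf;
}
```

A caller writes `char line[64]; fill_buffer(sizeof line, &line, 'x');`.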

I also proposed adding slice syntax to C, so, when you want to talk about part of an array, you do it as a slice, not via pointer arithmetic.

The key idea here is that you can call old code from new ("strict") code, and strict code from old code. When you get to all strict code, subscript errors should be all checkable.

[1] http://www.animats.com/papers/languages/safearraysforc43.pdf

Animats | 6 years ago

I absolutely agree. Adding an array type to C that knows its own length would solve so many headaches, fix so many bugs, and prevent so many security vulnerabilities it's not even funny. Null terminated strings? Gone! Checked array indexing? Now possible! More efficient free that gets passed the array length? Now we could do it! The possibilities are incredible. Sadly, C is so obstinately stuck in its old ways that adding such a radical change will likely never happen. But one can dream ...

MrBingley | 6 years ago

Just for fun, type in this program:

    int fred(int a[10]) {
        return a[11];
    }

It compiles without error with gcc and clang, even with -Wall. The code generated by clang is:

    mov EAX,02Ch[RDI]
    ret

i.e. a buffer overflow, even though the array size is given. Compile the equivalent DasBetterC program:

    int fred(ref int[10] a) {
        return a[11];
    }

    fred.d(2): Error: array index 11 is out of bounds a[0 .. 10]

And the 32-bit code generated (when using 9 instead of 11 so it will compile):

    mov     EAX,024h[EAX]
    ret
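For comparison, a plain-C sketch of one workaround, with hypothetical `fred_checked` and `COUNT_OF` names: take a pointer to the whole array so the `10` stays in the type, and check the index at run time. Unlike the D version, this gives no compile-time error for a constant index; it only refuses the read.

```c
#include <stddef.h>

/* Element count of an actual array object (not of a pointer). */
#define COUNT_OF(arr) (sizeof(arr) / sizeof((arr)[0]))

/* a points to an int[10]; the length comes from the type, not a comment. */
static int fred_checked(int (*a)[10], size_t i, int *out) {
    if (i >= COUNT_OF(*a)) /* COUNT_OF(*a) is 10 here */
        return -1;         /* out of bounds: refuse instead of overflowing */
    *out = (*a)[i];
    return 0;
}
```

Called as `fred_checked(&arr, 11, &v)` it returns -1 instead of reading past the end, and passing a plain `int *` is diagnosed by GCC and Clang as an incompatible pointer type.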
WalterBright | 6 years ago

Quite surprised to see this not mentioned. C99 allows you to use the "static" keyword in array function parameters like this:

    void foo(int arr[static 10]);

It cannot check whether a passed pointer will point to enough space, but the compiler can warn you if you pass a fixed-size array of a smaller size.
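A small sketch of that C99 feature, using a hypothetical `sum10` function: `[static 10]` promises the compiler that the caller passes a non-null pointer to at least 10 elements.

```c
#include <stddef.h>

/* [static 10]: the caller guarantees at least 10 valid, non-null elements. */
static long sum10(const int arr[static 10]) {
    long total = 0;
    for (size_t i = 0; i < 10; i++)
        total += arr[i];
    return total;
}
```

Passing `int small[5]` is the case described above; Clang, for instance, warns that the array argument is too small. A `malloc`ed pointer of unknown size gets no check at all.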
bluetomcat | 6 years ago

Apparently someone posted this here because of my remark: https://www.reddit.com/r/programming/comments/90ov9i/a_respo...

Nice to see it get such a nice response!

WalterBright | 6 years ago

From my experience, Go's array (slice) is a far better solution. It carries not only the size (number of elements) but also the capacity of the underlying buffer. To me it's the epitome of what arrays should be.
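A rough C approximation of that layout, as a sketch only: the `int_slice` type and `slice_append` function are hypothetical, and the doubling growth policy is chosen for simplicity rather than copied from Go's runtime.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical Go-style slice of ints: pointer, length, capacity. */
typedef struct {
    int *data;
    size_t len; /* elements in use */
    size_t cap; /* elements allocated */
} int_slice;

/* Append one element, doubling capacity when full.
 * Returns 0 on success, -1 on overflow or allocation failure. */
static int slice_append(int_slice *s, int value) {
    if (s->len == s->cap) {
        size_t new_cap = s->cap ? s->cap * 2 : 4;
        if (new_cap < s->cap || new_cap > SIZE_MAX / sizeof(int))
            return -1; /* size computation would overflow */
        int *p = realloc(s->data, new_cap * sizeof(int));
        if (p == NULL)
            return -1; /* old buffer is still valid and owned by s */
        s->data = p;
        s->cap = new_cap;
    }
    s->data[s->len++] = value;
    return 0;
}

static void slice_free(int_slice *s) {
    free(s->data);
    s->data = NULL;
    s->len = s->cap = 0;
}
```

Keeping `cap` separate from `len` is what makes repeated appends amortized O(1) instead of reallocating on every call.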

chmike | 6 years ago

Gimme a break, making stricter requirements on C arrays may theoretically make some things easier, but we’re talking 1% improvement. What makes C hard (and great) is requiring an understanding of not just memory, but memory allocation and deallocation schemes. For many beginners this is hard conceptually, but for everyone, keeping track of allocated and unallocated memory is extremely difficult.

speedplane | 6 years ago

I haven't written much C, and I don't have a firm opinion on whether or not that particular issue is C's biggest mistake. I do think that just this one change sounds radical enough, as far as the effort it would take to convert existing C code that uses the high-risk pattern, that it seems better to just wholesale convert to a language that already mandates safety like Rust or Java. Particularly when you consider all of the other high-risk patterns in C that these other languages eliminate.

ufmace | 6 years ago

This is a very good article that highlights the importance of semantics.

User23 | 6 years ago

Conflating pointers and arrays seems pretty minor and not the cause of many bugs.

The main source of bugs in C, to me, would be pointer arithmetic.

hota_mazi | 6 years ago

What's the mistake? You pass a pointer and the number of elements, it's just the C way. At any point in time you have to pay attention. What is the proposal here? Make all arrays structures? Or add some weird un-C syntactic sugar?

nearmuse | 6 years ago

Why is this such a serious issue? I mean it is inconvenient to always pass length along with the pointer but it's not that inconvenient. It's a bit more typing but that's where problems end.

bluecalm | 6 years ago

Agree that this is a problem (if the programmer is not careful).

But serious question, why even bother with this one fix?

The only reason for the fix is to make it more difficult to make errors.

Fix arrays, then you would fix null pointers, then you might add objects, templates/generics to support a good collections library, RTTI, and before you know it you are creating another C++, D, Go, or Java. And we already have those.

C paved the way. Why not let it be the end of it?

altrego99 | 6 years ago

Isn't the fact that core types don't have a fixed representation a bigger mistake? A char can be 16 bits, for example, and so on.

toolslive | 6 years ago

Fat pointers manifested themselves in Pascal as strings and are still being used in modern Delphi.

nurettin | 6 years ago

I would love for programming to one day adhere to the same discipline as bridge or car safety. If simple malpractice landed people in jail, there would be no argument or discussion about this stupid mistake, which can be verified by a tool. Cheers, Pham

apz28 | 6 years ago

That's why I hope Red/System, and Red in general, takes off: https://static.red-lang.org/red-system-specs.html

xaduha | 6 years ago

Fair enough.

Arrays losing dimensionality when passed through functions is a pain every now and then.
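A minimal illustration of that loss, using a hypothetical `size_in_callee` function: the `[10]` in the parameter is adjusted to `int *`, so `sizeof` in the callee measures a pointer. GCC and Clang flag exactly this with `-Wsizeof-array-argument`, the warning mentioned at the top of the thread.

```c
#include <stddef.h>

/* Despite the [10], the parameter is really "int *a":
 * sizeof a is the size of a pointer, not of ten ints. */
static size_t size_in_callee(int a[10]) {
    return sizeof a;
}
```

In the caller, `sizeof arr` is still `10 * sizeof(int)`, so the count has to be computed there (`sizeof arr / sizeof arr[0]`) and passed alongside the pointer.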

robert_foss | 6 years ago

"C retains the basic philosophy that programmers know what they are doing; it only requires that they state their intentions explicitly."

The real 'mistake' is programmers not stating their intentions explicitly.

grrrrrrrrrrrrr | 6 years ago

Is it better to pass the length of the array, or a pointer to the last valid address in the array (or one past it)? There's probably an advantage in the two parameters having the same type.
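Both conventions, sketched in C with hypothetical `sum_n` and `sum_range` names: a pointer plus a count, and a half-open range whose second pointer is one past the last element.

```c
#include <stddef.h>

/* Style 1: pointer plus element count. */
static long sum_n(const int *p, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += p[i];
    return total;
}

/* Style 2: half-open range [first, last), last pointing one past the end.
 * Forming (but not dereferencing) a one-past-the-end pointer is valid C. */
static long sum_range(const int *first, const int *last) {
    long total = 0;
    for (const int *p = first; p != last; p++)
        total += *p;
    return total;
}
```

The one-past-the-end form gives both parameters the same type, represents an empty range as `first == last`, and converts trivially (`last = p + n`). A pointer to the last valid element, by contrast, cannot describe an empty range without a special case.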

Thought of this as I was reading the article.

flingo | 6 years ago

The mentioned Safe C Library is now at https://github.com/rurban/safeclib

rurban | 6 years ago

Fat pointers: Pascal has had them since, I think, the beginning?

So...30+ years later, we decide Pascal was right. Just saying. Shoutout to FreePascal/Lazarus!

analognoise | 6 years ago

I feel like someone should just write a new language that is C with fixes, and no more.

IshKebab | 6 years ago

I don’t understand what the hoopla is about: in assembler we deal with arrays by knowing the size of each field, without giving it a second thought. The solution is to learn assembler first, then move on to C. And AWK, as the next-generation C, doesn’t have this problem, or any of C’s problems.

Annatar | 6 years ago

The difference between arrays and linked lists is enough to start confusion about pointers.

known | 6 years ago

My guess is this won't be a popular post given the average age of HN participants.

There's nothing whatsoever wrong with C. The problem is programmers who grew up completely and utterly disconnected from the machine.

I am from that generation that actually did useful things with machine language. I said "machine language" not "assembler". Yes, I am one of those guys who actually programmed IMSAI era machines using toggle switches. Thankfully not for long.

There is no such thing as an "array". That's a human construct. All you have is some registers and a pile of memory with addresses to go store and retrieve things from it. That's it. That is the entire reality of computing.

And so, you can choose to be a knowledgeable software developer and be keenly aware of what the words you type on your screen actually do or you can live in ignorance of this and perennially think things are broken.

In C you are responsible for understanding that you are not typing magical words that solve all your problems. You are in charge. An array, as such, is just the address of the starting point of some bunch of numbers you are storing in a chunk of memory. Done. Period.

Past that, one can choose to understand and work with this or saddle a language with all kinds of additional code that removes the programmer from the responsibility of knowing what's going on at the expense of having to execute TONS of UNNECESSARY code every single time one wants to do anything at all. An array ceases to be a chunk-o-data and becomes that plus a bunch of other stuff in memory which, in turn, relies on a pile of code that wraps it into something that a programmer can use without much thought given.

This is how, for example, coding something like a Genetic Algorithm in Objective-C can be hundreds of times slower than re-coding it in C (or C++), where you actually have to mind what you are doing.

To me that's just laziness. Or lack of education. Or both. I have never, ever, had any issues with magical things happening in C because, well, I understand what it is and what it is not. Sure, yeah, I program and have programmed in dozens of languages far more advanced than C, from C++ to APL, LISP, Python, Objective-C and others. And I have found that C, the language itself, is never the problem; it's the programmer that's the problem.

I wonder how much energy the world wastes because of the overhead of "advanced" languages? There's a real cost to this in time, energy and resources.

This reminds me of something completely unrelated to programming. On a visit to windmills in The Netherlands we noted that there were no safety barriers to the spinning gears within the windmill. In the US you would likely have lexan shields protecting people and kids from sticking their hands into a gear. In other parts of the world people are expected to be intelligent and responsible enough to understand the danger, not do stupid things and teach their children the same. Only one of those is a formula for breeding people who will not do dumb things.

Stop trying to fix it. There's nothing wrong with it. Fix the software developer.

rebootthesystem | 6 years ago

(2009)

See also discussion from 9 years ago: https://news.ycombinator.com/item?id=1014533 (47 comments)

okket | 6 years ago

OpenBSD replaced strcat by strlcat, strcpy by strlcpy 20 years ago, in OpenBSD 2.4.

They are implemented in the C libraries for OpenBSD, FreeBSD, NetBSD, Solaris, OS X, and QNX.

They have not been included in the GNU C library used by Linux.
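For platforms without them, a sketch of strlcpy-style semantics in portable C. `copy_bounded` is a hypothetical name and this is not the OpenBSD source; it follows the documented contract of copying at most `size - 1` bytes, always NUL-terminating when `size > 0`, and returning `strlen(src)` so that truncation shows up as a result `>= size`.

```c
#include <stddef.h>
#include <string.h>

/* strlcpy-style copy: bounded, always terminated (if size > 0),
 * returns the length of src so the caller can detect truncation. */
static size_t copy_bounded(char *dst, const char *src, size_t size) {
    size_t src_len = strlen(src);
    if (size > 0) {
        size_t n = src_len < size - 1 ? src_len : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return src_len;
}
```

A check such as `if (copy_bounded(buf, src, sizeof buf) >= sizeof buf)` then detects truncation without a separate `strlen` call.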

auslander | 6 years ago

Should have a 2009 in the title.

the_duke | 6 years ago

Why does the author of D want to "fix" C by changing it into D? Concentrate on that D language.

kyberias | 6 years ago

> Notable among these are C++, the D programming language, and most recently, Go

I would remove Go from that list and add Rust and Zig.

earenndil | 6 years ago

Meh. What do you do with the dynamically allocated arrays then? Do you pass their dimensions alongside the pointer? If that bothers you so much, you can create a struct that holds the pointer and metadata, and do the checks yourself. Calling this "C's biggest mistake" is a bit sensationalistic.
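A sketch of that approach, with hypothetical names (`dbl_array`, `dbl_array_get`, `dbl_array_set`): the struct carries the length next to the pointer, and the accessors refuse out-of-range indices instead of touching memory past the end.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical "array that knows its length": pointer plus metadata. */
typedef struct {
    double *data;
    size_t len;
} dbl_array;

/* Allocate len zeroed elements; len stays 0 if allocation fails. */
static dbl_array dbl_array_new(size_t len) {
    dbl_array a = { calloc(len, sizeof(double)), 0 };
    if (a.data != NULL)
        a.len = len;
    return a;
}

/* Checked accessors: return false instead of going out of bounds. */
static bool dbl_array_get(dbl_array a, size_t i, double *out) {
    if (i >= a.len)
        return false;
    *out = a.data[i];
    return true;
}

static bool dbl_array_set(dbl_array a, size_t i, double value) {
    if (i >= a.len)
        return false;
    a.data[i] = value;
    return true;
}
```

The cost is one comparison per access and an extra word per array, which is the trade-off the thread keeps circling around.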

EDIT: besides, you should start new projects in Rust anyway, because it takes security to a whole other level. C did a great job, but it's a bit old. :)

Drdrdrq | 6 years ago