The Final Newline

The beautiful thing about the language of mathematics is that it is precise: a definition must hold for every case it is meant to cover. 99% isn't good enough. If even a single case contradicts a proposed definition, we consider the definition incorrect and it has to be revised.

I'm a little surprised that in software engineering, which must be as precise as mathematics in its definitions and specifications, we have come to accept this POSIX definition of a line:

A sequence of zero or more non-newline characters plus a terminating newline character.

As mentioned earlier, if there is even one case that doesn't satisfy the definition, then it's incorrect. This definition of a line completely falls apart when you look at a file with the following contents:

This is a line of text.

Let's save this as single-line.txt, without the newline character at the end.

Is this a line? Of course it is. Humans don't use line terminators when writing something down in real life. Ask anyone around you and they will say, "Yeah, it's one line of text."

Counting lines

Without a final newline character, POSIX-compliant programs will fail to recognize this line:

> printf "This is a line of text." | wc -l
0

That's because the POSIX definition of a line is wrong. We have one line of text and the computer is telling us we have zero lines in our file.

Now, before you go and send pull requests to the authors of wc, let me quickly add that their documentation explicitly states that the -l flag does not, in fact, count lines. 🤔

It only counts newline characters. The authors of GNU wc, Paul Rubin and David MacKenzie, probably wanted to avoid this controversy, so the documentation speaks of newline counts rather than line counts.
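To make the difference concrete, here is a minimal sketch in Go. The function names countNewlines and countLines are made up for this post and are not taken from wc or any library. The first mirrors what wc -l reports, the second counts lines the way a human reader would.

```go
package main

import (
	"bytes"
	"fmt"
)

// countNewlines mirrors what wc -l reports: the number of '\n' bytes.
// (Hypothetical helper, named for this example only.)
func countNewlines(data []byte) int {
	return bytes.Count(data, []byte{'\n'})
}

// countLines counts lines the way a human reader would:
// a final line without a terminating newline still counts.
// (Hypothetical helper, named for this example only.)
func countLines(data []byte) int {
	n := countNewlines(data)
	if len(data) > 0 && data[len(data)-1] != '\n' {
		n++
	}
	return n
}

func main() {
	text := []byte("This is a line of text.")
	fmt.Println(countNewlines(text)) // 0
	fmt.Println(countLines(text))    // 1
}
```

For input that does end in a newline, both functions return the same number, so the two notions only disagree on the unterminated final line.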

Separator vs. Terminator

There are two ways to interpret the newline character: as a line separator or as a line terminator.

With the "line separator" interpretation, you treat the contents of a file as the result of

strings.Join(lines, "\n")

whereas the "line terminator" interpretation is better expressed as:

for _, line := range lines {
	w.WriteString(line)
	w.WriteByte('\n')
}
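Put side by side as complete functions, the two interpretations differ by exactly one byte at the end of the output. This is only a sketch: joinLines and terminateLines are names chosen for this post, and strings.Builder stands in for the writer w used in the loop.

```go
package main

import (
	"fmt"
	"strings"
)

// joinLines treats '\n' as a separator: it appears only between lines.
// (Hypothetical helper, named for this example only.)
func joinLines(lines []string) string {
	return strings.Join(lines, "\n")
}

// terminateLines treats '\n' as a terminator: it follows every line.
// (Hypothetical helper, named for this example only.)
func terminateLines(lines []string) string {
	var b strings.Builder
	for _, line := range lines {
		b.WriteString(line)
		b.WriteByte('\n')
	}
	return b.String()
}

func main() {
	lines := []string{"first", "second"}
	fmt.Printf("%q\n", joinLines(lines))      // "first\nsecond"
	fmt.Printf("%q\n", terminateLines(lines)) // "first\nsecond\n"
}
```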

Definition vs. Regulation

As we have seen, not every line is terminated with a newline character. Thus we cannot regard the "line terminator" interpretation as a definition, because it's incorrect. It's a regulation.

A regulation differs from a definition: instead of describing what something is, it prescribes what you must do.
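In code, following the regulation usually comes down to a small normalization step before writing a file. The sketch below is one possible way to do it, not how any particular editor or formatter implements it. The name ensureFinalNewline is made up for this post, and leaving empty input untouched is an assumption of this example.

```go
package main

import "fmt"

// ensureFinalNewline appends '\n' to non-empty text that does not
// already end with one. Empty input is returned unchanged, which is
// an assumption made for this sketch.
// (Hypothetical helper, named for this example only.)
func ensureFinalNewline(text string) string {
	if text == "" || text[len(text)-1] == '\n' {
		return text
	}
	return text + "\n"
}

func main() {
	fmt.Printf("%q\n", ensureFinalNewline("This is a line of text.")) // "This is a line of text.\n"
	fmt.Printf("%q\n", ensureFinalNewline("already terminated\n"))    // "already terminated\n"
}
```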

Does the POSIX regulation make sense?

The thing about regulations is that we all hate them, but we hate inconsistencies even more.

Even if I personally think that the newline character should be a line separator because it's more in line (pun intended) with how humans see a text file, I also recognize that some fights aren't worth fighting.

It would have been a lot easier to discuss this topic in the early days of computing, before the existence of tools that adopted the POSIX standard. Now it's hard to argue against it when your co-worker just wants an easy way to disable the annoying "No newline at end of file" warning on every git diff.

So my suggestion to developers is that everybody needs to decide for themselves what is worth their time.

If you say that you want to argue with existing standards and try to improve them for the future, then I will fully respect that. But know what you're getting yourself into. Everything has an opportunity cost. I think it's mostly the younger generation that tries to initiate changes and that's not necessarily a bad thing.

Deep down, I really wish somebody would stop this line terminating madness someday. But this isn't my fight.

Conclusion