Discussion:
[tex-k] Error in TeXbook: Trailing tabs don't typeset
Evan Aad
2017-08-30 16:55:55 UTC
I think I found an error in the TeXbook. Please vet.

Consider the following Plain TeX manuscript

---
\catcode9=12\relax% ASCII 9 is tab
.\ \ .\par%
.
.%
\bye
---

It's impossible to tell, but there's a tab just after the dot on the
third manuscript line. This manuscript typesets thus:

---
. .
. .
---

Observe that there seem to be two spaces between the dots on the first
line, but only one space between the dots on the second line.

However, based on the TeXbook, I'd expect there to be two spaces
between the dots on the second line too: one for the tab, and one for
the carriage-return at the end of the line.

One way to explain this is that TeX discards trailing tabs before they are
tokenized. However, this would contradict the TeXbook, which claims
TeX deletes any `<space>` characters (number 32) that occur at the right end of an input line.
Note the specification of the number 32, which is ASCII space.
Julian Gilbey
2017-09-01 08:17:06 UTC
Post by Evan Aad
I think I found an error in the TeXbook. Please vet.
Consider the following Plain TeX manuscript
---
\catcode9=12\relax% ASCII 9 is tab
.\ \ .\par%
.
.%
\bye
[...]
Observe that there seem to be two spaces between the dots on the first
line, but only one space between the dots on the second line.
However, based on the TeXbook, I'd expect there to be two spaces
between the dots on the second line too: one for the tab, and one for
the carriage-return at the end of the line.
This is not correct: "Plain TeX makes <tab> act like a blank space."
says the TeXbook (page 45), so you have a blank space, and TeX enters
state S (skipping blanks). It now reaches the end-of-line, and "if
TeX is in state S (skipping blanks), the end-of-line character is
simply dropped." (page 46). So all that you end up with is a single
space character, as you observed.

Julian
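The state rules quoted above can be sketched as a tiny tokenizer. This is a simplified illustration, not TeX's actual code: the names `tokenize_line`, `cat_of` and the enums are hypothetical, only category codes 5, 10 and 12 are modelled, and escape characters and comments are ignored. Each token is written as one character: a space token as ' ', a \par token as '\n'.

```c
#include <stddef.h>

/* Hypothetical, reduced set of category codes: just what this sketch needs. */
enum cat { CAT_EOL = 5, CAT_SPACE = 10, CAT_OTHER = 12 };

/* The three reading states described in chapter 8 of the TeXbook. */
enum state { STATE_N, STATE_M, STATE_S };

/* Category of a character; the tab's category is a parameter (assumption:
   every character other than space and tab is treated as category 12). */
static enum cat cat_of(char c, enum cat tab_cat)
{
    if (c == ' ')
        return CAT_SPACE;
    if (c == '\t')
        return tab_cat;
    return CAT_OTHER;
}

/* Tokenize one input line (already stripped of its line terminator).
   Writes one character per token into out and returns the token count. */
size_t tokenize_line(const char *line, enum cat tab_cat,
                     char *out, size_t out_size)
{
    enum state st = STATE_N;   /* every line starts in state N */
    size_t n = 0;

    if (out_size == 0)
        return 0;
    for (const char *p = line; *p != '\0' && n + 1 < out_size; ++p) {
        if (cat_of(*p, tab_cat) == CAT_SPACE) {
            /* In states N and S a space is skipped; in M it becomes a token. */
            if (st == STATE_M) {
                out[n++] = ' ';
                st = STATE_S;
            }
        } else {
            out[n++] = *p;
            st = STATE_M;
        }
    }
    /* End of line: \par in state N, a space in state M, dropped in state S. */
    if (n + 1 < out_size) {
        if (st == STATE_M)
            out[n++] = ' ';
        else if (st == STATE_N)
            out[n++] = '\n';
    }
    out[n] = '\0';
    return n;
}
```

With `tab_cat` set to `CAT_SPACE`, the line ".<tab>" gives two tokens (dot, space), as in the analysis above. With `CAT_OTHER` it gives three (dot, tab, space), provided the tab actually reaches the tokenizer.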
Evan Aad
2017-09-01 08:39:31 UTC
The first line, which sets <tab>'s catcode to 12 undoes Plain TeX's
settings, and the <tab> no longer acts like a blank space.
Julian Gilbey
2017-09-01 09:02:31 UTC
Post by Evan Aad
The first line, which sets <tab>'s catcode to 12 undoes Plain TeX's
settings, and the <tab> no longer acts like a blank space.
Oh, sorry, I overlooked that.

The reason is that the Web2C implementation of TeX has actually
changed the semantics from that of the original TeX. (That may have
been intentional or not, but it might be good if TeX itself could have
its own variant version which conforms to DEK's spec.)

The source is in web2c/lib/texmfmp.c, lines 2407-2409, in the middle
of the function input_line (which is a replacement for tex.web's
input_ln): they read:

  /* Trim trailing whitespace. */
  while (last > first && ISBLANK (buffer[last - 1]))
    --last;

So instead of just trimming ASCII space characters, it trims all
blanks. Incidentally, ISBLANK is defined by:

#define ISBLANK(c) (isascii (c) && isblank ((unsigned char)c))

in kpathsea/c-ctype.h
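To see the effect in isolation, here is a minimal standalone sketch of that loop. It is not the web2c source: `is_blank_ascii` and `trim_with_isblank` are hypothetical names, and the ASCII check is written as an explicit range test because `isascii` is not part of ISO C.

```c
#include <ctype.h>
#include <stddef.h>

/* Stand-in for the ISBLANK macro quoted above (assumption: an explicit
   range test replaces isascii, which ISO C does not provide). */
static int is_blank_ascii(char c)
{
    unsigned char u = (unsigned char)c;
    return u < 128 && isblank(u);
}

/* Same shape as the quoted loop: drop trailing blanks from
   buffer[first..last) and return the new value of last. */
size_t trim_with_isblank(const char *buffer, size_t first, size_t last)
{
    while (last > first && is_blank_ascii(buffer[last - 1]))
        --last;
    return last;
}
```

In the default "C" locale, `isblank` is true for both space and horizontal tab, so `trim_with_isblank(".\t", 0, 2)` returns 1: the trailing tab is gone before TeX ever assigns it a category code.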

Best wishes,

Julian
Evan Aad
2017-09-01 09:01:29 UTC
I will save y'all some trouble (or some fun, depending on your point
of view), and just state where the bug is.

The bug was perpetrated on Thu Oct 16 20:39:27 1997 by Olaf Weber
(citing Peter Breitenlohner), who reimplemented Knuth's 'input_ln'
function. The re-implementation is explained in the
code thus: "We define |input_ln| in C, for efficiency." But under the
guise of 'efficiency', Weber/Breitenlohner snuck in an alteration to
the function's semantics, which contradicts Knuth's specifications in
both the TeXbook and the TeX source code.

The affected files are

texk/web2c/tex.ch ... function input_ln
exk/web2c/lib/texmfmp.c ... input_line
texk/kpathsea/c-ctype.h ... isblank

Luckily, this is very easy to fix: replace the test "ISBLANK
(buffer[last - 1])" in input_line by "(buffer[last - 1]== ' ')".
Julian Gilbey
2017-09-01 09:04:39 UTC
Post by Evan Aad
I will save y'all some trouble (or some fun, depending on your point
of view), and just state where the bug is.
Ah, thanks, I just found exactly the same thing!

Julian
Karl Berry
2017-09-02 17:21:20 UTC
Agreed that tabs should not be removed; we'll fix it in web2c.
Thanks.

Replacing the ISBLANK test with just a ' ' test is not right, it seems
to me. We must also remove trailing CR and LF characters, as we have
been doing since day one. Otherwise line ending portability madness
would ensue. In tex.web, Knuth can assume this is magically taken care
of by Pascal, but ...

Thanks again,
Karl
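One way to read the requirement above (keep tabs, but still strip trailing spaces, CR and LF) is sketched below. This is only an illustration of the idea, not the change that went into web2c; `is_trimmable` and `trim_line_end` are hypothetical names.

```c
#include <stddef.h>

/* Characters removed from the right end of a line in this sketch:
   ASCII space (32) plus the line terminators CR and LF. Tab is kept. */
static int is_trimmable(char c)
{
    return c == ' ' || c == '\r' || c == '\n';
}

/* Drop trailing spaces, CRs and LFs from buffer[first..last) and
   return the new value of last. */
size_t trim_line_end(const char *buffer, size_t first, size_t last)
{
    while (last > first && is_trimmable(buffer[last - 1]))
        --last;
    return last;
}
```

With this rule a line read as ".<tab><CR><LF>" keeps its tab: `trim_line_end(".\t\r\n", 0, 4)` returns 2, so the tab is still in the buffer when TeX assigns category codes.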
Evan Aad
2017-09-02 17:41:49 UTC
Post by Karl Berry
Replacing the ISBLANK test with just a ' ' test is not right, it seems
to me.

Nor to me, but for different reasons, namely ISBLANK might be used
somewhere else, and even if it is not used elsewhere, someone might
decide to change it in the future according to their notion of what
"blank" is. Instead, change the function input_line by replacing the
test "ISBLANK(buffer[last - 1])" by "(buffer[last - 1]== ' ')".
Post by Karl Berry
We must also remove trailing CR and LF characters
I agree. While this was not explicitly mentioned in the TeXbook, I
think this is implicit in the description of the input algorithm in
chapter 8 of the TeXbook: this is why a carriage-return (ASCII 13) is
inserted at the end of every input line (see p. 46).

Victor Eijkhout would also agree, since in his "TeX by Topic" he
states in describing the input routine (p. 29): "An input line is read
from an input file (minus the line terminator, if any)."
Individual User
2018-03-09 00:48:46 UTC
Hi all,

I tested the new changes in TL 2018 in texmfmp.c

The result is pretty bad:

$ cat test.tex
\catcode9=12\relax% ASCII 9 is tab
.\ \ .\par%
.
.%
\bye

$ cat -A test.tex
\catcode9=12\relax% ASCII 9 is tab$
.\ \ .\par%$
.^I$
.%$
\bye$

The dvi file reads:
. .
.Ψ .

Where the Ψ character happens to be in slot \011, which is the code for TAB.


BTW, the object file names tex-tex-pool.o, tex-tex0.o and tex-texini.o
which are created during TeX build may be changed to tex-pool.o,
tex0.o and texini.o (because there is no point in this extra "tex-"
prefix and because analogous files for METAFONT do not have "mf-"
prefix).


Regards,
Igor
