[tex-k] Reproducible builds using pdftex
Alexis Bienvenüe
2016-05-02 13:18:22 UTC
Hello.

Working on the “reproducible builds” effort [1], we have noticed that a
lot of software packages rely on pdftex to build some documents
to be included in the binary package. Since revision 728 on pdftex's
svn, pdftex honours the SOURCE_DATE_EPOCH environment variable to get
reproducible values for the CreationDate, ModDate and ID fields in the
produced file. This greatly helps reproducibility.

However, a lot of software packages date their documentation using the
`\today' command. This breaks reproducibility, and the document date
becomes the build date instead of the date of the source files, as it
should be.

Therefore, I would like to promote a feature for pdftex, that would use
(when set) the value of SOURCE_DATE_EPOCH to also feed the values of the
\year, \month and \day primitives.

Please find attached a patch that implements this feature, factorizing
the already present code from initstarttime to get and check the
SOURCE_DATE_EPOCH value, so that it can also be used in get_date_and_time.
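
As a rough illustration of the proposal (this is not the attached patch,
which lives in pdftex's C sources), the mapping from SOURCE_DATE_EPOCH to
the date primitives could be sketched in Python as below. The function
name `primitives_from_epoch` is hypothetical, and reading the epoch as UTC
is an assumption made here so that the result does not depend on the build
machine's time zone.

```python
import os
from datetime import datetime, timezone


def primitives_from_epoch(env=None):
    """Return (year, month, day) as a TeX job would see them (hypothetical helper)."""
    env = os.environ if env is None else env
    raw = env.get("SOURCE_DATE_EPOCH")
    if raw is None:
        # Variable not set: fall back to the real clock.
        now = datetime.now()
    else:
        # Variable set: seconds since 1970-01-01 00:00:00 UTC (assumed UTC here).
        now = datetime.fromtimestamp(int(raw), tz=timezone.utc)
    return now.year, now.month, now.day


print(primitives_from_epoch({"SOURCE_DATE_EPOCH": "1459468800"}))  # (2016, 4, 1)
```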

Thanks in advance for considering including this feature in pdftex.

Regards,
Alexis Bienvenüe.
Karl Berry
2016-05-02 17:53:04 UTC
SOURCE_DATE_EPOCH to also feed the values of the
\year, \month and \day primitives.

Changing the meaning of TeX primitives is a vastly different scenario
from changing embedded PDF info. I understand your need for it, but I
don't think it should be so easy to invalidate all existing TeX
documentation.

I understood from previous discussions that you don't want to change the
build processes to require tinkering with the TeX sources or adding
command line options. So I can imagine supporting a second envvar,
something like SOURCE_DATE_EPOCH_TEX_PRIMITIVES, which would need to be
set to enable these changes as well.

By the way, for your real builds, are you setting SOURCE_DATE_EPOCH to
the build date, or last mtime for the doc sources, or something? Not
actually zero, right? --thanks, k.
p***@passoire.fr
2016-05-02 21:09:55 UTC
Post by Karl Berry
I understood from previous discussions that you don't want to change the
build processes to require tinkering with the TeX sources or adding
command line options.
That is the point, you're right. A lot of software packages are
concerned, and fixing all of them is really hard work.
Post by Karl Berry
So I can imagine supporting a second envvar,
something like SOURCE_DATE_EPOCH_TEX_PRIMITIVES, which would need to be
set to enable these changes as well.
This could be OK.
However, I'm wondering:
- What is the advantage of dealing with the two envvars
SOURCE_DATE_EPOCH and SOURCE_DATE_EPOCH_TEX_PRIMITIVES instead of one?
- And what will be the difference? In our build process, I think we will
set both envvars to the same value for all packages to be built.
Post by Karl Berry
By the way, for your real builds, are you setting SOURCE_DATE_EPOCH to
the build date, or last mtime for the doc sources, or something? Not
actually zero, right? --thanks, k.
Actually, the last debian/changelog date, which is the last date the
source files were modified. This won't be zero.
That is: when building the software documentation, with the help of the
envvar, pdftex will behave as if this was called right after the last
source change.
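
For illustration only, deriving such a timestamp from the trailer line of
a debian/changelog entry might look like the sketch below. The trailer
text and the helper name `epoch_from_trailer` are made up for the example;
real packaging tools obtain this value their own way.

```python
from email.utils import parsedate_to_datetime

# Hypothetical trailer line, in the format used by debian/changelog entries:
# " -- maintainer <email>", two spaces, then an RFC 2822 style date.
TRAILER = " -- Jane Doe <jane@example.org>  Mon, 02 May 2016 13:18:22 +0000"


def epoch_from_trailer(line):
    """Return the trailer date as seconds since the Unix epoch."""
    # Everything after the closing '>' and the two-space separator is the date.
    date_text = line.rsplit(">  ", 1)[1]
    return int(parsedate_to_datetime(date_text).timestamp())


print(epoch_from_trailer(TRAILER))  # 1462195102
```

The resulting number is what would then be exported as SOURCE_DATE_EPOCH
for the whole package build.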

Regards,
Alexis Bienvenüe.
Karl Berry
2016-05-02 21:28:21 UTC
- What is the advantage of dealing with the two envvars
SOURCE_DATE_EPOCH and SOURCE_DATE_EPOCH_TEX_PRIMITIVES instead of one?

Mainly that it makes it harder to change the meaning of TeX primitives.
(People have to read more obscure doc, make another setting.) I think
that is highly desirable.

Beyond that, I can imagine, at least in theory, that a document might
look at (not typeset) the value of \year etc. for some reason, and want
to get its current value, even while using SOURCE_DATE_EPOCH to get a
reproducible build. After all, many manuals do not typeset \today.

Indeed, I can see an argument that tinkering with \year etc. is
subverting the stated intention of the document authors, not to mention
the 30+ year history of those primitives, to typeset \today (\year etc.)
as something other than the current date/time.

In reality, using \today in a document expected to be rebuilt in
reproducible circumstances seems like a bug to me. We can work around
it with these envvars for practical reasons, but to me it still feels
like a bug, albeit of course one that no one would have thought about
before the whole reproducible stuff started. It's like calling
gettimeofday() or whatever and getting a result other than the current
time.

In fact, I can see another argument that if you need to pretend it's
three weeks ago, you should change the date on the system instead of
expecting every application to cater to your desires. But I won't go
that far :). Anyway ...

- And what will be the difference?

If you set them both to the same value, no difference.

In our build process, I think we will set both envvars to the same
value for all packages to be built.

Yes, that's what I had in mind would happen. -k
Alexis Bienvenüe
2016-05-03 07:30:23 UTC
Post by Karl Berry
Beyond that, I can imagine, at least in theory, that a document might
look at (not typeset) the value of \year etc. for some reason, and want
to get its current value, even while using SOURCE_DATE_EPOCH to get a
reproducible build. After all, many manuals do not typeset \today.
The SOURCE_DATE_EPOCH envvar is designed to be used only in the context
of reproducible builds. In this context, if a document uses \year in
some way (not only typesetting it) that does change the result, we must
either modify the document itself to fix its behavior, or use
SOURCE_DATE_EPOCH_TEX_PRIMITIVES to fix \year's behavior. So I think
there can't be any case where we will want to set SOURCE_DATE_EPOCH but
not SOURCE_DATE_EPOCH_TEX_PRIMITIVES.

That said, I think that SOURCE_DATE_EPOCH_TEX_PRIMITIVES _is_ a solution
that covers our needs.
Post by Karl Berry
tinkering with \year etc. is
subverting the stated intention of the document authors
Producing the document in a version that corresponds to the author's
wish for the date the sources were last modified doesn't seem disloyal
to me. Especially if we "date" this document with embedded PDF info
corresponding to SOURCE_DATE_EPOCH.
Post by Karl Berry
In fact, I can see another argument that if you need to pretend it's
three weeks ago, you should change the date on the system instead of
expecting every application to cater to your desires.
Yes, but:
- we won't be able to get the same date/time at the time pdftex is
called, even if we set the time at the beginning of the building
process, because of e.g. CPU differences.
- the goal is also to allow everyone to check easily that the binary
package is the exact result of the building process from the sources,
thus with as little change in the environment as possible.

Regards,
Alexis Bienvenüe.
Norbert Preining
2016-05-03 07:50:45 UTC
Hi all,
Post by Alexis Bienvenüe
there can't be any case where we will want to set SOURCE_DATE_EPOCH but
not SOURCE_DATE_EPOCH_TEX_PRIMITIVES.
I would suggest the following:
SOURCE_DATE_EPOCH is used to define the date/epoch and is evaluated
for pdf output etc
SOURCE_DATE_EPOCH_FIXUP_TEX_PRIMITIVES = 0|1
if non-0 then also \today is fixed.

It does not make sense to set the same value in two variables, and
complicates the code, but it *does* make sense to install an
additional safeguard not to redefine TeX primitives.
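
A minimal sketch of the two-variable scheme suggested above, written in
Python rather than taken from the patch: the function name
`date_primitives` is hypothetical, and treating an unset flag as "0" and
reading the epoch as UTC are assumptions of this example.

```python
from datetime import datetime, timezone


def date_primitives(env, wall_clock):
    """Return (year, month, day) as the job should see them.

    env        -- mapping of environment variables
    wall_clock -- datetime holding the real current time
    """
    epoch = env.get("SOURCE_DATE_EPOCH")
    # Assumption: an unset flag behaves like "0" (safeguard stays on).
    flag = env.get("SOURCE_DATE_EPOCH_FIXUP_TEX_PRIMITIVES", "0")
    if epoch is not None and flag != "0":
        # Both variables set: the primitives follow the fixed source date.
        fixed = datetime.fromtimestamp(int(epoch), tz=timezone.utc)
        return fixed.year, fixed.month, fixed.day
    # Otherwise \year, \month and \day keep their traditional meaning.
    return wall_clock.year, wall_clock.month, wall_clock.day


now = datetime(2016, 5, 3, tzinfo=timezone.utc)
only_epoch = {"SOURCE_DATE_EPOCH": "1459468800"}
both = dict(only_epoch, SOURCE_DATE_EPOCH_FIXUP_TEX_PRIMITIVES="1")
print(date_primitives(only_epoch, now))  # (2016, 5, 3)
print(date_primitives(both, now))        # (2016, 4, 1)
```

With only SOURCE_DATE_EPOCH set, the PDF metadata would be fixed but the
primitives still report the real date; the flag alone has no effect.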

I attach a *suggestion* for a patch against current svn (some changes
have already been included in dvipdfmx, those are dropped).

All the best

Norbert

------------------------------------------------------------------------
PREINING, Norbert http://www.preining.info
JAIST, Japan TeX Live & Debian Developer
GPG: 0x860CDC13 fp: F7D8 A928 26E3 16A1 9FA0 ACF0 6CAC A448 860C DC13
------------------------------------------------------------------------
Norbert Preining
2016-05-03 08:46:06 UTC
Hi all,

concerning SOURCE_DATE_EPOCH_FIXUP_TEX_PRIMITIVES (previous email),
I have now tested the patch and it works as expected. Only if both
env vars are set is the \today output also rewritten.

Norbert

Jonathan Kew
2016-05-03 09:09:22 UTC
Post by Alexis Bienvenüe
Post by Karl Berry
Beyond that, I can imagine, at least in theory, that a document might
look at (not typeset) the value of \year etc. for some reason, and want
to get its current value, even while using SOURCE_DATE_EPOCH to get a
reproducible build. After all, many manuals do not typeset \today.
The SOURCE_DATE_EPOCH envvar is designed to be used only in the context
of reproducible builds. In this context, if a document uses \year in
some way (not only typesetting it) that does change the result, we must
either modify the document itself to fix its behavior, or use
SOURCE_DATE_EPOCH_TEX_PRIMITIVES to fix \year's behavior. So I think
there can't be any case where we will want to set SOURCE_DATE_EPOCH but
not SOURCE_DATE_EPOCH_TEX_PRIMITIVES.
That said, I think that SOURCE_DATE_EPOCH_TEX_PRIMITIVES _is_ a solution
that covers our needs.
Rather than hacking TeX primitives, how about simply inserting

\day = 1
\month = 4
\year = 2016

(or whatever) into the format used to build the document?

ISTM that it shouldn't be hard to prepare and use such a "fixed-date
format" as part of whatever package-building process needs to generate
documents in a "reproducible" way, even if they (ab)use \today etc.

JK
Norbert Preining
2016-05-03 13:51:19 UTC
Hi Jonathan,
Post by Jonathan Kew
\day = 1
\month = 4
\year = 2016
While this is a possible idea, it still needs either fixing every document or makefile to use a different format, or requires that the build environment has different formats than normally installed systems.

Neither is very convenient, to say the least.

The idea is that a normal binary rebuild would give identical documents on a normal system.

(Putting Debian hat on) I guess even if these changes don't make it into TeX Live proper, I will patch the Debian sources to support these features.

(Putting TeX Live hat on) I do see actual use for us on TeX Live, too, as I am working to extend l3build to support checking not only log files, but also actual PDFs. This could be integrated (and I hope to do that) into package checks when they are updated in TeX Live.

So from my side, I'm welcoming these patches.

Norbert
Karl Berry
2016-05-03 21:46:55 UTC
Producing the document in a version that corresponds to the author's
wish for the date the sources were last modified

But that is not the author's wish, in theory. If they use \today, by
definition they meant *today*, the day the document is being processed.

I certainly agree that most (though not all) authors using \today intend
"date of last source modification". But then they should not use
\today. \today means today. That is what we are subverting with these
envvars. This is why I think it should be considered a document bug and
(ha ha) in the fullness of time (sometime after the heat death of the
universe, no doubt) be eradicated. -k
Alexis Bienvenüe
2016-05-04 07:04:44 UTC
Post by Alexis Bienvenüe
Producing the document in a version that corresponds to the author's
wish for the date the sources were last modified
But that is not the author's wish, in theory. If they use \today, by
definition they meant *today*, the day the document is being processed.
I agree with the theory.
Post by Karl Berry
I certainly agree that most (though not all) authors using \today intend
"date of last source modification". But then they should not use
\today. \today means today. That is what we are subverting with these
envvars. This is why I think it should be considered a document bug and
(ha ha) in the fullness of time (sometime after the heat death of the
universe, no doubt) be eradicated. -k
Thank you for considering adding this bug. Without this bug, we would
have to get every \today replaced from every document source to get
reproducible builds. I prefer to delay this.

Regards,
Alexis Bienvenüe.
Norman Gray
2016-05-04 10:42:39 UTC
Greetings.

A variety of hats have been donned in this conversation. May I put on a
pedantic hat?
Post by Karl Berry
But that is not the author's wish, in theory. If they use \today, by
definition they meant *today*, the day the document is being
processed.
Not really, or at least not by definition, or at least not by Knuth's
definition. The TeXbook says that \today is 'the current date' (p406)
and that \year is 'current year of our Lord' (p273); p349 mentions that
'\time, \day, \month, and \year are established at the beginning of a
job' (and says nothing more about what they are established as).

Certainly, the straightforward meaning of 'currently' is 'the date-time
in a correctly-adjusted Gregorian calendar when the job is started'.
But 'current' can mean a larger variety of things in a computing
context, such as 'current standard input' or 'current working
directory'. 'Current' in that sense refers to a value of standard
input, working directory, or indeed time, which is set by the
environment.

Thus I don't think that SOURCE_DATE_EPOCH_TEX_PRIMITIVES is strictly
required by any text in the TeXbook.

If I looked at one-line documentation for SOURCE_DATE_EPOCH I for one
would expect it to change \day, \month and \year as well. I would find
it confusingly, annoyingly and perversely inconsistent that it changed
some date information (in the PDF) but not others (\today).

If it is felt to be useful to distinguish the PDF information from the
\today information (and at a bit of a stretch I can see why you might
want to do that), then that would be reasonable functionality. But
perhaps the ..._TEX_PRIMITIVES environment variable should default to
'yes'. That is 'change some date information but not others' should be
regarded as the non-default behaviour.

Best wishes,

Norman
--
Norman Gray : https://nxg.me.uk
SUPA School of Physics and Astronomy, University of Glasgow, UK
Jonathan Kew
2016-05-04 12:21:50 UTC
Post by Norman Gray
Greetings.
A variety of hats have been donned in this conversation. May I put on a
pedantic hat?
Post by Karl Berry
But that is not the author's wish, in theory. If they use \today, by
definition they meant *today*, the day the document is being processed.
Not really, or at least not by definition, or at least not by Knuth's
definition. The TeXbook says that \today is 'the current date' (p406)
and that \year is 'current year of our Lord' (p273); p349 mentions that
'\time, \day, \month, and \year are established at the beginning of a
job' (and says nothing more about what they are established as).
Certainly, the straightforward meaning of 'currently' is 'the date-time
in a correctly-adjusted Gregorian calendar when the job is started'. But
'current' can mean a larger variety of things in a computing context,
such as 'current standard input' or 'current working directory'.
'Current' in that sense refers to a value of standard input, working
directory, or indeed time, which is set by the environment.
Thus I don't think that SOURCE_DATE_EPOCH_TEX_PRIMITIVES is strictly
required by any text in the TeXbook.
If I looked at one-line documentation for SOURCE_DATE_EPOCH I for one
would expect it to change \day, \month and \year as well. I would find
it confusingly, annoyingly and perversely inconsistent that it changed
some date information (in the PDF) but not others (\today).
If it is felt to be useful to distinguish the PDF information from the
\today information (and at a bit of a stretch I can see why you might
want to do that), then that would be reasonable functionality. But
perhaps the ..._TEX_PRIMITIVES environment variable should default to
'yes'. That is 'change some date information but not others' should be
regarded as the non-default behaviour.
I'm still not seeing a compelling reason to tinker with existing TeX
primitives here.

Given a document "mydoc.tex" that (mis)uses \today:

\documentclass{article}
\begin{document}
This document was written on \today.
\end{document}

a user (or distro build process) can simply replace the command

pdflatex mydoc

with one such as

pdflatex \\year=2016 \\month=4 \\day=1 \\input mydoc

and the document will contain the specified date.

In a context where SOURCE_DATE_EPOCH is used, "date" can easily generate
the numbers needed:

pdflatex `date -r ${SOURCE_DATE_EPOCH} "+\\year=%Y \\month=%m
\\day=%d"` \\input mydoc

This avoids the need for any changes to the behavior of TeX primitives,
and keeps the date hackery firmly where it belongs: in the
reproducible-build distro's build setup.
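
The same idea can be sketched in Python for build setups that do not want
to depend on a particular `date` implementation (the `-r SECONDS` form
above is the BSD syntax; GNU date takes `-d @SECONDS` instead). The helper
name `pdflatex_args` is hypothetical, and the epoch is interpreted as UTC
here, whereas `date` without `-u` would use the local time zone.

```python
from datetime import datetime, timezone


def pdflatex_args(epoch, document):
    """Build an argument list that fixes the date before reading the document."""
    stamp = datetime.fromtimestamp(int(epoch), tz=timezone.utc)
    return [
        "pdflatex",
        rf"\year={stamp.year}",
        rf"\month={stamp.month}",
        rf"\day={stamp.day}",
        r"\input",
        document,
    ]


# The list can be handed to subprocess.run() by a build script;
# here it is only printed.
print(pdflatex_args("1459468800", "mydoc"))
```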

Overriding additional timestamps in metadata produced by pdftex, dvips,
dvipdfmx, etc is a different matter, but that's a distinct issue that
doesn't involve TeX primitives; it's about pdftex extensions or separate
driver programs.

JK
Norman Gray
2016-05-04 12:54:59 UTC
Jonathan, hello.
Post by Jonathan Kew
In a context where SOURCE_DATE_EPOCH is used, "date" can easily
pdflatex `date -r ${SOURCE_DATE_EPOCH} "+\\year=%Y \\month=%m
\\day=%d"` \\input mydoc
Cunning, cunning...
Post by Jonathan Kew
Overriding additional timestamps in metadata produced by pdftex,
dvips, dvipdfmx, etc is a different matter, but that's a distinct
issue that doesn't involve TeX primitives; it's about pdftex
extensions or separate driver programs.
If I'm following this thread correctly, then I think this latter point
is the original motivation, and the question is whether this
functionality should be extended so that the change of
date also applies to the 'current date' as expressed in the \year,
\month, \day and presumably \time primitives.

To be clear: I dispute the claim that defining SOURCE_DATE_EPOCH to
change \today would constitute 'changes to the behavior of TeX
primitives' (with its associated documentation burden). The TeXbook
doesn't specify what 'the current date' means, and while this of course
has a natural meaning of 'now, according to the clock', the text is not
inconsistent with an interpretation along the lines of 'now, according
to the environment'.

It would be necessary to emphasise this behaviour in the documentation
of SOURCE_DATE_EPOCH, but nowhere else. Perhaps something like:

SOURCE_DATE_EPOCH=xxx : set the effective date for pdftex, as manifested
in the values of CreationDate, ModDate, and others, in the PDF file.
Note that this value also constitutes the 'current date' for TeX, and
thus the derived values of the \year, \month, \day, and \time
primitives, and the \today macro.

Best wishes,

Norman
Karl Berry
2016-05-03 21:46:56 UTC
SOURCE_DATE_EPOCH is used to define the date/epoch and is evaluated
for pdf output etc
SOURCE_DATE_EPOCH_FIXUP_TEX_PRIMITIVES = 0|1
if non-0 then also \today is fixed.

Agreed. That's what I meant to propose yesterday, but confused myself
as I was writing.

Thanks for the draft patch. I'll look at it. It will take far more
time to test and document than anything else :(. -k
Karl Berry
2016-05-04 18:20:44 UTC
Jonathan,

I'm still not seeing a compelling reason

As far as I know, it's only compelling for whole-distro reproducible
build process purposes. There's no need for a person writing a doc
intended to be reproducible in the first place to need any envvar
settings. That's why Thanh invented those other new primitives.

pdflatex \\year=2016 \\month=4 \\day=1 \\input mydoc

Changing the pdflatex invocation means changing the Makefiles (or
whatever) for the individual packages in nontrivial ways. That's no
easier than changing the documents themselves. And changing anything is
what they want to avoid. They want ("need") to set envvars and have the
whole build be reproducible without tinkering with each
package/document.

Overriding additional timestamps in metadata produced by pdftex,
dvips, dvipdfmx, etc is a different matter, but that's a distinct
issue that doesn't involve TeX primitives;

I agree, which is why I wanted a second environment variable to induce
that horrible should-be-unnecessary kludgery. -k
Karl Berry
2016-05-04 18:23:59 UTC
If I looked at one-line documentation for SOURCE_DATE_EPOCH I for one
would expect it to change \day, \month and \year as well.

Norman, I understand your point, but I strongly disagree. In the
version that TeX Live ships, people will need to set a second envvar to
change the TeX primitives.

It is already highly frustrating to be going through this process at
after the last second. Please, let's stop the discussion so we can get
the work done.

best,
karl
Norman Gray
2016-05-05 08:11:31 UTC
Karl, hello.
...this process at after the last second.
Ah -- apologies: I didn't realise the process was at such a late stage.
I can see my remarks were unfortunately timed.

Best wishes,

Norman
Karl Berry
2016-05-04 23:20:40 UTC
I'm sure I messed up somewhere, but I just committed Norbert's changes
(with minor modifications) to the texlive (r40889) and pdftex
repositories to implement SOURCE_DATE_EPOCH_TEX_PRIMITIVES. (I didn't
like "FIXUP" in the name; for one thing, it's the opposite of a fix.)
I also added some documentation to the pdftex manual and man page, and a
test.

Akira, I couldn't quite see if your changes conflicted with what I did.
Let me know if there are problems with your Windows compilation.

Please try it out if you can, and if it seems ok, I'll ask the TL
builders to make yet another final rebuild in a day or two.

Thanks,
Karl
Alexis Bienvenüe
2016-05-06 21:05:45 UTC
Sorry for this late reply: I was away for two days.
Post by Karl Berry
I'm sure I messed up somewhere, but I just committed Norbert's changes
(with minor modifications) to the texlive (r40889) and pdftex
repositories to implement SOURCE_DATE_EPOCH_TEX_PRIMITIVES.
Great news!
Post by Karl Berry
Please try it out if you can, and if it seems ok, I'll ask the TL
builders to make yet another final rebuild in a day or two.
My tests work as expected.

Thanks a lot,
Alexis Bienvenüe.
