[tlug] Build Systems (was QQ - consider it a poll (TDD))

Date: Thu, 3 Feb 2022 12:28:41 +0900
From: "Curt J. Sampson" <cjs@example.com>
Subject: [tlug] Build Systems (was QQ - consider it a poll (TDD))
References: <m2y25ruc1v.fsf@sk.tsukuba.ac.jp> <c2eb1cbf-ade4-c623-6d6c-8845f09dc731@vortorus.net> <DM6PR11MB430080171F1E49DC81D47ADFDD949@DM6PR11MB4300.namprd11.prod.outlook.com> <Ya9QFKOzc2mizexl@logarithmic.cjs.cynic.net> <m25yrvy0bw.fsf@sk.tsukuba.ac.jp> <YfiDoNH1L5tSpJCS@logarithmic.cjs.cynic.net> <m2a6f9m8oe.fsf@sk.tsukuba.ac.jp>
Ah, Steve, you raise a lot of good points in your response: perfect for
helping me make my response. It's almost as if you're a plant :-)

And sorry about the length. No time to make it shorter, etc. etc.

On 2022-02-03 03:21 +0900 (Thu), Stephen J. Turnbull wrote:

> Curt J. Sampson writes:
>
> I don't recall pair-programming as being a requirement for:
>
>  > >  > regular get together and do real programming on real programs
>  > >  > on a regular basis.

Ah, I guess there are groups that "get together" only to chat for a few
minutes and then work alone. That seems to me not really getting together
to _program_ together, but just to have a chat, end that meetup, go program
in a "non-meetup" style, and then maybe have another meetup after.

If you don't have two or more people looking at one screen/terminal/
whatever, I just don't see how that's a "get together to do ...
programming."

> But yeah, every year I sit down with a GSoC intern and do basically
> that for Mailman for about 20 weeks... I can't actually sit with
> them and type on their box as they talk, them being in Pune or
> Shanghai and all, but we share screens on Zoom sometimes (mostly IRC
> though, a lot of Indian students are off somewhere rural on <= 3G at a
> Starbucks in the summer).

I've fallen back to screen sharing when most of the programming is in a
non-graphical environment and we just need to share a bit of graphics,
but the case of "one of a pair can only watch" is a pretty bad bandwidth
limit on your communications, somewhere between "both parties can type"
and "can't see other screens at all; communicate in text chat."

When I hear "sit down with," I assume either actual sitting down physically
together in front of a computer or a session with voice chat, text chat
(for sharing URLs and suchlike) and shared terminals. I do it with some
tmate (https://tmate.io/) sessions, but VSCode or desktop sharing works ok
too. (Both of those are extremely low bandwidth solutions, much less than
sending out a video screenshare or even an audio stream.)

(As an aside, tmate has the advantage that each participant can arrange the
terminals as they wish in their own desktop, so long as they're able and
willing to keep each tmate window to at least the minimum size comfortable
for all participants. It has the disadvantage of requiring that everybody
have and be able to set up a text-interface editor that they're comfortable
with, but you can do that with both Emacs and Vim and we all know that
folks who use TextMate or Sublime Text or worse yet an IDE (*shudder*)
aren't Real Programmers anyway. :-P)

I'm not saying here you _have_ to use tmate, but there are good, relatively
low-bandwidth solutions for this kind of stuff that are not huge burdens to
set up and learn.

The most common block to using those is not technical but an emotional
insistence on the part of developers with power in the project (or even
managers) that only certain tools are orthodox (I use that word
deliberately), such as a particular IDE, and everything must be done in
that no matter how painful. I've worked in shops where the developers were
working on server software that ran only on Linux, and yet developers were
allowed to use only Windows as their desktop platform, and contortions were
made to make this work. (Even to the point of some bits of the software
being built mostly on Windows, with the developers unable to do a build on
the target platform except via submitting code to the CI system and waiting
for that to get around to building it and mailing back any errors.) When
you've got issues like this, remote pair programming is only one of the
things you lose.

> Moving on, I'm quite surprised you contest *this*:
>
>  > > A lot of "real work" has to do with "our way of doing things" and
>  > > "how to get the source code," and that's not something you pick
>  > > up by trailing the core developers for whom "our way" is second
>  > > nature and "the source" is already in a local git repo.
>  >
>  > I'm not sure I see why you wouldn't pick that kind of stuff up from
>  > trailing a developer experienced with that project, so long as they're
>  > actually doing those things.
>
> Well, for starters, experienced developers almost never type or
> display the URL to the canonical repo *unless they're onboarding a new
> developer.*

Perhaps there's a misunderstanding between us about what "trail" means. I'm
not referring to a situation where the developer completely ignores the guy
trailing him and just goes on with his day as usual, but one where the
developer does show some of the bits of the process that he uses less
regularly (such as `git clone ...`) and pauses during work to explain
things and answer questions. Perhaps think of it as throwing bits of
"onboarding" material into the regular workflow.

And, as I mentioned, in my world (designed around this) something like "get
the repo" is deliberately a pretty simple thing: `git clone ...`, a command
line that can be pasted into a text chat or documentation file. It can't
always be this simple, but if it's more complex it goes into a
documentation file accessible in a repo browser or similar so that you can
reduce it to `Type https://gitlab.mycompany.com/foo/SETUP.md into your
browser` or something like that.

>  > The "how to get the source code" (and unmentioned but even more vital, "how
>  > to build the source code") bits are fairly easily covered by starting not
>  > with the checkout you've been working on for ages, but pulling down a new
>  > checkout on a fairly "clean" system (ideally, the trailer's system) that
>  > doesn't have all the random external dependencies already installed.
>
> Well yes, D'OH, that's *exactly* what I meant.  Is that part of your
> normal workflow?  Installing the source on a system you'll never touch
> again?  Except maybe as the typist in pair programming?

Well, yes. I do my tmate sessions in Docker containers (built and used with
dent[1]) so that I can give root access to participants and keep them out
of my private files, and since these get rebuilt on a regular basis, I'm
effectively frequently on a fresh machine. (This is great for keeping the
build debugged, as I've mentioned.)

[1]: https://github.com/cynic-net/contapps/tree/master/dent

Developers also have the ability merely to think about the likely effects
of their particular means of adding a dependency, and change how they do it
based on that. If they just added one that required finding obscure web
sites, downloading stuff from them, and going through a laborious install
procedure, they ought to be thinking about how the next guy's going to do
that and what they should put in the repo (scripts, even) to smooth this
out. Probably something that difficult should not be an "external"
dependency (something required to be installed outside the repo) but
something fetched and built as part of the repo's build output, in which
case it's tested with every fully clean build. (`make distclean && make` or
the moral equivalent might not be something you'd want to do hourly or
while you're coding, but it's easy enough to kick that off in a separate
clone of the repo when you come in the morning before you get on with the
rest of your work day and come back to that even hours later to debug any
issues with it. Or you can have your CI do that for you regularly, while
still using a non-distclean base for most CI builds to keep CI turnaround
fast.)
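
As a rough sketch of that "fully clean build in a separate clone" idea,
something like the following could be kicked off in the morning or from
CI. Everything here is an assumption for illustration: the function name
`clean_build`, the default build command `make`, and the injectable `run`
parameter (there only so the sketch can be exercised without a real
repository).

```python
import subprocess
import tempfile
from pathlib import Path

def clean_build(repo_url, build_cmd=('make',), run=subprocess.run):
    ''' Clone `repo_url` into a throwaway directory and build it there.

        Because the checkout is brand new, nothing left over from earlier
        builds can hide a missing dependency or build step. `run` takes
        the same arguments as `subprocess.run`; with `check=True` a failing
        command raises `subprocess.CalledProcessError`.
    '''
    with tempfile.TemporaryDirectory() as tmp:
        checkout = Path(tmp, 'checkout')
        # Fresh clone: no stale object files, caches or local hacks.
        run(['git', 'clone', '--quiet', repo_url, str(checkout)], check=True)
        # Build inside the fresh checkout.
        run(list(build_cmd), cwd=checkout, check=True)
    return True
```

This does not cover external dependencies already installed on the
machine; for that you still want a fresh container or similar.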

>  > In my world, all you need to do is `git clone ...` and then `./Test` in
>  > that repo, and that top-level build/test script will fetch and build any
>  > dependencies needed or at least inform the developer in some reasonably
>  > comprehensible manner what she's missing.
>
> I gather you're not working in C. ;-)

I am indeed working in C! Well, it's not my code, but I build it, and have
to deal with exactly those dependency issues. For example, I use the
linapple Apple II emulator in my 8-bit development, and the most convenient
way to deal with this dependency is just to fetch and build it. So I list
out the dependency tests with the Debian package names:[2]

    DEPENDENCIES = (
        ('git',                     ('git', '--version')),
        ('imagemagick',             ('convert', '--version')),
        ('libzip-dev',              ('pkg-config', 'libzip')),
        ('libsdl1.2-dev',           ('pkg-config', 'sdl')),
        ('libsdl-image1.2-dev',     ('pkg-config', 'SDL_image')),
        ('libcurl4-openssl-dev',    ('pkg-config', 'libcurl')),
        ('zlib1g-dev',              ('pkg-config', 'zlib')),
    )

[2]: https://github.com/0cjs/8bitdev/blob/5ec968039580d7a3df4f0bf0d062d5159c43a841/b8tool/pylib/b8tool/toolset/linapple.py#L31-L39

The code that uses this prints out the names of all the missing packages in
the form of an `apt-get` command you can run. Here I assume that Red Hat
users will be able to figure out the equivalent package names on their
systems, but this could easily be extended to record those too, if someone
felt it was worthwhile.
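
For illustration only (this is not the actual b8tool code linked above), a
check over a table of that shape might look roughly like the sketch below.
The helper names `command_succeeds`, `missing_packages` and `install_hint`
are invented for this example, and the table is a shortened copy of the
one above.

```python
import subprocess

# Shortened copy of the table above: (Debian package name, test command).
DEPENDENCIES = (
    ('git',         ('git', '--version')),
    ('libzip-dev',  ('pkg-config', 'libzip')),
)

def command_succeeds(cmd):
    ''' Return True if `cmd` can be run and exits with status 0. '''
    try:
        result = subprocess.run(cmd,
            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    except OSError:     # e.g., the test program itself is not installed
        return False
    return result.returncode == 0

def missing_packages(dependencies, check=command_succeeds):
    ''' Return the package names whose test command fails, in table order. '''
    return [pkg for pkg, cmd in dependencies if not check(cmd)]

def install_hint(packages):
    ''' Format a suggested install command, or None if nothing is missing. '''
    if not packages:
        return None
    return 'Missing dependencies. Try: apt-get install ' + ' '.join(packages)

if __name__ == '__main__':
    hint = install_hint(missing_packages(DEPENDENCIES))
    if hint:
        print(hint)
```

Recording Red Hat package names would then be a matter of adding another
column to the table and choosing the install command by platform.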

That's just one example; I've done plenty of build systems involving C code
(as well as many other languages, and combinations thereof) that have
demanded a wide variety of solutions. So long as I don't get stuck with
requirements such as "you may not write command line programs; every
developer interface must be available from a GUI" or similar silliness,
I've had few or no insurmountable issues. It might be fair to say that
there's not infrequently some cleverness, hard thinking or stepping out of
the box required, but generally this stuff can be done, and done well.

> Snide "a plague on all concerned" language bigotry aside, building the
> software is very little of what I'm talking about, except for people
> who are quite new to programming as well as to the project in question.

Huh. My experience has been that there are a lot of projects out there
where the build is really, really difficult. Difficult enough that you
might consider giving a competitor your source code in the hope of
crippling their productivity. But if you're saying that there are _also_ a
bunch of other things to learn, then yes, I do agree with that.

> For example, what do you do when your pull request triggers failures
> on buildbots for platforms you've never heard of?  (That doesn't
> happen on Mailman, we can't afford buildbots, it's a daily occurrence
> in Python.)

There are two situations where this can happen:

1. It happens to an experienced developer being trailed by a novice
   developer. The experienced developer explains what's going on and the
   novice watches how the experienced developer deals with the issue.
2. It happens to a novice developer working alone, and he's lost. You walk
   over to someone's desk or send a message in Slack or whatever, pair up
   with a senior developer to help you through the problem and deal
   with it more or less as above.

If the novice can't get help immediately, he's stuck and needs to send out
an e-mail or whatever and work on something else in the meantime. Or maybe
bang his head against the problem, if that's felt to be a useful approach.
(It might well be if the supply of senior developer time is extremely
constrained!)

> I gather you haven't paid much attention to how a Python venv works?

You couldn't be more wrong about that; I'm a religious user of Python
virtual environments. (I would have thought that to be kind of implied by
my general approach to dependencies.) My projects using Python are so anal
about avoiding external dependencies (aside from, for various reasons,
having the Python interpreter itself available) that I don't even require you
to have pip available, and if you do have it I won't use it. See
pactivate[3] for the core of my standard virtual environment activation
process. (I'm sure that there are infelicities in that, but you get the
idea. I'm very happy to discuss separately any problems with pactivate or
ideas about how it could be improved.)

[3]: https://github.com/0cjs/pactivate

> (I'm talking about building Mailman, obviously Python itself can't be
> built in a venv.)

No, though I could see going so far even as to build your own Python
interpreter(s) within your project, in some very particular circumstances.
To date I've found that pythonz[4] and random OS installs of Python get me
by, though.

[4]: https://github.com/saghul/pythonz

> I admit my experience is a small sample, but even Ghostscript (I mean,
> a fscking ACM Fellow retired early on the cash flow from that project,
> he could afford the best workflows known to man) didn't work that way.

As I hope I made clear above, being able to _afford_ a good workflow is
almost never the issue. It's almost invariably something else, something
cultural, that's stopping any particular project from having a better one.

> God knows neither Emacs nor XEmacs works that way.  Both configure
> conservatively (I think you need a minimum of libc and either curses
> or X11 installed), and trust that you've installed libraries and
> headers for any options you request in advance.  (Of course configure
> bitches in plain language if you didn't, but it's still your problem
> to get and install them if you want them.)

Right. There are two things going on here that I'd like to unpack.

First, "bitching in plain language" is a perfectly reasonable option to
consider. Look at the audience: anybody actually building Emacs from source
can be expected to have some knowledge of software development; if they
don't have that you tell them to go install it using their system package
manager. (If they need something the system doesn't make available, they
need to switch systems or enter a world where they need to learn something
about software development; I don't see any way around that.)

Where any particular dependency for a project sits on the spectrum from
"print out a message the developer will probably understand" to "we
reliably handle fetching and building it" depends on the project, the
dependency and the audience, and solutions may even work on multiple
levels. Consider my example above. When my build system prints out
something like:

    ----- linapple: libzip-dev test failed: pkg-config libzip
    Missing dependencies. Try: apt-get install libzip-dev

for a Debian/Ubuntu user that's a "here's the command to fix this" message
whereas for a Red Hat user that's a "here's a clue that should get you to
the command(s) you need" message. That's the right thing to do in light of
the current audience for that repo and the development resources available.
That could change in the future. (In fact, it will: all that custom local
build stuff is going to get replaced with Nix builds at some point.)

> Python OTOH builds in all the options it can find on your system.  If you
> haven't installed them, you won't get them.  It only complains about a
> couple of them, though.

That brings us to the second point: there actually are (or should be) what
are at least conceptually two separate build systems here. One is for
people who want to build a working version of the software to use on their
system, and the other is for developers who want to test that the software
can be built on a wide variety of systems in a wide variety of
configurations. For the former it's fairly reasonable to "build with what
you have." For the latter you want the developers/testers to make
available the dependencies you want to test. So obviously you're going to
go about that in different ways, needing at least two "modes" in your build
system.
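
A minimal sketch of those two modes, under the assumption that the build
already knows which optional dependencies it found. The function name
`plan_build` and the `strict` flag are hypothetical, not taken from any
particular build system.

```python
def plan_build(available, optional, strict=False):
    ''' Decide which optional features to build.

        `available` is the set of optional dependencies found on this
        system; `optional` is everything the project can make use of.
        With strict=False ("build with what you have") missing options
        are skipped and merely reported. With strict=True (developer/test
        mode) any missing option is an error, so a test run can never
        quietly skip a configuration it was supposed to cover.
    '''
    missing = sorted(set(optional) - set(available))
    if strict and missing:
        raise RuntimeError('missing dependencies: ' + ', '.join(missing))
    enabled = sorted(set(optional) & set(available))
    return enabled, missing
```

A user build would call `plan_build(found, wanted)` and carry on with
whatever is enabled; a CI job would pass `strict=True` and fail early.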

>  > The biggest issue with most build systems, I feel, is that when
>  > something's missing developers often do whatever local hack is
>  > needed to get things working...and then move on,
>  > without ever getting that information back into the repo itself.
>
> That's not been my experience, but most of my experience (except for
> Mailman) has been with applications that need to work on every system
> known to man (and some known only to woman).

Yes, open source projects designed to be built by a lot of people other
than the developers are usually not going to end up in the state I
described above. That's more likely to happen with proprietary projects
where having a "the build is your problem" attitude isn't able to drive
customers (of the build) away.

> Any given developer isn't necessarily going to know where you're supposed
> to get certain things (eg, Homebrew on Mac supplies much less than
> MacPorts or whatever it was I was using before MacPorts, so a port-based
> developer isn't going to be able to tell a Homebrew-based developer what
> to do with much certainty). And it's not so much local hacks that cause
> problems, it's platform-specific handling (which may or may not be hacks)
> that have to be ifdef'd to the 5th circle of Hell and back.

Yeah, but if you're adding #ifdefs to your code (OMG! I'd never realized
that we've been hashtagging our code for years! #cool #weweretherefirst)
you're very much starting to put this information in the repository, rather
than keeping it as local changes on your particular development system.

>  > As for "our way of doing things," that's exactly what you see as
>  > you follow someone doing actual work on the code, is it not?
>
> ... And when it comes to style, "our way" may not be exactly
> visible to the newbie.

Again, I was taking "trailing" to mean that the experienced developer is
pointing out and explaining things and answering questions as he goes
along. I agree that a "newbie just watches silently" situation is not going
to be hugely effective. But if you trailed me for any length of
time we'd almost certainly run into some sort of formatting situation worth
comment, and that would be the point at which my views on formatting, why
we don't always follow the formatting rules, and so on would start to come
out. The newbie doesn't need to appreciate all the subtleties of this; it's
sufficient to learn that there's stuff going on there that he'll have to
learn over time and that he needs to keep an eye out, when working alone,
for situations where he should go seek advice.

It's not as if he's ever going to be at the point where he now knows the
correct formatting for everything, anyway. (Except in certain projects
where they prioritize not thinking about formatting over actually having
the most readable code.) Over time he's going to develop an understanding
of how it works now and what forces brought the project to doing it that
way and then start participating in discussions about how things should
continue to be done or change in the future. (This is true of most
decisions in a project. "Do we print package or library names in an 'X is
missing' message?" is also something that may change over time and an
experienced developer should understand the reasons and forces that brought
you to where you are now, not just know the particular way something
happens to be done.)

> There's a deliberate policy against PEP8ification of working code, both
> to prevent code churn (including implied changes to test suites) and to
> avoid introducing inadvertent changes in behavior.  So your newbie is not
> going to "get" PEP 8 unless *taught* PEP 8 because they're looking at a lot
> of stdlib code that isn't PEP 8 conformant, and they're not going to get
> the anti-churn policy unless taught that, too.

Right, that's a great example of the sort of thing that, if it is
encountered often enough, will be encountered while trailing and the newbie
can be (partially) illuminated at that point. Even if that particular thing
doesn't come up, enough situations of various types should come up that the
newbie will start to get a sense that there are subtleties lurking behind
"rules" and learn that generally he needs to seek advice on things entirely
new to him before blindly plunging in to make major changes.

> Customs around code ownership...
> [A core developer] just won't mess with a prickly maintainer's
> code without clearing it in advance, and so probably not at all in a
> routine day's work.

Right. That seems to me exactly the sort of thing you'd be likely to
encounter with more than a trivial amount of trailing, at which point
surely the developer would explain why he's making the decision not to
touch some code or seek advice or whatever, right?

> A lot of that doesn't apply to smaller projects most of the time, but
> I've seen people tripped up in Mailman (3-7 active core devs over the
> last decade) by pretty much everything that you'd think would mostly
> be "big project" problems.

I don't see these as "big project" problems. The smaller projects are
dealing with the same core problems; they just (almost naturally, due to
being smaller) usually have much better communications within the project
that get these things resolved more quickly. Often, with experienced
developers, so quickly that you probably don't even see it as having
encountered and solved a problem.

cjs
-- 
Curt J. Sampson      <cjs@example.com>      +81 90 7737 2974

To iterate is human, to recurse divine.
    - L Peter Deutsch