[tlug] "Centralized" vs, "distributed" VCSs

Date: Sat, 21 Feb 2009 14:57:28 +0900
From: "Stephen J. Turnbull" <stephen@example.com>
Subject: [tlug] "Centralized" vs, "distributed" VCSs
References: <499A44AF.3010209@bebear.net> <87k57pqnoc.fsf@xemacs.org> <499B9B9C.9040202@bebear.net> <499E33B2.7040300@fremlin.org> <20090220064730.GN22190@lucky.cynic.net> <87r61t5yhp.fsf@xemacs.org> <20090220151752.GC1700@lucky.cynic.net>

Curt Sampson writes:

 > Actually, those with central VCSs use the VCS to merge, too. Every time
 > you do an "svn update," you're doing a merge.

Well, in some sense, yes.  To be precise, all of these systems use the
heuristic that files which are touched in only one parent can be
merged by taking the text of that parent as the merger (very
plausible), and similarly for individual hunks within files (not as
obviously safe, but it works well in practice).  Real merging
has to be done by a human, though, including *but not limited to*
cases where the VCS signals a conflict.
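
To make the file-level heuristic concrete, here is a minimal sketch in
Python.  It is a toy model, not the code of any actual VCS: the function
name merge_trees and the representation of a tree as a dict mapping
path to file text are assumptions made purely for illustration.

```python
def merge_trees(base, ours, theirs):
    """Toy file-level three-way merge (illustrative, not any real VCS).

    Each tree is a dict mapping path -> file text; a missing key means
    the file does not exist in that tree.
    """
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            result = o          # both parents agree (or both deleted it)
        elif o == b:
            result = t          # only "theirs" touched the file
        elif t == b:
            result = o          # only "ours" touched the file
        else:
            conflicts.append(path)  # both touched it: a human must decide
            continue
        if result is not None:
            merged[path] = result
    return merged, conflicts


base = {"a.c": "1", "b.c": "2", "c.c": "3"}
ours = {"a.c": "1-ours", "b.c": "2", "c.c": "3-ours"}
theirs = {"a.c": "1", "b.c": "2-theirs", "c.c": "3-theirs"}
merged, conflicts = merge_trees(base, ours, theirs)
```

Here a.c and b.c merge automatically because each was touched in only
one parent, while c.c is reported as a conflict.  Note that a clean
result says nothing about whether the merged tree actually works,
which is the sense in which real merging still needs a human.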

The point is that DVCSes handle a lot more cases automatically than
SVN can, because they use past history intensively.  CVS can't do it;
it doesn't have atomic commits.  The SVN framework *can* do this
because it has atomic commits, but the actual implementation doesn't
do it (modulo rumors that SVN 1.5 does handle criss-crossing merges
better than earlier versions).

 > > and have the option of doing it centrally.  Those using
 > > legacy VCSes have no choice but to insist that the developers do the
 > > merging, not the VCS.
 > 
 > Huh? Some developer somewhere always has to do the merge. It's just a
 > matter of who does it, and where, when and how.

Well, in a DVCS, you can merge every other commit without errors.  (If
you have developers who do that, people who actually review the
history are obstructed by the merge turds and the complexity of the
history graph, of course, but the developers usually don't care.)  In
CVS or SVN, you don't dare do that; you'll end up with merge hell.
You must insist that the developers of the patches sync to the central
tree and then push (or send a patch).  (Actually you let CVS/SVN
require updates before *allowing* anyone to commit.)
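
The "update before commit" rule can be sketched roughly as follows.
This is a deliberately simplified model, not how CVS or SVN is
actually implemented; the class CentralRepo and its bookkeeping are
hypothetical names introduced for illustration.

```python
class CentralRepo:
    """Toy central repository that rejects commits from stale checkouts."""

    def __init__(self):
        self.head = 0            # number of the latest revision
        self.last_changed = {}   # path -> revision that last changed it

    def commit(self, base_revision, paths):
        """Commit changes to `paths` made against `base_revision`."""
        stale = [p for p in paths
                 if self.last_changed.get(p, 0) > base_revision]
        if stale:
            # Someone else changed these files since the checkout:
            # the developer must update (i.e. merge locally) first.
            raise RuntimeError(f"out of date, update first: {sorted(stale)}")
        self.head += 1
        for p in paths:
            self.last_changed[p] = self.head
        return self.head


repo = CentralRepo()
first = repo.commit(0, ["a.c"])      # accepted, creates revision 1
try:
    repo.commit(0, ["a.c"])          # same base, same file: rejected
    rejected = False
except RuntimeError:
    rejected = True
```

The point of the sketch is only that the server pushes the merge back
onto whoever commits second; it never merges on their behalf.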

On the other hand, Mercurial and Bazaar track which changes have been
applied to the trees being merged, and avoid various kinds of spurious
conflicts.  git doesn't actually do anything interesting, but
delegates intelligent behavior to the user.  git's incredible speed
and ability to edit the history DAG efficiently enable the user to
transform the tree into a more appropriate one, and then perform the
merger in stages.  (This is why people who don't get git don't get
git, ISTM.  They think this should be automatic.)
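
As a rough illustration of what "tracking which changes have been
applied" can mean, here is a sketch that treats history as a DAG of
changesets and computes what one head has that the other lacks.  It is
not the actual algorithm of Mercurial, Bazaar, or git; the function
names and the dict-of-parents representation are assumptions.

```python
def ancestors(dag, head):
    """Return `head` plus every changeset reachable via parent links.

    `dag` maps a changeset id to a tuple of its parent ids.
    """
    seen, stack = set(), [head]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(dag.get(node, ()))
    return seen


def missing_changes(dag, ours, theirs):
    """Changesets in the history of `theirs` that `ours` does not contain."""
    return ancestors(dag, theirs) - ancestors(dag, ours)


# A is the root; B and C both branch from A; D merges B and C;
# E continues from C on the other side.
dag = {"A": (), "B": ("A",), "C": ("A",), "D": ("B", "C"), "E": ("C",)}
to_apply = missing_changes(dag, "D", "E")
```

Because D already contains C, only E is left to apply when merging E
into D.  Re-applying C is exactly the kind of spurious conflict that a
system without this history has no way to rule out.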

 > > "Centralized" refers to whether you have to push to communicate your
 > > changes to others or not.
 > 
 > Hm? Someone, somewhere, always has to push or pull changes to get them
 > into the main release (except for the guy who is committing to the main
 > release "branch" or "repo" or whatever you care to call it).

Of course, "the main release".  What distributed VC enables is general
networks of pushes and pulls, rather than restricting to a star
topology where everybody communicates via the central repository.  In
a distributed VC, you can just publish an URL and let the interested
come to you, and they can then merge your code with trees from other
sources conveniently.  In a legacy VC, to do a merge with any
convenience and degree of hope of success you need to be working with
branches from the same repo.

So in a DVCS you can leverage distribution to get a radically
different degree of decentralization from what's possible with a
legacy VCS, where you can decentralize development but not
administration and communication.

 > > Both terms also apply to workflows.  Typically the problem is that
 > > people confuse VCS (infrastructure == possible workflows) with
 > > workflow (== politically mandated).
 > 
 > True, though I'm not confusing them.

Well, no offense intended, but I think you're confusing something.

 > Part of my point, though, is that while both svn and git, say, can
 > support certain ways of working, for some of those ways svn is
 > rather more convenient to use than git.

Examples, please.  Preferably a case where it would make sense to
migrate from an existing git installation to a new Subversion
repository. ;-)

IME, legacy VCSes are "convenient" only because you actually have to
read the git manual and realize that 99% of the commands available are
irrelevant, whereas with svn the commands you've used since Tichy
published RCS still work the same.  (Figuring out the manual is
nontrivial with git, admittedly, and even worse with bzr, the
self-proclaimed "DVCS for the rest of us". ;-)  As you've pointed out,
Linux has a perfectly traditional centralized workflow into the Linus
mainline.[1]  The only thing that SVN does better than git AFAIK is to
force developers to get permission from The Management to publish
branches in easily mergeable form.  (Well, it probably works better on
Windows in the sense that the inherent brokenness of NTFS doesn't hurt
anywhere near as much amidst the pain of dealing with network latency,
whereas git has not been well-optimized to deal with NTFS and is
painfully slow, so-desu.)

Granted, if you already have a Subversion server installed, it often
makes little sense to migrate; many workflows won't be enhanced, so
it's a pure cost.  But "change is less convenient than doing nothing"
cuts both ways; it's just that Subversion has a large installed base
(and so far I have yet to hear of a case where a project even
considered migrating from git to Subversion!).

 > > And of those several dozen, how many have an URL to their current
 > > active repo that you can post so that I can pull from it?
 > 
 > All of them: anoncvs.netbsd.org. You don't even need to know who they
 > are; it's all in one convenient place.

*All* of the several dozen?  And what's convenient about it?  It
doesn't serve HTTP (even cvs.xemacs.org does that!), and I don't know
the CVSROOT so I can't even use checkout -c to get a list of modules.
I bet few of those several dozen have listed CVS modules with a brief
homepage[2] explaining the content of their private branches ... oh,
of course what you mean is that those private branches really *are*
private, I can't get access to their in-development trees (which are
probably kept in git, anyway ;-).  What I can get access to is the
patches they make the effort to publish, usually directly to the
trunk, right?  And even for those who do keep branches in CVS, it's
probably non-trivial to merge a stale branch (with a feature I want)
into trunk.

Footnotes: 
[1]  It's just a whale of a lot faster than it could have been
implemented in SVN!

[2]  I'm referring to .git/description.  Cf. the second column of the
list you see at http://git.kernel.org/.
