Mailing List Archive



Re: [tlug] Open-source repository question



John Fremlin writes:
 > Stephen J. Turnbull wrote:

 > > The assumption that two changes which conflict partially but are
 > > identical on the conflict are not in conflict.  David originally had
 > > this right but eventually bowed to pressure from the peanut gallery.
 > 
 > Why do you think one way is more correct than another and how does this 
 > relate at all to the theory?

The problem is precisely that *neither* way is globally correct; you
need to understand the semantics to know which answer is right.  For
example, you and I may both change foo.c, but in different ways (i.e.,
semantically independent).  However, we both notice the same typo
"hte" and correct it to "the".  This isn't a conflict; it's
independent invention, you know, the kind of thing that gets you sued
when a patent is involved.

OTOH, maybe we have this:

--------------------------------
char *identifiers[25] = {
    "double",
    "float",
    };

int next_open_slot = 2;
--------------------------------

and I change it to

--------------------------------
char *identifiers[25] = {
    "char",
    "double",
    "float",
    };

int next_open_slot = 3;
--------------------------------

and you change it to

--------------------------------
char *identifiers[25] = {
    "double",
    "float",
    "int",
    };

int next_open_slot = 3;
--------------------------------

OK, so let's merge.  I insert at line 2, you insert at line 4, no
conflict, right?  And semantically, entirely reasonable, no?  But we
both changed line 6, in the same way.  Original Darcs would throw a
conflict on that ... and be right.  'Cause now we get:

--------------------------------
char *identifiers[25] = {
    "char",
    "double",
    "float",
    "int",
    };

int next_open_slot = 3;
--------------------------------

Note that when the compiler (using this symbol table) parses

    int foo;

it will do

    insert_new_identifier ("foo", 3);

and your change will get overwritten at runtime (and the parser will
throw inexplicable errors on all future int declarations :-).  Bad
Darcs!  Bad bad Darcs!  (And all other existing VCSes, for that matter.)

 > The theory works fine provided the black box that determines whether 
 > patches conflict is "reasonable" -- I'm not sure of the exact conditions 
 > -- are you saying they are violated by the choice of allowing
 > patches that share a common implied patch to commute or not?

I'm saying that there is a theory which says two patches that change
the same place are in conflict unless the two patches are identical
in their entirety.  Darcs followed that theory at first, but then
switched to the heuristic that such an overlap is usually something
like the typo case.

 > I don't see this at all, sorry for being so slow -- can you explain?
 > 
 > On a related note http://projects.haskell.org/camp/

Yeah, I haven't been following that; they decided to go their own way
and not publish on the Darcs lists.  But the occasional traffic on the
Darcs list suggests that camp is no better in the sense that it has no
semantic theory, and is entirely based on textual heuristics for
identifying conflicts.

 > > Of course git has to worry about conflicts.  If it didn't, there would
 > > be no need for git-rebase --continue and/or git-rebase --skip.
 > 
 > It doesn't worry about it; it lets the users deal with it and provides
 > tools to help them.
 > 
 > Darcs is explicitly designed to worry about it.

No, it's not.  Darcs is designed to be smarter about what is and isn't
a conflict, but if it detects a conflict, Darcs throws the whole thing
into the user's lap just like all the others.  And in fact I don't
think Darcs really does much better than the heuristics built into git
(like "patches which don't touch the same file don't conflict").

And Darcs definitely *really* sucks at conflict markup.

 > As the H.264 stuff used DMA, these two changes had to have separate 
 > branches (in fact as we used CVS, I actually just checked out versions 
 > of the source tree into different directories).

But this is precisely what Curt advocates, no?  Keeping it in a
workspace until it's ready to go into the mainline.  The point of a
branch is communicating (either with other developers or a forgetful
incarnation of yourself :-).  So I'm not sure where you're going with
this (unless you meant you *wished* you could branch but you were
stuck with CVS...).

 > How would Curt or Linus keep this in their heads?

Well, Curt says he wouldn't, and in fact he probably doesn't need to
from how he describes his preferred workflows.  Linus doesn't keep the
details in his head, but from what I've seen on lkml, he does
understand the implications of all changes to the code in a very deep
way.  So he can look at a patch and decide that it sucks, or that
maybe it can work, "as if" he really had all those details at hand.
I have a bit of that myself.  I'm not a very good programmer, and
definitely really slow, but I can look at other people's patches to
XEmacs and "guess" how they're going to blow up without really
understanding the details of what they're doing.

But that kind of "intuition" only goes so far, even for Linus.  So he
(and we) need branching.

 > > Heh.  It's a lot easier than that.  Git is a specialized Lisp with
 > > annotated conses (commits), multiway trees (trees), symbols (tags),
 > > and hash tables with stable universal hashes (all objects).  All done! 

Really, I should have dropped the hashtable part of the analogy and
just called it "The Universal Obarray".

 > > Look Ma: no lambdas, and no parentheses.  I'm in heaven! :-)
 > 
 > But git's database is not a programming language so that doesn't make 
 > sense. The difference between Lispy languages and others is that in Lisp 
 > code is a singly linked list and can be fiddled with as if it were data.
 > 
 > You could equally well claim that git was a specialised Python.

Well, you can claim that anything is a specialized Turing machine,
too.  But in the Lisp analogy there's something a little deeper than
that going on, I think.

What I was referring to is the recursive and cons-oriented style of
both Lisp and git *data*.  Even git's "array" notation (HEAD~3) runs
backward compared to the revision numbers of the other VCSes, and it's
just an abbreviation for a recursive notation (HEAD^^^).  Contrast
that with the others, which make it easier to refer to your mother by
the number of generations she's descended from Eve than by calling
her "Mom".  It's very easy
for a Lisp programmer to wrap his head around what git is doing.

And the cheap branching of git: a branch is just a pointer.  Cheap to
bind, cheap to throw away.  Make a reference, and your stuff won't get
garbage collected.

It's not a really robust analogy, rather a patchwork of small
correspondences.  But there are a lot of them!


