Mailing List Archive



Re: [tlug] Open source license (wikipedia)



Darren Cook writes:

 > Again, people were saying that the impossibility of reconstructing
 > the original is the key.

I am quite sure that is wrong.  For example, in a highly optimized C
or C++ program you will be unable to reconstruct the original from the
compiled stripped executable (loops may get unrolled, dead code
eliminated, common subexpressions coalesced, etc., and of course with
the symbols stripped you won't be able to reconstruct variable names),
but there is no doubt whatsoever that the original copyright on the
source code persists in that executable.  (If you receive a program as
source code, there is an implied license to compile it for your own
use, but not to copy or redistribute the executable you compiled.)

 > Word embeddings [1] that use multiword expressions or n-grams might be a
 > more interesting grey area when "n" is high enough (because the text for
 > each embedding is stored).  (But I'll hazard a guess that n-grams up to
 > at least 4 or 5 are going to be okay.)

That's not the way this works.  It's not the number of words in an
n-gram; it's the number of n-grams that matters.  Even 1-grams are
hazardous if your corpus is the work of an author with idiosyncratic
spelling or frequent neologisms (e.g., James Joyce).
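
To make the "number of n-grams" point concrete, here is a minimal
Python sketch that counts how many distinct n-grams of a candidate
text also occur in a corpus.  The function names and the toy strings
are hypothetical, chosen only for illustration, and the count is not a
legal test of anything; it just shows that even with n = 1, overlap
accumulates quickly when the vocabulary is idiosyncratic.

```python
from collections import Counter


def ngrams(text, n):
    """Return a Counter of the word n-grams (as tuples) found in text."""
    words = text.lower().split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def shared_ngrams(corpus, candidate, n):
    """Count the distinct n-grams of candidate that also occur in corpus."""
    return len(set(ngrams(candidate, n)) & set(ngrams(corpus, n)))


# Toy data (invented): a "corpus" full of neologisms and a short candidate.
corpus = "the quarkling snorfed a blimber while the quarkling sang"
candidate = "a blimber snorfed quietly"

print(shared_ngrams(corpus, candidate, 1))  # shared single words
print(shared_ngrams(corpus, candidate, 2))  # shared word pairs
```

With ordinary English, shared 1-grams mean little; with invented words
like those above, each shared 1-gram already points at the source.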

For example, there is an infosec tweep I enjoy following (thegrugq),
and somebody created a Markov 'bot (thegrugq_ebooks) trained on a
corpus of thegrugq's tweets.  I had to look twice to realize that a
tweet that looked like the 'bot's output actually came from a third
party, because of its peculiar not-quite-English syntax.  I suspect
the third party was under the influence, but it really "looked like"
the bot.  And that's what matters.
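
For readers who have not seen one, a word-level Markov 'bot can be
sketched in a few lines of Python.  This is an assumption about how
such bots generally work, not the actual thegrugq_ebooks code (which I
have not seen); the names build_chain and generate and the sample
tweets are hypothetical.  Note that every adjacent word pair the bot
emits is copied from the training corpus.

```python
import random
from collections import defaultdict


def build_chain(tweets, order=1):
    """Map each run of `order` words to the words that follow it."""
    chain = defaultdict(list)
    for tweet in tweets:
        words = tweet.split()
        for i in range(len(words) - order):
            chain[tuple(words[i:i + order])].append(words[i + order])
    return chain


def generate(chain, length=10, seed=None):
    """Random-walk the chain; stop at `length` words or at a dead end.

    Assumes the chain is non-empty.
    """
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))      # pick a starting state
    out = list(state)
    while len(out) < length:
        followers = chain.get(state)       # .get() avoids adding new keys
        if not followers:
            break
        out.append(rng.choice(followers))
        state = tuple(out[-len(state):])
    return " ".join(out)


# Invented sample "tweets", for illustration only.
tweets = ["opsec is hard", "opsec is not optional", "hard things are hard"]
chain = build_chain(tweets)
print(generate(chain, length=8, seed=0))
```

Because the output is stitched entirely from the author's own word
sequences, it inherits the author's style, which is why it can "look
like" the original.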

Now, the FSF has a "15 line rule": contributions under that length
don't need an assignment.  But that 15-line limit applies to the
*union* of the contributor's patches, *not* to *individual* patches.
So one tweet is probably not enough to infringe the bot's
copyright. :-)  On the other hand, it's not obvious to me that, with a
few thousand tweets by now, the 'bot can't infringe thegrugq's
copyright....

Steve

