Mailing List Archive



Re: [tlug] When is a line feed really a line feed?



David J Iannucci writes:

 > I'm no authority on this stuff, but I think that \n doesn't refer to an
 > actual character... I think it is an abstraction referring to whatever
 > is the line terminator used by the OS at hand (making the other guy's
 > statement somewhat tautological :-)

No, it refers to LF.  Indeed the "n" is probably supposed to be
mnemonic for "newline", but in every language I know of it means LF.
The language definitions (eg, ISO C, Python Language Reference, Emacs
Lisp Reference, ...) say so.
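
A quick way to check this for Python, as a minimal sketch (the function
name describe_newline_escape is made up for illustration): the escape
is a single character with code point 10, whatever the platform.

```python
def describe_newline_escape():
    """Report what the "\\n" escape actually is inside a Python string."""
    return {
        "length": len("\n"),        # one character, not a CR/LF pair
        "code_point": ord("\n"),    # 10, i.e. LF
        "is_lf": "\n" == "\x0a",    # same character as LF
        "is_cr": "\n" == "\x0d",    # not CR
    }


if __name__ == "__main__":
    print(describe_newline_escape())
```

The result is the same on Unix, Mac, and Windows, because the escape is
defined by the language and not by the platform's I/O layer.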

However, in files that are declared as "text" this will be silently
converted by the I/O subsystem to the platform EOL marker.  That's
(mostly) why Unix doesn't need to distinguish text vs. binary files,
and (I would guess) why Mac doesn't use CR as a line terminator any
more.
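
The conversion can be made visible in Python by writing through a
text-mode file and reading the raw bytes back.  This is only a sketch:
the helper name written_bytes is invented here, and the newline
argument of open() is used to stand in for "the platform convention"
so the effect can be seen on any OS.

```python
import os
import tempfile


def written_bytes(text, newline):
    """Write text in text mode with the given newline setting, return raw bytes."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # newline=None  -> "\n" is translated to os.linesep on output
        # newline="\r\n" -> "\n" is translated to CRLF (what Windows does by default)
        # newline="\n"   -> no translation
        with open(path, "w", encoding="ascii", newline=newline) as f:
            f.write(text)
        # Binary mode shows exactly what ended up on disk.
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


if __name__ == "__main__":
    for setting in (None, "\n", "\r\n"):
        print(repr(setting), written_bytes("one\ntwo\n", setting))
```

In each case the string handed to write() contains only LF; whatever
else appears in the file was put there by the I/O layer.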

It's possible that because of platform-specific I/O behavior the
interpretation you give is widespread, but technically it's incorrect.

 > The actual characters are CR (ASCII 13) and LF (ASCII 10).

In fact there is a whole pile of such characters, including CR, LF,
NEL (NEXT LINE, 0x85 in ISO 6429, U+0085 in Unicode), and the Unicode
characters LINE SEPARATOR (U+2028) and PARAGRAPH SEPARATOR (U+2029).

 > Mac uses only CR

Not since the introduction of Mac OS X, it doesn't.

Note:

In many modern environments, there is a "universal newlines" mode
(Python's name for it) which conforms more or less to UAX #14 (now part
of the standard) "The Unicode Line-Breaking Algorithm" regarding
parsing of newlines.  In summary, *all* of CR, LF, CRLF, LINE
SEPARATOR, and PARAGRAPH SEPARATOR are regarded as separating lines.
There are also a few relatively unusual characters which Unicode
doesn't assign other semantics to that act as line separators, such as
ASCII VT (vertical tabulation, ASCII 11) and ASCII FF (form feed,
ASCII 12).  However, these often get other semantics in
applications.  gedit and Emacs also conform, by detecting the EOL
convention in use and displaying the terminators as line breaks.  Emacs at least
also treats VT and FF as line breaks, plus additional semantics in
some modes.
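
For Python specifically, here is a small sketch of what gets treated
as a line boundary (the function names read_universal and split_all
are invented for illustration).  As far as I know, the file-level
universal newlines mode only translates CR, LF, and CRLF, while
str.splitlines() also splits on the wider set, including VT, FF, NEL,
LINE SEPARATOR, and PARAGRAPH SEPARATOR.

```python
import io


def read_universal(data: bytes) -> str:
    """Decode bytes through a text-mode reader with universal newlines enabled."""
    # newline=None turns on universal newlines: CR, LF and CRLF all become "\n".
    reader = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8", newline=None)
    return reader.read()


def split_all(text: str) -> list:
    """Split on every line boundary that str.splitlines() recognizes."""
    return text.splitlines()


if __name__ == "__main__":
    sample = "a\rb\nc\r\nd\u2028e\u2029f\x0bg\x0ch"
    print(repr(read_universal(sample.encode("utf-8"))))
    print(split_all(sample))
```

So within one language the answer to "what counts as a line break"
already depends on which interface you ask.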

Output of newlines is still hairy, because most environments don't
come close to conforming to Unicode (which strongly recommends use of
the unambiguous LINE SEPARATOR for hard line breaks and PARAGRAPH
SEPARATOR where you expect the software to provide appropriate line
breaks for you at display time).  So all user-friendly environments
convert to platform convention by default.  As Dave observes, this can
be annoying because it's hard to see in the editor which convention is
in use.

Emacs provides an EOL indicator in the mode line, and if you're worried
about mixed EOL conventions, you can specify the coding system as
"undecided-unix" to enforce Unix EOL, in which case CR displays as
"^M" in the buffer.  It becomes *really* obvious which lines have
which convention. :-)  While I can't necessarily recommend Emacs to
everybody, there's a very good chance that you usually use YFE[1], and
I've heard that YFE has a similar feature. :-)
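
If you'd rather check for mixed EOL conventions outside an editor, a
few lines of Python will do it.  This is just a sketch, not what Emacs
does internally; the function name count_eols is made up, and it only
looks at the three ASCII conventions (CRLF, bare CR, bare LF).

```python
import re

# CRLF must come first in the alternation so it is not counted as CR plus LF.
_EOL_PATTERN = re.compile(rb"\r\n|\r|\n")
_EOL_NAMES = {b"\r\n": "CRLF", b"\r": "CR", b"\n": "LF"}


def count_eols(data: bytes) -> dict:
    """Count how many line terminators of each convention appear in data."""
    counts = {"CRLF": 0, "CR": 0, "LF": 0}
    for match in _EOL_PATTERN.finditer(data):
        counts[_EOL_NAMES[match.group()]] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Read in binary mode so the I/O layer does not translate anything.
    with open(sys.argv[1], "rb") as f:
        print(count_eols(f.read()))
```

A file with more than one nonzero count has mixed conventions.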


Footnotes: 
[1]  Your Favorite Editor.


