Mailing List Archive



Re: [tlug] Adding text to the beginning of a file



Dave M G writes:

 > If I change it to:
 > cat file_with_new_data tmp_file >> end_result_file
 > 
 > ... then it adds the contents of file_with_new_data to the *end* of 
 > tmp_file. This is better, but actually, what I want to do is stick the 
 > data on at the beginning. Prepend, not append. I looked up command 
 > options for "cat" but couldn't find how to do it.

[I license the following content under the Academic Free License,
v. 3.0 or later.  Ie, do what you want, as long as you don't sue me.]

Don't mistake the hammer for the workshop.  Unix tools are designed to
do one thing cleanly.  To do a job, you typically need to use several
tools.  However, because each tool is very simple and regular, it is
possible to line them up as an automated assembly line, which is
called a pipeline.  OTOH, an interactive program is like a
fully equipped workshop.  It can do anything without going out to get
another tool, but only if the master is at the workbench.

The fundamental idea here is the concept of a stream, which is just a
sequence of bytes.  (You will immediately recognize that the name
"pipeline" derives from the fact that a stream flows through it.)
Processing streams is like eating nagashi-soba (or kaiten-zushi): you
grab some of the data as it flows by and transform it.  (Here the
kaiten-zushi analogy fails: in a stream the data never goes by again.)

Fire up a terminal window, please.  Then type the contents of the
lines starting with "$" (omitting the "$") after me:

$ echo abc
abc
$ echo abc > letters
$ echo 123 > digits
$ 

The lines not starting with "$" are output.  The "$ " is a prompt
provided by the shell; yours may differ in various ways.

Now, what happened in the example?

There are three special "studly" streams always available in a Unix
program: standard input (aka stdin), standard output (aka stdout), and
standard error (aka stderr).  stderr is special, and will not be
treated in this sho-tutorial.

In the example above, you've already met stdout several times.  stdout
is aliased by default to terminal output, so anything written to stdout
appears on your terminal.  The command "echo" does exactly that; it
echoes its arguments to stdout.  Thus echo is a way of getting "stuff"
from the command line into a stream.  When you type "echo abc", echo
takes the argument "abc", writes it to stdout, and it appears on your
terminal.  echo then cleans up by writing a newline to stdout.  That's
why the result of "echo abc" is not "abc$ ", with the next prompt
glommed onto the end of echo's output.

Streams can be redirected.  The ">" operator is preceded by a command
and followed by a filename; it takes the stdout of the command and
writes it to the named file.  So the example first echoed to the
terminal, then created two files.

To get "stuff" from a file into a stream, use cat.  What cat does is
to take several named files and concatenate them into one stream by
copying them, in the order given on the command line, to stdout.
Let's see what's in letters and digits:

$ cat letters digits
abc
123
$ cat digits letters
123
abc
$ cat letters digits > characters
$ cat characters
abc
123
$ 

So to prepend new to old, you merely need to put new before old in
cat's arguments.  (Thus cat is more powerful than the shell's ">>"
operator.)

cat has some special cases which need to be considered carefully.
Unlike echo, which always ignores stdin, cat will take input from
stdin.  stdin is also normally aliased to the terminal, so it is taken
from terminal input.  Try entering the command "cat" followed by Enter
to execute it, then typing "456", the Enter key, then Ctrl+D.  You
should see

$ cat
456
456
$ 

This behavior is not terribly useful, but combined with redirection it
can be used as a (very!) primitive editor.

stdin can also be redirected.  The "<" operator is like the ">"
operator, except that it binds stdin to the existing contents of the
named file.

$ cat > moredigits
[4][5][6][Enter][Ctrl+D]
$ cat moredigits
456
$ cat < moredigits
456
$ cat letters digits
abc
123
$ cat letters < digits
abc
$ 

The words in square brackets in lines not preceded by "$" are key
names.  Thus "[4]" means "push the key labelled '4'" and "[Ctrl+D]"
means "push the Ctrl key and the D key at the same time."

From the "cat moredigits" examples, it looks like the first filename
and stdin are related, but that is not the case.  We can see from the
"letters digits" examples that cat actually ignores stdin if there are
filename arguments.  It's probably better to think of this in the
opposite way: cat's use of stdin is a special case, ie, only when
there are no arguments does it take its input from the single stream
stdin.

It is often useful to concatenate stdin and streams taken from files
in the same command.  If cat did this automatically, you'd be in the
same situation as you found with the ">>" operator: sometimes you
can't get there from here.  So what is done is to have a special file
name, "-", which most Unix tools interpret contextually.  If an input
file is wanted, "-" refers to stdin, while if an output file is
wanted, "-" refers to stdout.  Thus

$ cat - letters
[4][5][6][Enter][Ctrl+D]
456
abc
$ cat letters -
[4][5][6][Enter][Ctrl+D]
abc
456
$ 

The pipe operator is also a redirection.  It binds the stdin of the
right hand command to the stdout of the left hand command:

$ echo def | cat
def
$ 

This is not the same as

$ echo def
def
$ 

In the pipelined case, cat actually does the (useless) work of copying
bytes from stdin to stdout.  If you want to avoid the redundant work,
just omit the "| cat"; the shell or OS won't do that for you. :-)
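To see the assembly line doing real work in the middle, here is a
small sketch.  It is only an illustration: tr is a standard tool not
otherwise used in this tutorial, and the scratch directory made with
"mktemp -d" is my own assumption, there so that none of your real
files are touched.

```shell
# Work in a scratch directory (assumes mktemp is installed).
cd "$(mktemp -d)" || exit 1
echo 123 > digits

# Three stations on the line: echo produces the stream, tr upper-cases
# it as it flows by, and cat glues the file "digits" on after it.
result=$(echo abc | tr 'a-z' 'A-Z' | cat - digits)
echo "$result"
```

Each command only sees its own stdin and stdout; none of them knows
what is upstream or downstream.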

Note that the semantics of stdout redirection to a file is a matter of
life-and-death import.  Placing ">" in a command is a death sentence
for the named file it controls, even if the command is an error.  But
if the command succeeds, a new file is born, containing the stdout of
the command.  There are ways to tame this Rambo-esque shell, but this
violent behavior is useful enough to be the default.  Go-chuui shite
kudasai (please take care).  Pipes and stdin redirection, on the other
hand, have no persistent effects.  When the program exits, it's as if
they were never there.  Go-anshin shite kudasai (please rest easy).
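If you want to watch the death sentence being carried out, the sketch
below should do it.  The file names "victim" and "guarded" and the
nonexistent directory are made up for the demonstration, and
noclobber ("set -C") is just one taming method, picked here as an
example.

```shell
# Work in a scratch directory (assumes mktemp is installed).
cd "$(mktemp -d)" || exit 1
echo precious > victim

# ls fails (the directory is assumed not to exist), but the shell has
# already emptied victim before ls even started.
ls /no/such/directory > victim 2> /dev/null || echo "ls failed, as expected"
wc -c < victim        # byte count of victim; the old contents are gone

# With noclobber set, ">" refuses to overwrite an existing file.
echo precious > guarded
set -C
( echo oops > guarded ) 2> /dev/null || echo "shell refused to overwrite guarded"
set +C
cat guarded
```

With noclobber on, the ">|" operator still overwrites on purpose.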

Note that I have described "|", ">", and "<" as operators.  Pipes and
redirection are features of the operating system, and are available to
all programs.  They are given different "names" in different languages.
In C, redirection is done with library calls such as freopen() or
dup2(), and pipes with calls such as pipe() or popen().  In the shell,
they are all operators, with the names "|", ">", and "<".

I mention this to point out that confusion about these concepts is
pretty natural.  The shell interprets the characters by setting up the
program's environment in certain ways.  However, the actual operation
of redirection is not done by the shell, but by the OS.  This implies
certain limitations that are non-intuitive if you think of the shell
as "doing" redirection.  It doesn't.  It tells the OS, which actually
does the work.  This means that ">>" cannot mean both append and
prepend; you must pick one.

Also, cat never sees the redirection operators, so it is impossible
for an option to cat to "work" (ie, select appending vs. prepending).
Consider again

$ echo abc > letters
$ cat letters
abc
$ 

Where did the "> letters" go?  echo is supposed to copy its arguments,
so shouldn't it be in the file?  The answer is, no, ">" and "letters"
are not arguments to echo, they are a shell operator and its operand.
The shell interprets them as instructions about program setup, and
removes them from the command line *before* executing echo.
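You can check this for yourself with a command that reports how many
arguments it received.  In the sketch below, count_args is a
hypothetical helper function defined only for this demonstration, and
"seen" is an arbitrary file name.

```shell
# Work in a scratch directory (assumes mktemp is installed).
cd "$(mktemp -d)" || exit 1

# count_args prints the number of arguments it was actually given.
count_args() { echo "$#"; }

# The shell strips "> seen" first, so count_args gets one argument.
count_args abc > seen
cat seen

# Inside quotes, ">" is ordinary text and really is passed to echo,
# so this prints the text and creates no file called "letters".
echo 'abc > letters'
ls
```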

Thus, the only way to specify the order in which streams are combined
is to use cat with its arguments (file names) in the desired order.

$ echo 789 | cat letters -
abc
789
$ echo 789 | cat - letters
789
abc
$ 
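Putting the pieces together for the original question, one possible
recipe for prepending to an existing file goes through a scratch
file.  This is a sketch, not the only way: the names old_file and
new_data are placeholders, and mktemp is assumed to be installed
(it is common, but not part of every Unix).

```shell
# Work in a scratch directory with two small sample files.
cd "$(mktemp -d)" || exit 1
echo "old line" > old_file
echo "new line" > new_data

# Do NOT write "cat new_data old_file > old_file": the shell would
# empty old_file before cat ever got to read it.  Go via a scratch
# file, and replace the original only if cat succeeded.
tmp=$(mktemp ./prepend.XXXXXX) || exit 1
cat new_data old_file > "$tmp" && mv "$tmp" old_file
cat old_file
```

One caveat: the file created by mktemp has its own permissions, so
after the mv, old_file may not have the permissions it started with.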

HTH

-- 
..  Now I think I just reached the state of HYPERTENSION that comes
 JUST BEFORE you see the TOTAL at the SAFEWAY CHECKOUT COUNTER!

