Copyright 1994, 1995 Bruce Barnett and General Electric Company
Copyright 2001,2005,2007,2011,2013 Bruce Barnett
All rights reserved
You are allowed to print copies of this tutorial for your personal use, and link to this page, but you are not allowed to make electronic copies, or redistribute this tutorial in any form without permission.
Original version written in 1994 and published in the Sun Observer
Introduction to Sed
How to use sed, a special editor for modifying files automatically. If you want to write a program to make changes in a file, sed is the tool to use.
There are a few programs that are the real workhorses in the UNIX toolbox. These programs are simple to use for simple applications, yet have a rich set of commands for performing complex actions. Don't let the complex potential of a program keep you from making use of the simpler aspects. I'll start with the simple concepts and introduce the advanced topics later on.
When I first wrote this (in 1994), most versions of sed did not allow you to place comments inside the script. Lines starting with the '#' character are comments. Newer versions of sed may support comments at the end of the line as well.
One way to think of this is that the old, "classic" version was the basis of the GNU, FreeBSD and Solaris versions of sed. And to help you understand what I had to work with, here is the sed(1) manual page from Sun/Oracle.
The Awful Truth about sed
Sed is the ultimate stream editor. If that sounds strange, picture a stream flowing through a pipe. Okay, you can't see a stream if it's inside a pipe. That's what I get for attempting a flowing analogy. You want literature, read James Joyce.
Anyhow, sed is a marvelous utility. Unfortunately, most people never learn its real power. The language is very simple, but the documentation is terrible. The Solaris on-line manual pages for sed are five pages long, and two of those pages describe the 34 different errors you can get. A program that spends as much space documenting the errors as it does documenting the language has a serious learning curve.
Do not fret! It is not your fault you don't understand sed. I will cover sed completely. But I will describe the features in the order that I learned them. I didn't learn everything at once. You don't need to either.
The essential command: s for substitution
Sed has several commands, but most people only learn the substitute command: s. The substitute command changes all occurrences of the regular expression into a new value. A simple example is changing "day" in the "old" file to "night" in the "new" file:
Or another way (for UNIX beginners),
and for those who want to test this:
This will output "night".
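Here is a minimal sketch of the three forms just described, assuming files named old and new. The scratch directory and the variable names are mine and are only there so that nothing real gets overwritten:

```shell
# Work in a scratch directory (an assumption for this sketch).
tmpdir=$(mktemp -d)
printf 'day\n' > "$tmpdir/old"

# Form 1: the shell redirects both input and output.
sed s/day/night/ < "$tmpdir/old" > "$tmpdir/new"

# Form 2: sed opens the input file itself.
sed s/day/night/ "$tmpdir/old" > "$tmpdir/new2"

# Form 3: a quick test from the command line.
quick=$(echo day | sed s/day/night/)
echo "$quick"    # prints: night
```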
I didn't put quotes around the argument because this example didn't need them. If you read my earlier tutorial on quotes, you would understand why it doesn't need quotes. However, I recommend you do use quotes. If you have meta-characters in the command, quotes are necessary. And if you aren't sure, it's a good habit, and I will henceforth quote future examples to emphasize the "best practice." Using the strong (single quote) character, that would be:
I must emphasize that the sed editor changes exactly what you tell it to. So if you executed
This would output the word "Sunnight" because sed found the string "day" in the input.
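A short sketch of the quoted form, and of the "Sunday" case, using echo to supply the input (the variable names are only for illustration):

```shell
# The same substitution, quoted with single quotes.
quoted=$(echo day | sed 's/day/night/')
echo "$quoted"    # prints: night

# sed changes exactly what you ask for, even inside a longer word.
sunday=$(echo Sunday | sed 's/day/night/')
echo "$sunday"    # prints: Sunnight
```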
Another important concept is that sed is line oriented. Suppose you have the input file:
and you used the command
The output would be
Note that this changed "one" to "ONE" once on each line. The first line had "one" twice, but only the first occurrence was changed. That is the default behavior. If you want something different, you will have to use some of the options that are available. I'll explain them later.
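To see the line-oriented behavior for yourself, here is a sketch with a three-line sample input. The sample text is an assumption chosen so the first line contains "one" twice:

```shell
input='one two three, one two three
four three two one
one hundred'

# Only the first "one" on each line is changed.
output=$(printf '%s\n' "$input" | sed 's/one/ONE/')
printf '%s\n' "$output"
# ONE two three, one two three
# four three two ONE
# ONE hundred
```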
So let's continue.
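There are four parts to this substitute command: the command itself (s), the delimiter (the slashes), the search pattern (one), and the replacement string (ONE).
The search pattern is on the left hand side and the replacement string is on the right hand side.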
We've covered quoting and regular expressions. That's 90% of the effort needed to learn the substitute command. To put it another way, you already know how to handle 90% of the most frequent uses of sed. There are a few fine points that any future sed expert should know about. (You just finished section 1. There are only 63 more sections to cover. :-) Oh. And you may want to bookmark this page, just in case you don't finish.
The slash as a delimiter
The character after the s is the delimiter. It is conventionally a slash, because this is what ed, more, and vi use. It can be anything you want, however. If you want to change a pathname that contains a slash - say /usr/local/bin to /common/bin - you could use the backslash to quote the slash:
Gulp. Some call this a 'Picket Fence' and it's ugly. It is easier to read if you use an underline instead of a slash as a delimiter:
Some people use colons:
Others use the "|" character.
Pick one you like. As long as it's not in the string you are looking for, anything goes. And remember that you need three delimiters. If you get an "Unterminated `s' command" error, it's because you are missing one of them.
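The four delimiter styles mentioned above can be compared side by side. This is a sketch that feeds the pathname in with echo; all four should produce the same result:

```shell
path=/usr/local/bin

# The "picket fence": every slash in the pattern is escaped.
picket=$(echo "$path" | sed 's/\/usr\/local\/bin/\/common\/bin/')

# The same change with an underline, a colon, and a vertical bar.
underline=$(echo "$path" | sed 's_/usr/local/bin_/common/bin_')
colon=$(echo "$path" | sed 's:/usr/local/bin:/common/bin:')
bar=$(echo "$path" | sed 's|/usr/local/bin|/common/bin|')

echo "$picket $underline $colon $bar"
```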
Using & as the matched string
Sometimes you want to search for a pattern and add some characters, like parentheses, around or near the pattern you found. It is easy to do this if you are looking for a particular string:
This won't work if you don't know exactly what you will find. How can you put the string you found in the replacement string if you don't know what it is?
The solution requires the special character "&." It corresponds to the pattern found.
You can have any number of "&" in the replacement string. You could also double a pattern, e.g. the first number of a line:
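Here is a sketch of the three cases from this section: a fixed string, an unknown string captured with "&", and a doubled number. The sample inputs are assumptions picked for illustration:

```shell
# A known string: just retype it in the replacement.
fixed=$(echo 'abc' | sed 's/abc/(abc)/')
echo "$fixed"      # prints: (abc)

# An unknown string: "&" stands for whatever was matched.
wrapped=$(echo 'hello world' | sed 's/[a-z]*/(&)/')
echo "$wrapped"    # prints: (hello) world

# "&" may appear more than once, which doubles the first number.
doubled=$(echo '123 abc' | sed 's/[0-9]*/& &/')
echo "$doubled"    # prints: 123 123 abc
```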
Let me slightly amend this example. Sed will match the first string, and make it as greedy as possible. I'll cover that later. If you don't want it to be so greedy (i.e. limit the matching), you need to put restrictions on the match.
The first match for '[0-9]*' is the first character on the line, as this matches zero or more numbers. So if the input was "abc 123" the output would be unchanged (well, except for a space before the letters). A better way to duplicate the number is to make sure it matches a number:
The string "abc" is unchanged, because it was not matched by the regular expression. If you wanted to eliminate "abc" from the output, you must expand the regular expression to match the rest of the line and explicitly exclude part of the expression using "\(", "\)" and "\1", which is the next topic.
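The difference between the two patterns is easy to check with the "abc 123" input mentioned above. A small sketch:

```shell
# "[0-9]*" matches the empty string at the start of the line,
# so only a space is inserted before the letters.
empty_match=$(echo 'abc 123' | sed 's/[0-9]*/& &/')
echo "[$empty_match]"    # prints: [ abc 123]

# "[0-9][0-9]*" needs at least one digit, so the number is doubled.
real_match=$(echo 'abc 123' | sed 's/[0-9][0-9]*/& &/')
echo "[$real_match]"     # prints: [abc 123 123]
```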
Extended Regular Expressions
Let me add a quick comment here because there is another way to write the above script. "[0-9]*" matches zero or more numbers. "[0-9][0-9]*" matches one or more numbers. Another way to do this is to use the "+" meta-character and use the pattern "[0-9]+" as the "+" is a special character when using "extended regular expressions." Extended regular expressions have more power, but sed scripts that treated "+" as a normal character would break. Therefore you must explicitly enable this extension with a command line option.
GNU sed turns this feature on if you use the "-r" command line option. So the above could also be written using
Mac OS X and FreeBSD use -E instead of -r. For more information on extended regular expressions, see Regular Expressions and the description of the -r command line argument.
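A sketch of the extended form follows. I use -E here because the BSD versions require it and recent GNU versions accept it too; on an older GNU sed you may need to substitute -r:

```shell
# With extended regular expressions, "+" means "one or more".
ere=$(echo 'abc 123' | sed -E 's/[0-9]+/& &/')
echo "$ere"    # prints: abc 123 123
```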
Using \1 to keep part of the pattern
I have already described the use of "\(", "\)" and "\1" in my tutorial on regular expressions. To review, the escaped parentheses (that is, parentheses with backslashes before them) remember a substring of the characters matched by the regular expression. You can use this to exclude part of the characters matched by the regular expression. The "\1" is the first remembered pattern, and the "\2" is the second remembered pattern. Sed has up to nine remembered patterns.
If you wanted to keep the first word of a line, and delete the rest of the line, mark the important part with the parentheses:
I should elaborate on this. Regular expressions are greedy, and try to match as much as possible. "[a-z]*" matches zero or more lower case letters, and tries to match as many characters as possible. The ".*" matches zero or more characters after the first match. Since the first one grabs all of the contiguous lower case letters, the second matches anything else. Therefore if you type
This will output "abcd" and delete the numbers.
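A sketch of the "keep the first word" command, tried on two sample inputs (the second input is my own addition):

```shell
# \( \) remembers the leading lower case letters; ".*" eats the rest.
letters=$(echo 'abcd123' | sed 's/\([a-z]*\).*/\1/')
echo "$letters"    # prints: abcd

first_word=$(echo 'hello world 42' | sed 's/\([a-z]*\).*/\1/')
echo "$first_word" # prints: hello
```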
If you want to switch two words around, you can remember two patterns and change the order around:
Note the space between the two remembered patterns. This is used to make sure two words are found. However, this will do nothing if a single word is found, or any lines with no letters. You may want to insist that words have at least one letter by using
or by using extended regular expressions (note that '(' and ')' no longer need to have a backslash):
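Here is a sketch of the three ways to swap two words described above. The -E option is used for the extended version; older GNU versions may want -r instead:

```shell
# Zero or more letters per word.
loose=$(echo 'hello world' | sed 's/\([a-z]*\) \([a-z]*\)/\2 \1/')

# Insist on at least one letter per word.
strict=$(echo 'hello world' | sed 's/\([a-z][a-z]*\) \([a-z][a-z]*\)/\2 \1/')

# Extended regular expressions: no backslashes on the parentheses.
extended=$(echo 'hello world' | sed -E 's/([a-z]+) ([a-z]+)/\2 \1/')

echo "$loose / $strict / $extended"    # each one is: world hello
```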
The "\1" doesn't have to be in the replacement string (in the right hand side). It can be in the pattern you are searching for (in the left hand side). If you want to eliminate duplicated words, you can try:
If you want to detect duplicated words, you can use
or with extended regular expressions
This, when used as a filter, will print lines with duplicated words.
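A sketch of both uses of "\1" in the search pattern. I use the "at least one letter" form throughout, since the looser "[a-z]*" form can also match a lone space; the sample text is an assumption:

```shell
# Eliminate a duplicated word.
deduped=$(echo 'the the cat' | sed 's/\([a-z][a-z]*\) \1/\1/')
echo "$deduped"    # prints: the cat

# Detect duplicated words: print only the lines that contain one.
input='the the cat
no repeats here'
found=$(printf '%s\n' "$input" | sed -n '/\([a-z][a-z]*\) \1/p')
found_ere=$(printf '%s\n' "$input" | sed -n -E '/([a-z]+) \1/p')
echo "$found"      # prints: the the cat
```

Note that nothing here anchors the match to word boundaries, so "this is" would also count as a repeat of "is".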
The numeric value can have up to nine values: "\1" thru "\9." If you wanted to reverse the first three characters on a line, you can use
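One way to write the three-character reversal, as a sketch with a sample input of my own:

```shell
# Remember each of the first three characters, then emit them backwards.
reversed=$(echo 'abcdef' | sed 's/^\(.\)\(.\)\(.\)/\3\2\1/')
echo "$reversed"    # prints: cbadef
```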
Sed Pattern Flags
You can add additional flags after the last delimiter. You might have noticed I used a 'p' at the end of the previous substitute command. I also added the '-n' option. Let me first cover the 'p' and other pattern flags. These flags can specify what happens when a match is found. Let me describe them.
/g - Global replacement
Most UNIX utilities work on files, reading a line at a time. Sed, by default, is the same way. If you tell it to change a word, it will only change the first occurrence of the word on a line. You may want to make the change on every word on the line instead of the first. For an example, let's place parentheses around words on a line. Instead of using a pattern like "[A-Za-z]*" which won't match words like "won't," we will use a pattern, "[^ ]*," that matches everything except a space. Well, this will also match anything because "*" means zero or more.
The current version of Solaris's sed (as I wrote this) can get unhappy with patterns like this, and generate errors like "Output line too long" or even run forever. I consider this a bug, and have reported this to Sun. As a work-around, you must avoid matching the null string when using the "g" flag to sed. A work-around example is: "[^ ][^ ]*." The following will put parentheses around the first word:
If you want it to make changes for every word, add a "g" after the last delimiter and use the work-around:
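A sketch of both commands, using a sample line that contains an apostrophe so you can see why "[^ ]" is used instead of letters:

```shell
# Without "g": only the first word is wrapped.
first=$(echo "won't you come" | sed 's/[^ ]*/(&)/')
echo "$first"    # prints: (won't) you come

# With "g" and the work-around pattern: every word is wrapped.
every=$(echo "won't you come" | sed 's/[^ ][^ ]*/(&)/g')
echo "$every"    # prints: (won't) (you) (come)
```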
Is sed recursive?
Sed only operates on patterns found in the in-coming data. That is, the input line is read, and when a pattern is matched, the modified output is generated, and the rest of the input line is scanned. The "s" command will not scan the newly created output. That is, you don't have to worry about expressions like:
This will not cause an infinite loop. If a second "s" command is executed, it could modify the results of a previous command. I will show you how to execute multiple commands later.
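You can convince yourself with a replacement that contains its own search pattern. A sketch:

```shell
# The replacement contains "loop", but sed does not rescan its own output.
once=$(echo 'loop' | sed 's/loop/loop the loop/g')
echo "$once"     # prints: loop the loop

twice=$(echo 'loop loop' | sed 's/loop/loop the loop/g')
echo "$twice"    # prints: loop the loop loop the loop
```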
/1, /2, etc. Specifying which occurrence
With no flags, the first matched substitution is changed. With the "g" option, all matches are changed. If you want to modify a particular pattern that is not the first one on the line, you could use "\(" and "\)" to mark each pattern, and use "\1" to put the first pattern back unchanged. This next example keeps the first word on the line but deletes the second:
Yuck. There is an easier way to do this. You can add a number after the substitution command to indicate you only want to match that particular pattern. Example:
You can combine a number with the g (global) flag. For instance, if you want to leave the first word alone, but change the second, third, etc. to be DELETED instead, use /2g:
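Here is a sketch of the three variations on a sample line. The last one combines a number with "g", which I have only checked against GNU sed; other versions may reject it:

```shell
line='one two three four'

# The hard way: remember the first word, drop the second.
hard=$(echo "$line" | sed 's/\([a-zA-Z]*\) \([a-zA-Z]*\) /\1 /')
echo "$hard"      # prints: one three four

# The easy way: act only on the second match.
easy=$(echo "$line" | sed 's/[a-zA-Z]* //2')
echo "$easy"      # prints: one three four

# Second match and every one after it (GNU sed).
rest=$(echo "$line" | sed 's/[a-zA-Z]* /DELETED /2g')
echo "$rest"      # prints: one DELETED DELETED four
```

The last word is untouched because it is not followed by a space.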
I've heard that combining the number with the g command does not work on macOS, and perhaps the FreeBSD version of sed as well.
Don't get /2 and \2 confused. The /2 is used at the end. \2 is used inside the replacement field.
Note the space after the "*" character. Without the space, sed will run a long, long time. (Note: this bug is probably fixed by now.) This is because the number flag and the "g" flag have the same bug. You should also be able to use the pattern
but this also eats CPU. If this works on your computer, and it does on some UNIX systems, you could remove the encrypted password from the password file:
But this didn't work for me at the time I wrote this. Using "[^:][^:]*" as a work-around doesn't help because it won't match a non-existent password, and will instead delete the third field, which is the user ID! Instead you have to use the ugly parentheses:
You could also add a character to the first pattern so that it no longer matches the null pattern:
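The two work-arounds can be sketched on made-up password lines rather than the real /etc/passwd. The account names and fields below are invented for illustration:

```shell
with_pw='alice:x7Qz:1001:100:Alice:/home/alice:/bin/sh'
no_pw='bob::1002:100:Bob:/home/bob:/bin/sh'

# The parentheses: keep field one, empty field two.
paren=$(echo "$with_pw" | sed 's/^\([^:]*\):[^:]*:/\1::/')
echo "$paren"      # prints: alice::1001:100:Alice:/home/alice:/bin/sh

# Include the colon in the pattern so it never matches the null string.
colon=$(echo "$with_pw" | sed 's/[^:]*:/:/2')
echo "$colon"      # prints: alice::1001:100:Alice:/home/alice:/bin/sh

# An already empty password field is left alone; the user ID survives.
empty=$(echo "$no_pw" | sed 's/[^:]*:/:/2')
echo "$empty"      # prints: bob::1002:100:Bob:/home/bob:/bin/sh
```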
The number flag is not restricted to a single digit. It can be any number from 1 to 512. If you wanted to add a colon after the 80th character in each line, you could type:
You can also do it the hard way by using 80 dots:
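A sketch comparing the two approaches on a generated 100-character line. To spare you from counting, the 80 dots are built with printf and tr, which is my own shortcut:

```shell
line=$(printf '%0100d' 0)              # a line of 100 zeros

# The easy way: act on the 80th match of "any character".
by_count=$(echo "$line" | sed 's/./&:/80')

# The hard way: a pattern of exactly 80 dots.
dots=$(printf '%080d' 0 | tr 0 .)
by_dots=$(echo "$line" | sed "s/^$dots/&:/")

# Character 81 of the result is the new colon.
echo "$by_count" | cut -c81            # prints: :
```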
/p - print
By default, sed prints every line. If it makes a substitution, the new text is printed instead of the old one. If you use an optional argument to sed, "sed -n," it will not, by default, print any new lines. I'll cover this and other options later. When the "-n" option is used, the "p" flag will cause the modified line to be printed. Here is one way to duplicate the function of grep with sed.
But a simpler version is described later.
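A sketch of the grep imitation: the substitution replaces the match with itself, so the line is unchanged, and the "p" flag prints it. The fruit list is sample data:

```shell
# Prints only the lines where the substitution succeeded.
hits=$(printf 'apple\nbanana\ncherry\n' | sed -n 's/an/&/p')
echo "$hits"    # prints: banana
```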
Write to a file with /w filename
There is one more flag that can follow the third delimiter. With it, you can specify a file that will receive the modified data. An example is the following, which will write all lines that start with an even number, followed by a space, to the file even.
In this example, the output file isn't needed, as the input was not modified. You must have exactly one space between the w and the filename. You can also have ten files open with one instance of sed. This allows you to split up a stream of data into separate files. Using the previous example combined with multiple substitution commands described later, you could split a file into ten pieces depending on the last digit of the first number. You could also use this method to log error or debugging information to a special file.
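Here is a sketch of the "w" flag writing matching lines to a file named even. It runs inside a scratch directory (an assumption of the sketch) so the file lands somewhere harmless:

```shell
tmpdir=$(mktemp -d)
(
  cd "$tmpdir" || exit 1
  # Lines starting with an even number and a space are written to "even".
  printf '12 apples\n7 pears\n30 plums\n' |
    sed -n 's/^[0-9]*[02468] /&/w even'
)
cat "$tmpdir/even"
# 12 apples
# 30 plums
```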
/I - Ignore Case
GNU has added another pattern flag: /I
This flag makes the pattern match case insensitive. This will match abc, aBc, ABC, AbC, etc.
Note that a space after the '/I' and the 'p' (print) command emphasizes that the 'p' is not a modifier of the pattern matching process, but a command to execute after the pattern matching.
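A sketch of the flag in both positions, after an address and after a substitute command. This is a GNU extension, so expect an error from other versions of sed:

```shell
# As part of an address: print lines matching "abc" in any case.
matches=$(printf 'abc\nxyz\nAbC\n' | sed -n '/abc/I p')
printf '%s\n' "$matches"
# abc
# AbC

# As a substitute flag: the first match, in any case, is replaced.
replaced=$(echo 'ABC abc' | sed 's/abc/xyz/I')
echo "$replaced"    # prints: xyz abc
```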
Combining substitution flags
You can combine flags when it makes sense. Please note that the "w" has to be the last flag. For example the following command works:
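A sketch combining a number, "p" and "w" in one command. The file name changed and the scratch directory are assumptions of the example, and I have only tried this flag order with GNU sed:

```shell
tmpdir=$(mktemp -d)

# Change the second "a", print the result, and also write it to a file.
printed=$(cd "$tmpdir" && echo 'banana' | sed -n 's/a/A/2pw changed')
echo "$printed"          # prints: banAna
cat "$tmpdir/changed"    # prints: banAna
```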
Next I will discuss the options to sed, and different ways to invoke sed.
Arguments and invocation of sed
Previously, I have only used one substitute command. If you need to make two changes, and you didn't want to read the manual, you could pipe together multiple sed commands:
This used two processes instead of one. A sed guru never uses two processes when one can do.
Multiple commands with -e command
One method of combining multiple commands is to use a -e before each command:
A "-e" isn't needed in the earlier examples because sed knows that there must always be one command. If you give sed one argument, it must be a command, and sed will edit the data read from standard input.
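A sketch comparing the two-process pipeline with the single-process "-e" form. The sample line is an assumption; both should give the same answer:

```shell
input='BEGIN middle END'

# Two sed processes piped together.
piped=$(echo "$input" | sed 's/BEGIN/begin/' | sed 's/END/end/')

# One sed process, two commands.
single=$(echo "$input" | sed -e 's/BEGIN/begin/' -e 's/END/end/')

echo "$single"    # prints: begin middle end
```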
The long argument version, at least in GNU sed, is "--expression=COMMAND", which can be used in place of each "-e COMMAND".
Filenames on the command line
You can specify files on the command line if you wish. If there is more than one argument to sed that does not start with an option, it must be a filename. This next example will count the number of lines in three files that don't begin with a "#:"
Let's break this down into pieces. The sed substitute command changes every line that starts with a "#" into a blank line. Grep was used to filter out (delete) empty lines. Wc counts the number of lines left. Sed has more commands that make grep unnecessary. And grep -c can replace wc -l. I'll discuss how you can duplicate some of grep's functionality later.
Of course you could write the last example using the "-e" option:
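Here is a sketch of the line-counting pipeline, with and without "-e". The three files f1, f2 and f3 and their contents are invented for the example, and tr strips the padding some versions of wc add:

```shell
tmpdir=$(mktemp -d)
printf '# comment\nalpha\n'        > "$tmpdir/f1"
printf 'beta\n# another\ngamma\n'  > "$tmpdir/f2"
printf '# only a comment\n'        > "$tmpdir/f3"

# Blank out comment lines, drop empty lines, count what is left.
count=$(sed 's/^#.*//' "$tmpdir/f1" "$tmpdir/f2" "$tmpdir/f3" |
        grep -v '^$' | wc -l | tr -d ' ')

# The same thing with an explicit -e.
count_e=$(sed -e 's/^#.*//' "$tmpdir/f1" "$tmpdir/f2" "$tmpdir/f3" |
          grep -v '^$' | wc -l | tr -d ' ')

echo "$count"    # prints: 3
```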
There are two other options to sed.
sed -n: no printing
The "-n" option will not print anything unless an explicit request to print is found. I mentioned the "/p" flag to the substitute command as one way to turn printing back on. Let me clarify this. The command
acts like the cat program if PATTERN is not in the file: i.e. nothing is changed. If PATTERN is in the file, then each line that has this is printed twice. Add the "-n" option and the example acts like grep:
Nothing is printed, except those lines with PATTERN included.
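A sketch showing the same command with and without "-n", on a two-line sample input:

```shell
input='no match here
has PATTERN inside'

# Without -n: every line is printed, and matching lines are printed twice.
twice=$(printf '%s\n' "$input" | sed 's/PATTERN/&/p')
printf '%s\n' "$twice"
# no match here
# has PATTERN inside
# has PATTERN inside

# With -n: only the lines the "p" flag asks for.
once=$(printf '%s\n' "$input" | sed -n 's/PATTERN/&/p')
echo "$once"    # prints: has PATTERN inside
```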
The long argument of the -n command is either "--quiet" or "--silent" (GNU sed accepts both).
Using 'sed /pattern/'
Sed has the ability to specify which lines are to be examined and/or modified, by specifying addresses before the command. I will just describe the simplest version for now - the /PATTERN/ address. When used, only lines that match the pattern are given the command after the address. Briefly, when used with the /p flag, matching lines are printed twice:
And of course PATTERN is any regular expression.
Please note that if you do not include a command, such as the "p" for print, you will get an error. When I type
I get the error
Also, you don't need to, but I recommend that you place a space between the pattern and the command. This will help you distinguish between flags that modify the pattern matching, and commands to execute after the pattern is matched. Therefore I recommend this style:
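A sketch of the /PATTERN/ address in the recommended style, plus a check that an address with no command is rejected. The exact error text differs between versions of sed, so only the exit status is examined here:

```shell
input='apple
banana'

# Without -n, a matching line is printed twice.
twice=$(printf '%s\n' "$input" | sed '/an/ p')

# Recommended style: a space between the address and the command.
only=$(printf '%s\n' "$input" | sed -n '/an/ p')
echo "$only"    # prints: banana

# An address with no command is an error.
if echo abc | sed '/a/' >/dev/null 2>&1; then
  missing_command=accepted
else
  missing_command=rejected
fi
echo "$missing_command"
```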
Using 'sed -n /pattern/p' to duplicate the function of grep
If you want to duplicate the functionality of grep, combine the -n (noprint) option with the /p print flag:
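As a sketch, here is the sed version next to the real grep on the same sample input; the two results should be identical:

```shell
input='apple
banana
cherry
mango'

via_sed=$(printf '%s\n' "$input" | sed -n '/an/p')
via_grep=$(printf '%s\n' "$input" | grep 'an')

printf '%s\n' "$via_sed"
# banana
# mango
```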
sed -f scriptname
If you have a large number of sed commands, you can put them into a file and use
where sedscript could look like this:
When there are several commands in one file, each command must be on a separate line.
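Here is a sketch of a small script file and how to run it. The script's contents (upper-casing the vowels) and the scratch directory are assumptions made for the example:

```shell
tmpdir=$(mktemp -d)

# One command per line; the first line is a comment.
cat > "$tmpdir/sedscript" <<'EOF'
# change lower case vowels to upper case
s/a/A/g
s/e/E/g
s/i/I/g
s/o/O/g
s/u/U/g
EOF

result=$(echo 'sed is a stream editor' | sed -f "$tmpdir/sedscript")
echo "$result"    # prints: sEd Is A strEAm EdItOr
```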
The long argument version, in GNU sed, is "--file=sedscript".
sed in shell scripts
If you have many commands and they won't fit neatly on one line, you can break up the line using a backslash:
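A sketch of a long command broken across lines in a Bourne-style shell script. The vowel substitutions are just sample commands:

```shell
# Each backslash must be the very last character on its line.
result=$(echo 'sed is a stream editor' | sed -e 's/a/A/g' \
    -e 's/e/E/g' \
    -e 's/i/I/g' \
    -e 's/o/O/g' \
    -e 's/u/U/g')
echo "$result"    # prints: sEd Is A strEAm EdItOr
```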
Quoting multiple sed lines in the C shell
You can have a large, multi-line sed script in the C shell, but you must tell the C shell that the quote is continued across several lines. This is done by placing a backslash at the end of each line:
Source: www.grymoire.com