vim tutorial – using the s command to replace text on the fly

The vim text editor comes with a powerful :s (substitute) command that is more versatile and expressive than the search-and-replace functionality found in GUI-based text editors.

The general form of the command is:

:[address]s/search-string/replace-string/[option]

The address specifies which lines vim will search. If none is provided, it defaults to the current line only. You can enter a single line number, or specify an inclusive range by giving the lower and upper bounds separated by a comma. For example, an address of 1,10 covers lines one through ten inclusive.
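To make the inclusive range concrete, here is a rough Python sketch of what a numeric address does. The helper name substitute_range is made up for this illustration, and Python's re module stands in for vim's own regex engine, so treat it as an analogy rather than a description of how vim works internally.

```python
import re

def substitute_range(lines, first, last, pattern, replacement):
    """Apply a first-occurrence substitution to lines first..last (1-based, inclusive)."""
    result = list(lines)
    for number in range(first, last + 1):
        # Vim addresses are 1-based; Python lists are 0-based.
        result[number - 1] = re.sub(pattern, replacement, result[number - 1], count=1)
    return result

lines = ["cat", "cat", "cat", "cat"]
# Roughly what :2,3s/cat/dog/ would do to a four-line buffer.
print(substitute_range(lines, 2, 3, "cat", "dog"))  # ['cat', 'dog', 'dog', 'cat']
```

Lines one and four are left alone because they fall outside the 2,3 range.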

You can also provide a pattern for the address by enclosing it in forward slashes. vim will operate on the next line that matches this pattern. If the pattern is preceded by “g”, vim will operate on all lines that match it. /hello/ matches the next line that contains hello, whereas g/hello/ matches every line that contains hello. (Strictly speaking, the g form is vim's separate :g (global) command, which runs the :s that follows it on every matching line.)
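The difference between the two pattern forms can be sketched in Python. The helpers next_matching and all_matching are hypothetical names for this example, the "current line" is assumed to be the first line of the buffer, and wrapping around the end of the file is not modeled.

```python
import re

def next_matching(lines, address, current=0):
    """Index of the first line after `current` that matches `address`, or None."""
    for index in range(current + 1, len(lines)):
        if re.search(address, lines[index]):
            return index
    return None

def all_matching(lines, address):
    """Indexes of every line that matches `address`, like the g/.../ form."""
    return [i for i, line in enumerate(lines) if re.search(address, line)]

lines = ["start", "hello a", "plain a", "hello a"]

# /hello/ style: only the next matching line after the current one (index 0).
one = list(lines)
target = next_matching(one, "hello")
if target is not None:
    one[target] = re.sub("a", "b", one[target], count=1)

# g/hello/ style: every matching line.
every = list(lines)
for i in all_matching(every, "hello"):
    every[i] = re.sub("a", "b", every[i], count=1)

print(one)    # ['start', 'hello b', 'plain a', 'hello a']
print(every)  # ['start', 'hello b', 'plain a', 'hello b']
```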

The search-string is a regular expression and the replace-string can reference the matched string by using an ampersand (&).
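If you want to try the ampersand idea outside of vim, Python's re module offers the same thing under a different spelling: \g<0> stands for the whole match. The sample line below is invented for this sketch.

```python
import re

# Vim: :s/[0-9]\+/<&>/   Python's spelling of "&" is \g<0>.
line = "order 66 shipped"
wrapped = re.sub(r"[0-9]+", r"<\g<0>>", line, count=1)
print(wrapped)  # order <66> shipped
```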

[option] allows even more fine-grained control over the substitution. One of the more common options is “g”, not to be confused with the “g” that precedes the address. Option “g”, which appears at the end of the command, replaces every occurrence of the search-string on the line. Without it, the substitute command replaces only the first occurrence on each line and then moves on.


For example, the command

:s/ten/10/g

run on the following line:

ten + ten = 20

results in:

10 + 10 = 20

as opposed to:

10 + ten = 20

without the global option.
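The same two outcomes can be reproduced in Python, where the default is the reverse of vim's: re.sub replaces every match unless you pass count=1. This is only an analogy using Python's re module, not vim itself.

```python
import re

line = "ten + ten = 20"
first_only = re.sub("ten", "10", line, count=1)  # like :s/ten/10/
every_match = re.sub("ten", "10", line)          # like :s/ten/10/g

print(first_only)   # 10 + ten = 20
print(every_match)  # 10 + 10 = 20
```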

Given all this versatility, the :s command comes in quite handy. Consider the following scenario: a comma-delimited file has trailing commas on some lines but not on others. To normalize the file so that every line ends with a comma, you could run:

:1,$s/[^,]$/&,/


The address range 1,$ spans the entire file ($ in the address means the last line in the file). The search-string “[^,]$” is a regular expression that matches the last character of any line that ends with something other than a comma ($ in a regex indicates end of the line). The replace-string has an &, which refers to the trailing character matched by the search-string. By setting the replace-string to “&,” we are telling vim to take the last character on every line that does not end in a comma and append a comma to it.
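Here is a small Python sketch of the same normalization, in case you want to experiment with the pattern away from vim. The function name add_trailing_commas and the sample text are assumptions made for this example, and \g<0> is Python's counterpart to vim's &.

```python
import re

def add_trailing_commas(text):
    """Append a comma to each line whose last character is not a comma."""
    fixed = []
    for line in text.split("\n"):
        # \g<0> plays the role of Vim's "&": the character matched by [^,].
        fixed.append(re.sub(r"[^,]$", r"\g<0>,", line))
    return "\n".join(fixed)

sample = "a,b,c\nd,e,f,\ng,h"
print(add_trailing_commas(sample))
# a,b,c,
# d,e,f,
# g,h,
```

Note that an empty line comes back unchanged, because [^,] needs a character to match.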

[^,]$ won’t match on empty lines because [^,] requires at least one character to be on the line. In a Perl-style regex you would normally get around this with a negative lookbehind, but vim does not accept the Perl-style (?<!...) syntax (it has its own notation, \@<!, for the same idea). The easiest way around this is to use a second substitute command for empty lines:

:1,$s/^$/,/

The pattern ^$ matches only lines with nothing between the start and the end of the line, so this adds a comma to every empty line (^ in a regex indicates start of line).
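Python's re module does accept the Perl-style lookbehind, so the two-command approach and a single lookbehind pattern can be compared side by side there. The function names two_passes and one_pass are invented for this sketch, and each call handles one line with no trailing newline.

```python
import re

def two_passes(line):
    line = re.sub(r"[^,]$", r"\g<0>,", line)  # first pass: non-empty lines
    return re.sub(r"^$", ",", line)           # second pass: empty lines

def one_pass(line):
    # "End of line not preceded by a comma" also covers the empty line.
    return re.sub(r"(?<!,)$", ",", line)

lines = ["a,b", "", "c,d,"]
print([two_passes(l) for l in lines])  # ['a,b,', ',', 'c,d,']
print([one_pass(l) for l in lines])    # ['a,b,', ',', 'c,d,']
```

Both give the same result here; the lookbehind version simply asks a different question ("is the end of the line not preceded by a comma?") so it does not need a character to consume.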

This is just one example of course. By coming up with the right regex in the search-string, you can automate all sorts of normally tedious tasks with succinct commands. The best part is, unlike those cumbersome GUI-based editors that often require the use of a pesky mouse, your hands never have to leave the keyboard! For even more control and flexibility, you could use sed, but :s can handle most day-to-day tasks quite easily.

Regex lookahead and lookbehind

A common search scenario involves finding all occurrences of a string x that are not followed by a string y. Here’s a contrived example. Let’s say you are fond of using the variables foo, bar, and foobar, and they appear everywhere in the code. Now you want to search for all occurrences of the variable “foo”. Unfortunately, a simple search will return foobar in the results as well. So you could attempt a search with grep, using this as your regex: “foo[^b][^a][^r]”

Now, let’s say the test.txt file consists of the following 3 lines:

hello foo world

Running “grep 'foo[^b][^a][^r]' test.txt” only returns the third line, and not the first. The reason is that the regex matches “foo” only when it is followed by three more characters: one that is not “b”, then one that is not “a”, then one that is not “r”. What we actually want is to match on “foo” not followed by “bar”. There is a subtle semantic difference here. The former expects three characters to follow “foo”, and if those three characters do not exist, the match fails. We can express the latter using a negative lookahead: “grep -P 'foo(?!bar)' test.txt”. Note that -P tells GNU grep to use Perl-style regular expressions, which support lookaheads and lookbehinds. This time, grep returns both line one and line three.
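The two patterns can be compared in Python, whose re module supports the same lookahead syntax. The first two entries in the list below are hypothetical stand-ins chosen to reproduce the behaviour described (a line that ends right after foo, and a line containing foobar); only "hello foo world" is taken from the example.

```python
import re

# Assumed file contents; the first two lines are illustrative guesses.
lines = ["foo", "foobar", "hello foo world"]

char_classes = re.compile(r"foo[^b][^a][^r]")
lookahead = re.compile(r"foo(?!bar)")

by_classes = [l for l in lines if char_classes.search(l)]
by_lookahead = [l for l in lines if lookahead.search(l)]

print(by_classes)    # ['hello foo world']
print(by_lookahead)  # ['foo', 'hello foo world']
```

The character-class version misses the bare "foo" line because there are no three characters after it to test.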

A lookahead does exactly what it sounds like. The regex engine looks ahead of its current position for the specified pattern. A negative lookahead causes the match to fail at that position if the pattern is found, while a positive lookahead causes it to fail if the pattern is not found. The key point here is that these constructs are what are known as zero width assertions.

A zero width assertion does not actually consume any characters when doing a match. What does this mean? Let’s take the following regex as an example: “(\d+)(\w+)”. When matched against the input string “12345xyz”, the “(\d+)” part of the regex “consumes” the characters “12345”. These will be returned as part of the match group, and will not be available to be matched against anything else in the regex. A zero width assertion, on the other hand, leaves the string intact, so those characters remain available for matching against the rest of the regex. Start of line “^” and end of line “$” are two examples of zero width assertions that most people are familiar with.
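A quick way to see the difference between consuming and not consuming is to look at what a match actually covers. This sketch uses Python's re module; the variable names are arbitrary.

```python
import re

m = re.match(r"(\d+)(\w+)", "12345xyz")
print(m.groups())  # ('12345', 'xyz')

# ^ and $ match positions, not characters, so their matches are empty.
start = re.search(r"^", "12345xyz")
end = re.search(r"$", "12345xyz")
print(start.span(), end.span())  # (0, 0) (8, 8)
```

The digits consumed by the first group are gone by the time the second group runs, while the ^ and $ matches have zero length.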

A lookahead is also a zero width assertion. For example, let’s take a look at the regex foo(?=bar)(\w+) applied on the string “foobarhelloworld”. The positive lookahead portion “foo(?=bar)” will succeed, because in the input string, “foo” is followed by “bar”. However, “bar” will not be consumed by this match, and will be consumed by “(\w+)”. So match group one will consist of “barhelloworld”.
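The example above can be checked directly in Python, which uses the same lookahead syntax. The second input string, "foobazhello", is an extra case added here to show the lookahead failing.

```python
import re

m = re.search(r"foo(?=bar)(\w+)", "foobarhelloworld")
print(m.group(0))  # foobarhelloworld
print(m.group(1))  # barhelloworld

# With no "bar" after "foo", the lookahead fails and there is no match.
miss = re.search(r"foo(?=bar)(\w+)", "foobazhello")
print(miss)  # None
```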

Lookbehinds work in the same way. A lookbehind causes the regex engine to look behind its current position for the specified pattern. A negative lookbehind fails the match if the pattern is found, and a positive one fails it if the pattern is not found. Let’s take a look at the regex (\w+)(?<=foo)bar(\d+) on the input string "Testfoobar12345". Since the lookbehind is a zero width assertion, the first "(\w+)" matches "Testfoo", and "foo" is still available for the lookbehind to inspect. The lookbehind, "(?<=foo)bar", succeeds, and the rest of the string, "12345", is matched against the "(\d+)".

Here is a quick cliff notes summary.

Positive lookahead: "foo(?=bar)" matches foo followed by bar
Negative lookahead: "foo(?!bar)" matches foo not followed by bar
Positive lookbehind: "(?<=foo)bar" matches bar preceded by foo
Negative lookbehind: "(?<!foo)bar" matches bar not preceded by foo

Needless to say, lookahead and lookbehind provide a concise way of specifying that a given pattern must, or must not, follow or precede a given position in the string. It’s a powerful feature, but unfortunately not all tools and languages with regex engines support it, especially older versions (hence calling grep with "-P" in one of the examples given above). However, anytime you can use them, they make this kind of pattern matching that much easier.
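If you want to play with all four constructs, here is a short Python sketch. The test strings "foobaz" and "bazbar" are made up for the occasion, and note that Python's re module only accepts fixed-width lookbehind patterns, which "foo" happens to be.

```python
import re

m = re.search(r"(\w+)(?<=foo)bar(\d+)", "Testfoobar12345")
print(m.groups())  # ('Testfoo', '12345')

# (pattern, string that should match, string that should not)
cases = [
    (r"foo(?=bar)",  "foobar", "foobaz"),  # positive lookahead
    (r"foo(?!bar)",  "foobaz", "foobar"),  # negative lookahead
    (r"(?<=foo)bar", "foobar", "bazbar"),  # positive lookbehind
    (r"(?<!foo)bar", "bazbar", "foobar"),  # negative lookbehind
]

results = [
    (re.search(p, hit) is not None, re.search(p, miss) is None)
    for p, hit, miss in cases
]
print(results)  # [(True, True), (True, True), (True, True), (True, True)]
```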