genomics: The power of vim

vim is a very powerful text editor having many great features that can prove to be a time saver if you are editing a large file.

I use the substitution operation in vim more often than not. One can a do a number of things with just very little practice.

1. Make a newline:

%s/$/\r/g # will make the end of the line into a new line character.

2. Replace annoying ^M characters that often happens from a DOS to unix transition

%s/ctrlVM/BY anything you like/ # Remember here ctrl-V makes a ^ and M makes an M

3. If you want to substitute something from line 20,30 here is how you can do

:20,30s/old/new/ -> remember here no '%' before 's'

4. If you want to change the order of two words from Hypothetical protein -> protein Hypothetical

:%s/$Hypothetical$ $protein$/\2 \1/ -> This will change all occurrences of Hypothetical protein to protein Hypothetical.

Some More Tricks in vim

Delete lines that contain pattern:
:g/pattern/d

Delete all empty lines:
:g/^$/d
Delete lines in range that contain pattern:
:20,30/pattern/d
or with marks a and b:
:'a,'b/pattern/d

Substitute all lines for first occurance of pattern:
:%s/pattern/new/
:1,$s/pattern/new/

Substitute all lines for pattern globally (more than once on the line):
:%s/pattern/new/g
:1,$s/pattern/new/g

Find all lines containing pattern and then append -new to the end of each line:
:%s/$.*pattern.*$/\1-new/g

Substitute range:
:20,30s/pattern/new/g
with marks a and b:
:'a,'bs/pattern/new/g

Swap two patterns on a line:
:s/$pattern1$$pattern2$/\2\1/

Capitalize the first lowercase character on a line:
:%s/$[a-z]$/\u\1/
more concisely:
:%s/[a-z]/\u&/

Capitalize all lowercase characters on a line:
:%s/$[a-z]$/\u\1/g
more concisely:
:s/[a-z]/\u&/g

Capitalize all characters on a line:
:s/$.*$/\U\1\E/

Example: In a sequence file, if you have multiple headers and you are particularly
interested in substituting line Hp_ENSC_10a23 -> Hp_ENSC_10A23 without touching any
other line this is for you..

:%s/ENSC_$.*$$[a-z]$/ENSC_\1\u\2/g

Regular Expressions:

Regular expressions in unix is more or less similar like perl - ya perl borrowed regexp from Unix..

But using substitution, one need to take care of adding '\' (esc) character in order to match a value.

For example if a line with super_2[TAB]5997239[TAB]5997668 needs to change to super_2:5997239-5997668, then instead of using '\S+' as the first non space patter use $\S\+$. So, in other words the syntax will be

:%s/^$\S\+$\t$\S\+$\t/\1:\2-/g

Capitalize the first character of all words on a line:
:s/\<[a-z]/\u&/g

Uncapitalize the first character of all words on a line:
:s/\<[A-Z]/\l&/g

Change case of character under cursor:
~

Change case of all characters on line:
g~~

Change case of remaining word from cursor:
g~w

Increment the number under the cursor:

Decrement the number under the cursor:

redraw:

Turn on line numbering:
:set nu
Turn it off:
:set nonu

Number lines (filter the file through a unix command and replace with output):
:%!cat -n

Sort lines:
:%!sort

Sort and uniq:
:%!sort -u

Read output of command into buffer:
:r !ls -l

Refresh file from version on disk:
:e!

Open a new window:

n

Open a new window with the same file (split):

s

Split window vertically:

v

Close current window:

c
:q

Make current window the only window:

o

Cycle to next window:

w

Move to window below current window:

j

Move to window above current window:

k

Move to window left of current window:

h

Move to window right of current window:

l

Set textwidth for automatic line-wrapping as you type:
:set textwidth=80

Turn on syntax highlighting
:syn on
Turn it off:
:syn off

Force the filetype for syntax highlighting:
:set filetype=python
:set filetype=c
:set filetype=php

Use lighter coloring scheme for a dark background:
:set background=dark

Htmlize a file using the current syntax highlighting:
:so $VIMRUNTIME/syntax/2html.vim

Or, htmlize from a command prompt:
in 2html.sh put:

#!/bin/sh
vim -n -c ':so $VIMRUNTIME/syntax/2html.vim' -c ':wqa' $1 > /dev/null 2> /dev/null

Now just run: shell> 2html.sh foo.py

Getting rid of tab while copy pasting from a vim
edited file:

By default when you try to copy paste a block from a file edited with vim, it automatically appends character in the beginning of each line. In order to get rid of that in your .vimrc file put a line saying "set paste". This will do the job.

File Recovery:

Many times while editing a file the connection to the server may be lost. This is what I face all the time. In that case you can recover all the changes you made earlier by saying vim -r .file_name.swp

Executing Shell Commands

You can execute a command with the shell, for example, you can compile you C code when editing it, :!gcc kernel.c, or :!gcc % to compile the current editing file. Executing other commands is the same.

Keyword Matching

If you are lazy as me, you will find this function wonderful :) In insert mode, you can type a few characters of a word, e.g., if there is a variable called phosphatidylinositolphosphate, you may just type pho then C-p. Vim will find out the last word in the file started with characters pho, if it is not you want, you can re-type C-p to match the further previous ones. Similarly, C-n can do that for finding the next matches. Therefore you do not worry about the mistyping of the long variables, or uncommon-used words.

Insert/Delete an indent

C-T/C-D insert/delete an indent in the current line, no matter what column the cursor is in. There are handy hot keys when you are editing language with ambiguous block boundary such as Python, which you have to delete the indent yourself when necessary.

Settings for .vimrc

Some useful setting and mapping for .vimrc are listed as follows:

set nocp " nocompatible
set sw=4 " shiftwidth
set et " expandtab
set wm=8 " wrapmargin
set bs=2 " backspace
set ru " ruler
set ic " ignorecase
set is " incsearch
set scs " smartcase: override the 'ic' when searching
" if search pattern contains uppercase char
set vb t_vb= " set visual bell and disable screen flash
set backup

NOTE

Sometimes find replace and delete lines with a particular pattern can very quickly done using sed. Yes, vim takes longer time to complete a request if the file is large. So, instead try sed.

sed '/^--/d' lane7.1 > tmp # Will delete all lines with pattern --
# from file lane7.1 and will print in tmp
# -- comes as a pattern delimiter from 'grep'

sed 's/^@/>/g' lane7.1 > tmp # Will replace all lines beginning with @ to >
# This is useful in converting fastq files to fasta

1 comment:

Sucheta Tripathy PI @ Computational Genomics Group at IICB, Kolkata said...: To copy a word using vim will be; yw(if any punctuation is used as the word separator) or yW(if space is used as sole word separator followed by 'p' at the destination.
Example: ps-123_675 can be copied by yW/p; 12:14 PM

Friday, January 16, 2009

The power of vim

1 comment: