Friday, January 09, 2009

RepeatModeler Woes...

Continued from my last post..

Folks, if you are tempted to run a de novo repeat finder such as RepeatModeler, then this is for you.

I managed to get copies of all the dependencies along with RepeatModeler and installed them on my Linux machine. Here is what is not documented in any installation manual.

Installation Caveats:

1. If you install RepeatMasker without first getting a copy of RepBase, it will not complain and the installation will look fine. But when you finish installing RepeatModeler (if you can), it will complain big time - "you did not install RepBase on repeatMasker" - and you have to go back and re-configure RepeatMasker all over again.

SO, LESSON NO. 1: ALWAYS GET REPBASE BEFORE INSTALLING REPEATMASKER.

2. Tandem Repeats Finder (TRF) is named trfLinux.. when you obtain it from the download site. The installation program looks for a program called just "trf" in the folder. So, RENAME the executable to trf before installation.

3. The biggest trouble comes with the installation of RepeatModeler itself. It runs smoothly until it asks you for the path to the wublast installation. WU-BLAST is now called AB-BLAST and the old version is no longer available. You only get the 1998 version from the web site. If you are lucky and you have a copy of wublast, then the installation complains about xdformat.
ERROR:
/home/sutripa/wublast/xdformat is too old. Must have one dated 3/16/2005 or newer. Install a newer version of wublast and re-run configure.

To fix this, you can do a little bit of hacking:
Go to the "configure" file in the RepeatModeler package and comment out this block:
#unless ( $result =~ /option requires an argument -- m/ ) {
#  die "$wuLocation/xdformat is too old. Must have one dated 3/16/2005 "
#    . "or newer. Install a newer version of wublast and re-run "
#    . "configure.\n";
#}

(In my copy, this block is located between lines 247 and 255.)
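If you would rather not edit the file by hand, here is a rough sketch of the same hack in Python. It works on a list of lines and relies only on the text of the block shown above, not on line numbers, since those may differ between copies. The function name and the marker argument are my own inventions for illustration, not part of RepeatModeler.

```python
def comment_out_xdformat_check(lines, marker="option requires an argument -- m"):
    """Prefix '#' to the 'unless (...) { die ... }' block that checks xdformat.

    Assumes the block starts on an 'unless' line containing `marker` and ends
    on the first following line that is just '}' (as in the snippet above).
    """
    out = []
    inside = False   # True while we are within the block to comment out
    found = False    # True once the start of the block has been seen
    for line in lines:
        stripped = line.strip()
        if not inside and stripped.startswith("unless") and marker in line:
            inside = True
            found = True
        if inside:
            out.append("#" + line)
            if stripped == "}":
                inside = False
        else:
            out.append(line)
    if not found:
        raise ValueError("xdformat check not found (already commented out?)")
    if inside:
        raise ValueError("block start found but no closing '}' line")
    return out
```

To apply it, read the configure file into a list with readlines(), pass the list in, and write the result to a new file. Keep a backup of the original configure either way.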

4. When giving the path for RECON, enter $RECON_PATH/bin. Otherwise the configuration script will not find the executables.

Once all of this is taken care of, the installation of RepeatModeler can be completed.

RUNNING REPEATMODELER:

Again, this is no less of a challenge than the installation.

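Open the RepeatModeler file and, on line 234, drop the -m2 option from the xdformat call (the old xdformat does not seem to accept -m, which is what the configure check above was testing for).
Comment out the original line
#my $dbIndex = `$RepModelConfig::XDFORMAT_PRGM -m2 -n -r $genomeDB`;
and put this in its place:
my $dbIndex = `$RepModelConfig::XDFORMAT_PRGM -n -r $genomeDB`;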

AND ALSO on line 500:
Comment out the original line
#my $sampleIndex = `$RepModelConfig::XDFORMAT_PRGM -m2 -n -r $sampleFastaFile`;

and put this in its place:
my $sampleIndex = `$RepModelConfig::XDFORMAT_PRGM -n -r $sampleFastaFile`;

Follow the README file to create a database, which uses the xdformat program from wublast:

/BuildXDFDatabase -name elephant elephant.fa

IMPORTANT:
[Before running this, check your FASTA file. Make each FASTA header follow the >string[SPACE]number format, where the number would ideally be the length of the sequence. Leave nothing else trailing on the header.]

If you fail to do this, the program will complain about one of two things while running wublast: either "sequence not found" or "Database is empty".
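The header clean-up can be scripted. Below is a minimal sketch in Python, assuming that the "string" is the first whitespace-separated word of the existing header and that the "number" is the sequence length. The function names are mine, and it holds one sequence in memory at a time, so a very long chromosome will need a matching amount of RAM.

```python
def _emit_record(seq_id, seq_lines):
    """Yield one record with its header rewritten as '>id length'."""
    length = sum(len(s) for s in seq_lines)
    yield f">{seq_id} {length}"
    yield from seq_lines


def normalize_fasta_headers(lines):
    """Rewrite every FASTA header as '>id length' and drop anything else on it.

    `lines` is any iterable of text lines (for example an open file).
    Sequence lines are passed through unchanged apart from stripped whitespace.
    """
    seq_id = None
    seq_lines = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if seq_id is not None:
                yield from _emit_record(seq_id, seq_lines)
            fields = line[1:].split()
            if not fields:
                raise ValueError("FASTA header with no identifier")
            seq_id = fields[0]  # keep only the first word of the header
            seq_lines = []
        else:
            if seq_id is None:
                raise ValueError("sequence data before the first header")
            seq_lines.append(line)
    if seq_id is not None:
        yield from _emit_record(seq_id, seq_lines)
```

To use it on a real file, open the input for reading, pass the file object in, and write each yielded line plus a newline to a new file. Then build the database from the new file instead of the original.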

After doing all this, you may like to run the program as

RepeatModeler -database Ha.test >& Ha.out

Alas!! You think the problem is finally solved and go check the run file. It still has errors like:

FastaDB:_cleanIndexAndCompact - ERROR: Multiple fasta seqs appear on one line (>seq-3_1-40000) and possibly more! Ignoring all entries on this line.
FastaDB:_cleanIndexAndCompact - ERROR: Multiple fasta seqs appear on one line (>seq-1_1280001-1320000) and possibly more! Ignoring all entries on this line.
FastaDB:_cleanIndexAndCompact - ERROR: Multiple fasta seqs appear on one line (>seq-1_520001-560000) and possibly more! Ignoring all entries on this line.

This is because there is a silly bug on line 1590.
Go check the FASTA files in one of the run directories, such as RM_12849.TueJan131331092009/round-1

You will see that the header is printed with a doubled ">", like this: >>seq-3_1-40000

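Now go to line 1590 in RepeatModeler. The sequence ID already carries its own ">", so the extra one has to go. Comment out the original line

#print FASTA ">$seqID" . "_$start-$end\n";

and put this in its place:

print FASTA "$seqID" . "_$start-$end\n";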
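To check whether a run directory is affected, before or after the patch, you can scan its FASTA files for headers starting with ">>". Here is a small sketch in Python. The function name is made up for illustration, and it takes any iterable of lines so you can point it at whichever file in the round directory you want to inspect.

```python
def find_doubled_headers(lines):
    """Return (line number, header) pairs for headers that start with '>>'."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        if line.startswith(">>"):
            hits.append((lineno, line.rstrip("\n")))
    return hits
```

An empty list means no doubled headers were found in that file.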


Now you can run the program.
Even after doing all of this, I still had no luck running the script. The run ended with a "unix broken pipe" error, which is way beyond my ability to fix.
Argh...
