Friday, January 09, 2009

RepeatMasker blues..

I read rave reviews of RepeatMasker for repeat finding, so I decided to get hold of a copy and put it to use. I had used this software before and did not like it much, but after seeing some leading genome research institutes use it for their newly sequenced genomes, I was tempted. Until now I had been using TRFinder from Boston University to find direct repeats.

I went to the RepeatMasker web page, which directs you to an array of other dependencies that must be installed first. RepeatMasker depends on a repeat library, RepBase, which can be obtained from http://www.girinst.org. You go there and have to register. Your registration needs to be accepted by the group, and then they send you an email with your login id and password. This may take up to 2 working days. Only after this can you download the RepBase database.

However, the trouble with RepBase is that it is a predefined library, so it will not detect any new repeats that may be present in your genome. So the best option is to run a "de novo" repeat finder. The same web site links to RepeatModeler, which acts as a wrapper around de novo repeat finders such as RepeatScout and RECON.

Fair enough, then you go to install RepeatModeler...

The RepeatModeler web page lists a number of dependencies:

-- Of course, you need Perl 5.8 or above installed first.
-- You will need the TRFinder that I mentioned earlier.
-- Then you need to download and install RepeatMasker. It will not install unless RepBase is in place.
-- For RepeatMasker you also need the phred/phrap/cross_match package. You get it by writing to the authors, who send you a copy by email. This may take a few business days.
-- Once you are done with RepeatMasker, RECON needs to be installed.
-- Then RepeatScout needs to be downloaded and installed.
-- WUBLAST also needs to be installed. The site recommended on the RepeatModeler page is the WUBLAST site at http://blast.wustl.edu/. When you go there you get redirected to http://www.advbiocomp.com . The advbiocomp site says that WUBLAST is no longer supported and that you can get an older version for free. So you get the binaries, unpack them and put them in place.
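
With this many prerequisites, it helps to check what is already on the PATH before starting the installation. The following is only a rough sketch: the executable names in the table (for example `trf` for TRFinder and `eleredef` for RECON) are my guesses and will differ between installations, and finding a program on the PATH says nothing about whether its version is acceptable.

```python
import shutil

# Assumed executable names for each dependency in the list above.
# These are illustrative guesses, not taken from the RepeatModeler docs.
DEPENDENCIES = {
    "Perl": "perl",
    "TRFinder": "trf",
    "RepeatMasker": "RepeatMasker",
    "cross_match": "cross_match",
    "RECON": "eleredef",
    "RepeatScout": "RepeatScout",
    "WUBLAST xdformat": "xdformat",
}


def missing_dependencies(deps, which=shutil.which):
    """Return the names of dependencies whose executable is not on the PATH.

    `which` is injectable so the function can be exercised without the
    real tools being installed.
    """
    return [name for name, exe in deps.items() if which(exe) is None]


if __name__ == "__main__":
    missing = missing_dependencies(DEPENDENCIES)
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All listed executables were found on PATH.")
```

RepBase is a data library rather than a program, so a check like this cannot cover it.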

Now the final part: installing RepeatModeler. You run ./configure. It asks for the Perl path, the RepeatMasker path and so on. When it comes to the WUBLAST path, it exits, saying that your xdformat program is older than 2005. What the hell! You go back and check the WUBLAST site, which says the program dates from 1998. There is no newer version. There you get stuck.

Out of frustration, I went and looked at the configure file in RepeatModeler. At line 237 there is this check:

else {
  my $result = `$wuLocation/xdformat -m 2>&1`;

  # Two responses are expected:
  #   Good Version:
  #     /blah/wublast/xdformat: option requires an argument -- m
  #   Old Version:
  #     /blah/wublast/xdformat: invalid option -- m
  unless ( $result =~ /option requires an argument -- m/ ) {
    die "$wuLocation/xdformat is too old. Must have one dated 3/16/5 "
      . "or newer. Install a newer version of wublast and re-run "
      . "configure.\n";
  }
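
To see which of the two responses a given xdformat binary produces without running the whole configure script, the same probe can be reproduced on its own. This is a minimal sketch in Python that mirrors the check quoted above. The function names are mine, and the only behaviour it relies on is what the configure comments describe: a binary that knows `-m` complains that the option requires an argument.

```python
import subprocess


def accepts_m_option(output: str) -> bool:
    """Mirror the configure test: an xdformat that knows -m complains
    that the option needs an argument instead of calling it invalid."""
    return "option requires an argument -- m" in output


def probe_xdformat(xdformat_path: str) -> bool:
    """Run `xdformat -m` with stderr merged into stdout (like 2>&1)
    and classify the response."""
    result = subprocess.run(
        [xdformat_path, "-m"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    return accepts_m_option(result.stdout)


# The two sample responses from the configure comments:
good = "/blah/wublast/xdformat: option requires an argument -- m"
old = "/blah/wublast/xdformat: invalid option -- m"
print(accepts_m_option(good), accepts_m_option(old))
```

Calling `probe_xdformat` with the path to your own xdformat binary tells you whether configure will accept it.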

I went back and asked the authors of xdformat what this -m option means; they said to ignore it. I could change the check myself, if only I knew what to substitute for the option, because a search of the RepeatModeler directory tells me that a -m2 option is actually used:

gus@tyler-lab:~/RepeatModeler$ grep --regex=-m -r .
./configure: my $result = `$wuLocation/xdformat -m 2>&1`;
./Refiner: $malign->serializeOUT( "$wrkDir/$consName-malign-$round.ser" )
./RepeatClassifier:# wublastx the simple-masked consensi vs the transposable element
./RepeatModeler:my $dbIndex = `$RepModelConfig::XDFORMAT_PRGM -m2 -n -r $genomeDB`;
./RepeatModeler: my $sampleIndex = `$RepModelConfig::XDFORMAT_PRGM -m2 -n -r $sampleFastaFile`;
./RepeatModeler: # . " -mincount 3 -tandemdist 500 -output $fastaFile.lfreq"
./RepeatModeler:"$workDir/sampleDB-$round.fa $workDir/consensi.fa -warnings -kap -wordmask=dust wordmask=seg maskextra=10 -hspmax=20 V=0 B=250 Q=25 R=5 W=7 S=250 gapS2=250 S2=125 X=250 gapX=500 -matrix=comparison.matrix"
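
The plain search for "-m" also matches unrelated text such as "-malign" and "simple-masked". A narrower search that only reports `-m` options passed to xdformat could look like the sketch below. The regular expression is my own assumption about how the calls are written (the word xdformat, in any case, optionally followed by a suffix like `_PRGM`, then whitespace and the option), so it may miss calls written differently.

```python
import os
import re

# Matches e.g. "xdformat -m" or "XDFORMAT_PRGM -m2" and captures the option.
XDFORMAT_M = re.compile(r"xdformat\w*\s+(-m\d*)", re.IGNORECASE)


def find_xdformat_m_flags(root):
    """Walk `root` and return (relative path, line number, option) for
    every line that passes a -m style option to xdformat."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            path = os.path.join(dirpath, filename)
            try:
                with open(path, errors="replace") as handle:
                    for lineno, line in enumerate(handle, 1):
                        match = XDFORMAT_M.search(line)
                        if match:
                            rel = os.path.relpath(path, root)
                            hits.append((rel, lineno, match.group(1)))
            except OSError:
                continue  # unreadable file: skip it
    return hits


if __name__ == "__main__":
    for rel, lineno, option in find_xdformat_m_flags("."):
        print(f"{rel}:{lineno}: {option}")
```

Run from the RepeatModeler directory, this should list only the configure probe and the two RepeatModeler calls shown in the grep output.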

I am still waiting for a response...
