genomics

Tuesday, April 17, 2012

Making 300 dpi images from powerpoint slides

Publication-quality images are usually high-resolution images at 300 dpi. If you take screenshots, or copy and paste images onto PowerPoint slides and later save them as JPEG or another file format, the image is usually saved at 96 dpi. I use GIMP (GNU Image Manipulation Program) as the image-editing software for my papers. GIMP has options to increase the resolution. Go to File -> Open -> <your image>

Once the image file is open, go to Image and select Scale Image. Increase the resolution and then click 'Scale'. Finally, save the image. You will notice that the image file is bigger, with more pixel density, but the image is no clearer. When you open the file, you may see unwanted black dots here and there, which make the image look messier. There is a very cool way to do this instead by editing the Windows registry. NOTE: before you start, make sure you have Windows Service Pack 2 installed.

For that, open the Registry Editor from the Windows Run box or command line (for example, by running regedit):

Then, in the left pane, click HKEY_CURRENT_USER -> Software -> Microsoft -> Office -> 12.0 -> PowerPoint -> Options (12.0 corresponds to Office 2007). At this stage there will already be some values in the right pane. Then go to Edit -> New -> DWORD (32-bit) Value and name it ExportBitmapResolution. Once created, it will appear in the list on the right-hand side. Double-click ExportBitmapResolution, select Decimal, and set the value to 300. Then close the Registry Editor.

Now, whenever you save an image from a PowerPoint slide, it will be saved at 300 dpi by default instead of 96 dpi.
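
To get a feel for what the setting changes, the pixel dimensions of an exported slide are just the slide size in inches multiplied by the dpi value. The sketch below assumes a 4:3 slide of 10 x 7.5 inches (an assumption; your slide size may differ), and exportedPixels is a hypothetical helper name, not anything that PowerPoint provides.

```javascript
// Pixel dimensions of an exported slide = size in inches * dots per inch.
// Assumption: a 10 x 7.5 inch slide (4:3); change the numbers for your slides.
function exportedPixels(widthInches, heightInches, dpi) {
  return {
    width: Math.round(widthInches * dpi),
    height: Math.round(heightInches * dpi)
  };
}

var atDefault = exportedPixels(10, 7.5, 96);   // 960 x 720
var atPrint = exportedPixels(10, 7.5, 300);    // 3000 x 2250

console.log(atDefault.width + " x " + atDefault.height);
console.log(atPrint.width + " x " + atPrint.height);
```

Under that assumption, the 300 dpi export has roughly ten times as many pixels as the 96 dpi one, which is why the file is larger and the image sharper.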

Tuesday, April 10, 2012

Carl Zimmer at the 2012 DOE JGI Genomics of Energy & Environment Meeting

Wednesday, March 14, 2012

Pipelines VS Semi-manual workflow

After working with both systems for quite some time, I am finally here to compare the two processes in a little more detail.

Development Time and Error Detection:
Pipelines: Pipelines usually have a long development incubation period, and the final outcome is often not bug-free. When something breaks somewhere, it takes the entire crew to sit and rack their brains for some time before a solution emerges.
Semi-manual workflow system: This, on the other hand, is a collection of scripts that are not connected to each other. The steps are defined, but the scripts need to be run one after the other (they can be bundled up in a shell script, though) under somebody's supervision until the run completes. System development does not cost a fortune and can be done by one or two dedicated developers. Bugs may be there, but they are easy to fix since one knows exactly where they come from.

Making Changes:

Pipelines: Making any kind of change to a pipeline is not as easy as one might think. It takes a lot of time and effort to add features to the existing ones, because each addition needs a whole lot of development both upstream and downstream.
Semi-manual workflow system: Making changes is less cumbersome, since each individual step exists as a separate unit and quick fixes can be inserted anywhere without a life-altering change.

Working with Dataset classes:
Pipelines: Pipelines often follow a strict nomenclature and hierarchy. Parameter passing is not automatic; parameters are passed through classes or some form of XML file, and these are edited by hand. Every time a pipeline fails, you need to re-run the whole process.
Semi-manual workflow system: Parameters are set by hand for each of the steps, which is a little cumbersome, but if the steps are documented well, one can do it effortlessly. The manual editing time for configuration files is more or less the same for a pipeline and a semi-automated process.

Having said that, I will give an example. I work on a database schema called GUS. It is a huge schema system, and it takes quite a learning curve to understand and work with it. We have already passed that phase, and I have created loose semi-automated steps for genome data upload. At the same time, I am working with another complex system that is based on a slightly modified version of GUS but has a pipeline built into it. After working incessantly for 5 months, we could upload only one genome with the pipeline, compared with 4 genomes at the same depth in a span of 15 days using the semi-automated system. Given a choice, I will always bet on the semi-automated system.

Monday, February 06, 2012

Free SVN book!!!!!!

A free svn book is here: http://svnbook.red-bean.com/en/1.7/index.html

Thursday, January 05, 2012

Awesomest video!!

Watch RNAi on PBS. See more from NOVA scienceNOW.

Wednesday, December 28, 2011

Calling a perl subroutine from PHP scripts

I spent a good part of my Christmas vacation figuring out how to call a Perl subroutine from a PHP script. There are several reasons why you might want to do that. The first and foremost may be that you don't want to rewrite all your Perl subroutines in PHP in order to use them. Another may be incompatibility. The one I face is that my PHP version cannot run Oracle queries, which can only be solved at the sysadmin level. On the other hand, the Perl/CGI interface for Oracle works just fine.

This task can be achieved at 3 levels:
1. Passing absolute (literal) values to Perl subroutines,
2. Passing variables to Perl subroutines, and
3. Collecting return values from the Perl subroutine.

The following points should be remembered:

1. If your Perl subroutines are packed into Perl packages, they are good to go (i.e., the file should begin with a package name; header and end with 1; ).
2. Do not use <include "package name"> inside the PHP script.
3. Initialize a string with the perl command, e.g. $command='perl -MpackageName -e "packageName::subroutine(arg,arg,arg)"'
4. Call system($command); from the PHP script. Do not use backticks (`).

Here is an example of passing absolute values to a Perl subroutine:

## Package Test ###
package Test;

use strict;
use warnings;

# Print the two names passed in as arguments.
sub printNames
{
    my $name1 = shift;
    my $name2 = shift;

    print "The names are $name1 and $name2\n";
}

# A package file must end with a true value.
1;

## save it as Test.pm

Level1: Passing an absolute value into the Perl subroutine:

# test.php
<?php
// Test.pm must be somewhere perl can find it (on its @INC path).
$command='perl -MTest -e "Test::printNames(Guest1,Guest2)"';
system($command);
?>
#Open browser and run test.php :

The names are Guest1 and Guest2

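Level2: Passing a PHP variable into the Perl subroutine:

<?php

$arg ="guest1 and guest2";
$arg1 ="guest3 and guest4";

// Escape the inner double quotes so that PHP parses the string correctly.
// Only pass trusted values: they are interpolated into a shell command.
$command = "perl -MTest -e 'Test::printNames(\"$arg\",\"$arg1\")'";
system($command);

?>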

# Open browser and run the command:

The names are guest1 and guest2 and guest3 and guest4

Level3: Collecting the return values from the Perl subroutine as a PHP array

Instead of the PHP system() function, use exec(). Print the outputs from inside the Perl subroutine, one per line, so that exec() can capture each line as an array element. The Perl subroutine now undergoes a slight modification:

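## Package Test ###
package Test;

use strict;
use warnings;

# Print each result on its own line so that PHP's exec()
# stores it as a separate array element.
sub calculateVal
{
    my $val1 = shift;
    my $val2 = shift;

    $val1 *= 20;
    $val2 /= 3;

    print "$val1\n";
    print "$val2\n";
}

1;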

---

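<?php

$arg =10;
$arg1 =300;

// Call the subroutine by the name defined in Test.pm (calculateVal).
$command = "perl -MTest -e 'Test::calculateVal($arg,$arg1)'";
$out = array();
$tmp = exec($command,$out); // $out receives one element per line of output
print_r($out);
?>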

# Output

Array
(
[0] => 200
[1] => 100
)

NOTE: Multidimensional Perl arrays can also be passed back by printing their values from inside the Perl subroutine.

Thursday, October 20, 2011

Installing SQL Developer

This is slightly off-topic for this blog, but nevertheless important. I decided to blog about it because this particular problem dragged on for at least a few days, and after surfing countless sites and installing various software, including harmful software, I learnt the hard way how to tackle it.
If you have a 64-bit Windows 7 machine and you are trying your luck installing SQL Developer and failing consistently, this post is for you.
SQL Developer is available at the Oracle site here http://www.oracle.com/technetwork/developer-tools/sql-developer/downloads/index.html. However, this version does not come with a compatible Java version. You may already have Java installed on your machine; if not, follow the instructions on the above-mentioned site to install it.
After you have downloaded and unpacked SQL Developer, clicking on the sqldeveloper.exe file may give you "permission denied". This may be because the executable file does not have execute permission. You can change that with chmod 755 <filename> . Then click on the executable file sqldeveloper.exe . If it does not find Java, it may ask you for the Java path; just provide it through the browse button.
I was stuck when the program complained about the absence of the msvcr100.dll and jvm.dll files. I browsed countless sites, and in the process installed malware and ended up cleaning it out later. Beware: don't download .dll files from anywhere other than safe places. I found resources that asked me to download .NET and Visual C++ to solve the lingering msvcr100.dll problem, but in vain. I installed and uninstalled .NET and Visual C++ at least a few times to make sure they were installed correctly, but that did not work. Since Java was the one looking for this file, I checked the Java distribution and finally located it under Program Files/Java/jdk1.7.0_01/bin/ . I copied it to Windows/system32/ .
For jvm.dll, many sites, including the Java site, advised that the Java installation was probably incorrect. I uninstalled and re-installed Java several times and it still did not work. Finally I found a safe site at https://rt4.cceb.med.upenn.edu/crcu_html/jinit/jinit_download.htm , from where I downloaded jvm.dll. I copied this file to Windows/system32 and it solved the problem!!

Monday, October 17, 2011

RGS14 - The protein that makes us forget

This month's Protein Spotlight issue highlights a protein, RGS14, that makes us filter our memory. Do we not need something that makes us remember things rather than forget them? Well, too much unnecessary information stored in the brain would certainly make it more chaotic. However, silencing RGS14 in mice makes them smarter; I wonder if the same can be said of humans.

The full article is available here http://web.expasy.org/spotlight/pdf/132/

Thursday, September 22, 2011

Morph of plant embryo development

Awesome video!

To or Not to with Cufflinks:

Cufflinks is amazingly easy to install and use, which lured me into using it. However, it is not without its pitfalls... I am still researching the elusive nature of this software and the types of output it produces with different commands.
Here are a few things to keep in mind before trying to run Cufflinks:
1. Cufflinks can be run on sorted BAM/SAM files.
2. It can run in multi-threaded mode with the -p option, which is much faster than single-threaded mode.
3. The new cuffmerge program first converts your GTF files derived from cuffcompare to SAM files, merges them, sorts them, runs cufflinks on them with a set of hard-coded parameters, and finally runs cuffcompare on the result to give you your merged.gtf file.
[The cufflinks command-line options: cufflinks -o ./merged_asm/ -F 0.05 -q --overhang-tolerance 200 --library-type=transfrags -A 0.0 --min-frags-per-transfrag 0 --no-5-extend -p 16 ./merged_asm/tmp/mergeSam_filewvcXTG.]

Earlier, I used the following route:
1. Run Cufflinks on each of the libraries, both with and without a reference GTF.
2. Merge the with-reference and without-reference outputs separately using cuffcompare, then combine the two with a script to indicate which of the transcripts are represented by a gene id. The potential pitfall of this approach is the overlapping and shorter transcripts, which are clearly a stumbling block when you are trying to produce an assembled transcript.

Recent Cufflinks versions include an accessory program called cuffmerge, which the manual suggests for merging the individual GTFs. As I mentioned earlier, this is again a wrapper that internally calls cufflinks and cuffcompare, albeit with several options already pre-set. So what I did was merge the BAM files generated from the different libraries with samtools, sort the merged file with samtools, and derive SAM files from the sorted BAM for running Cufflinks. [Please note that running the same sorted SAM files with a reference GTF file gives a complaint that the file is not sorted, whereas the run without a GTF file works fine.]

Output files
With reference gtf:
genes.fpkm_tracking -> has no FPKM information
transcripts.gtf -> has an FPKM column, but it is meaningless (all 0.0000)
isoforms.fpkm_tracking -> similar in size and content to genes.fpkm_tracking

Without reference gtf:
transcripts.gtf -> has transcript and exon information with FPKM and cov values; the coordinates are 1-based.
genes.fpkm_tracking -> has gene info with FPKM; the coordinates are 0-based.
isoforms.fpkm_tracking -> same as genes.fpkm_tracking

Comparison between all-merged-sam with Cufflinks VS Cufflinks -> gtf -> cuffcompare:

1. In both cases, only a single exon is reported per gene (this may be because Cufflinks is run directly after bowtie, without running tophat).
2. Far fewer transcripts are found in the first case, and they are non-overlapping and longer than in the latter case, where the transcripts are short and overlapping. The FPKM values are slightly higher than in the latter case. The first approach seems to merge several smaller transcripts together.

So, in essence, if you have several biological replicates of a single treatment type, instead of running them individually through Cufflinks and then merging the results with cuffmerge, merge the mapped BAM files first and follow this route. The FPKM values are much more accurate in this case...

Wednesday, September 14, 2011

Regulatory regions under represented with NGS methods

My new blog post on this subject can be found at SAB blogger web site at:
http://products.scienceboard.net/index.php/archives/2011/09/13/916/

Thursday, September 08, 2011

JavaScript: Changing drop down Lists

The select menu should work something like this: change the first list and the second list changes too.

Although this is a very small JavaScript trick, it is nevertheless a very useful one! While creating forms, you will sometimes want to change a drop-down list dynamically depending on which option was chosen in an earlier list (this may be a radio button, a list itself or anything else).
Here is a step-by-step procedure:

1. Write down the names and values for the first drop-down box, e.g. Psojae V1-> name, psv1 -> value; Psojae V5-> name, psv5-> value and so on...
2. For each name in the first drop-down list, make a sublist of the name-value pairs you want to appear on select: for example, for Psojae V1: PS1->name; ps1->value; PS2->name; ps2->value AND for Psojae V5: WI1->name; wi1->value; WI2->name; wi2->value.
3. Now write a JavaScript block with all these primary lists and sublist name-value pairs, something like this:
<script language="javascript">
var lists = new Array();

//First List
lists['psv5']    = new Array(); // Notice here you are making a list with the value of first list
lists['psv5'][0] = new Array( // These are the names you want to appear on the second list
        'WI1',
        'WI2'
);
lists['psv5'][1] = new Array( //These are the values you want to pass on from the second list
        'wi1',
        'wi2'
        );
//Second List
lists['psv1']    = new Array(); // Notice here you are making a list with the value of first list
lists['psv1'][0] = new Array( // These are the names you want to appear on the second list
        'PS1',
        'PS2'
);
lists['psv1'][1] = new Array( //These are the values you want to pass on from the second list
        'ps1',
        'ps2'
        );

4. Write a second set of JavaScript functions:
emptyList, fillList and changeList

function emptyList( box ) {
        // Remove options one at a time until the list is empty
        while ( box.options.length ) box.options[0] = null;
}
function fillList( box, arr ) {
        // arr[0] holds the names to display, arr[1] the matching values
        for ( var i = 0; i < arr[0].length; i++ ) {
                var option = new Option( arr[0][i], arr[1][i] );
                box.options[box.options.length] = option;
        }
        box.selectedIndex = 0;
}
function changeList( box ) {
        // Look up the sublist keyed by the value selected in the first list
        var list = lists[box.options[box.selectedIndex].value];
        emptyList( box.form.reads ); // Here notice I have given name 'reads' for the form object
        fillList( box.form.reads, list ); // that is the name tag on the select object in html for the second list
}
</script>

5. Now add the following to the body tag of your html page:

<body onload="changeList(document.forms['nextgen'].reference)"> // Here 'nextgen' is the name of the form. Notice that this trigger executes changeList() when the page loads.

The page will work at this stage; the only problem is that you have to refresh it after making a selection in the first drop-down. If you don't want that, do this last thing:

6. Write the following on the first select tag:

<select name="reference" size=1 onchange="changeList(this)">

In case you want to be able to select multiple values from a select list (with Shift or Ctrl), go ahead and add the following to your second select tag:

<select name="reads" size=N multiple width=M>
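
The lookup at the heart of changeList() can also be tried outside the browser. The following is a minimal sketch rather than the exact code from the page above: it keeps the same name-value pairs in a plain object and uses a hypothetical helper, optionsFor(), to return the entries for the second list. An unknown key gives an empty list instead of an error.

```javascript
// Same data as the lists array above, kept in a plain object.
// Each key is a value from the first drop-down; [0] holds the names
// shown in the second list and [1] the values submitted with the form.
var lists = {
  psv5: [['WI1', 'WI2'], ['wi1', 'wi2']],
  psv1: [['PS1', 'PS2'], ['ps1', 'ps2']]
};

// Hypothetical helper: build the {text, value} pairs for the second list.
function optionsFor(lists, key) {
  if (!Object.prototype.hasOwnProperty.call(lists, key)) {
    return [];                      // unknown key: leave the list empty
  }
  var entry = lists[key];
  var result = [];
  for (var i = 0; i < entry[0].length; i++) {
    result.push({ text: entry[0][i], value: entry[1][i] });
  }
  return result;
}

console.log(optionsFor(lists, 'psv1'));
```

In the browser, each pair can then be turned into an option with new Option(pair.text, pair.value), just as fillList() does.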

Enjoy with javascript!