Welcome to the GenomeQuest Documentation Wiki

Plugin My Application


This document is a How-to for administrators of GenomeQuest. It explains how administrators can better serve their user community by adding favorite applications to GenomeQuest. To use these instructions you will need administrator access to the GenomeQuest server. If you do NOT have this access, please contact your administrator with your request for your favorite application or check out the list of existing applications.

Users often have their favorite applications for sequence analysis. As a platform, GenomeQuest allows you to plug in your favorite application so that your users can proceed with their research with the least disruption.

Types of Plug-ins

  1. Server-side Applications. Some applications can be configured to run on the GenomeQuest server and GenomeQuest can be configured to show the results on the web browser.
    1. PepWindow: Consider that you support a GenomeQuest user who likes generating hydropathy plots using PepWindow. Since you can download this EMBOSS tool to your GenomeQuest server, you can make it a server-side application and allow all your users to access it.
    2. Coverage: When genomic sequence reads are mapped to a reference genome, an oft-asked question is whether the entire length of the genome was covered by the reads and, specifically, where coverage is inadequate. This could be a simple extension to any workflow that maps reads to reference sequences. Using GenomeQuest's Sequence Data Management platform abilities, this becomes an easy task.
    3. For simplicity we will show just the pepwindow application but the principles should be clear enough to show how you can plug in one of your own scripts.
  2. Client-side Applications. Consider that you would like to support users with applications like Geneious and Vector NTI on their desktops. How can the user seamlessly take sequences in GenomeQuest into these applications?

What Is a Plug-in?

In GenomeQuest, a plug-in is a sequence analysis program which

  • takes as input, sequences or data relating to sequences
  • performs an analysis on this data and
  • produces output which gives some insight into the sequence.

How to Plug-in a Server Side Application: PepWindow

The above definition matches quite neatly the three steps in creating a plug-in:

  1. Data Generation: format the sequence data as input for the application.
  2. Processing: perform the analysis on the data using the program.
  3. Rendering: display the results of the analysis on a GenomeQuest page.
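
The three stages can be sketched in plain shell without assuming anything about GenomeQuest internals. The function names, the file names and the stand-in "analysis" (a simple residue count) are all hypothetical; the point is only that each stage reads what the previous one wrote.

```shell
#!/bin/bash
# Sketch of the three plug-in stages as plain shell functions.
# All names are illustrative; GenomeQuest supplies its own machinery.

# Stage 1, data generation: write one FASTA record to the input file.
generate_data() {   # usage: generate_data ID SEQUENCE INFILE
  printf '>%s\n%s\n' "$1" "$2" > "$3"
}

# Stage 2, processing: a stand-in analysis that reports the sequence length.
process_data() {    # usage: process_data INFILE OUTFILE
  # Skip header lines, strip whitespace, and count the remaining residues.
  grep -v '^>' "$1" | tr -d ' \n\r\t' | wc -c | tr -d ' ' > "$2"
}

# Stage 3, rendering: wrap the result in a fragment of HTML.
render_result() {   # usage: render_result OUTFILE
  printf '<p>Sequence length: %s</p>\n' "$(cat "$1")"
}

workdir=$(mktemp -d)
generate_data "demo_protein" "MTPQSLLQTT" "$workdir/in.fa"
process_data "$workdir/in.fa" "$workdir/out.txt"
render_result "$workdir/out.txt"
```

In the real plug-in, stage 1 is handled by GenomeQuest itself, stage 2 by a shell script and stage 3 by a template.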

Step 1: Understand the application

The first step is to understand the application specifically as it relates to the

  • format of input accepted
  • any additional parameters needed
  • type of output generated.

After downloading and installing pepwindow, I see that it takes FASTA-formatted protein sequences. For example, the file NP_958933.fa has the sequence of a 7-transmembrane protein.

>NP_958933
MTPQSLLQTT LFLLSLLFLV QGAHGRGHRE DFRFCSQRNQ THRSSLHYKP TPDLRISIEN 
SEEALTVHAP FPAAHPASRS FPDPRGLYHF CLYWNRHAGR LHLLYGKRDF LLSDKASSLL 
CFQHQEESLA QGPPLLATSV TSWWSPQNIS LPSAASFTFS FHSPPHTAAH NASVDMCELK 
RDLQLLSQFL KHPQKASRRP SAAPASQQLQ SLESKLTSVR FMGDMVSFEE DRINATVWKL 
QPTAGLQDLH IHSRQEEEQS EIMEYSVLLP RTLFQRTKGR SGEAEKRLLL VDFSSQALFQ 
DKNSSQVLGE KVLGIVVQNT KVANLTEPVV LTFQHQLQPK NVTLQCVFWV EDPTLSSPGH 
WSSAGCETVR RETQTSCFCN HLTYFAVLMV SSVEVDAVHK HYLSLLSYVG CVVSALACLV 
TIAAYLCSRR KPRDYTIKVH MNLLLAVFLL DTSFLLSEPV ALTGSEAGCR ASAIFLHFSL 
LTCLSWMGLE GYNLYRLVVE VFGTYVPGYL LKLSAMGWGF PIFLVTLVAL VDVDNYGPII 
LAVHRTPEGV IYPSMCWIRD SLVSYITNLG LFSLVFLFNM AMLATMVVQI LRLRPHTQKW 
SHVLTLLGLS LVLGLPWALI FFSFASGTFQ LVVLYLFSII TSFQGFLIFI WYWSMRLQAR 
GGPSPLKSNS DSARLPISSG STSSSRI

I can create the hydrophobicity plot of the above sequence as a PNG file using pepwindow with the following command:

pepwindow NP_958933.fa -graph png -goutfile hb_plot

This creates a PNG file called hb_plot.1.png, shown below.

[Figure: Rough Hydrophobicity Plot.png]

I can smooth out the plot to show the 7 transmembrane regions more clearly if I change the window length that pepwindow uses to calculate the hydrophobicity. So I use the command:

pepwindow NP_958933.fa -graph png -goutfile hb_smooth -length 16

This creates a PNG file called hb_smooth.1.png containing a much nicer picture of the plot, as shown below.

[Figure: Smooth Hydrophobicity Plot.png]
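
To see why a longer window smooths the curve, here is a minimal sketch of a sliding-window average over Kyte-Doolittle hydropathy values. It is not pepwindow's own code, and pepwindow may handle positions and sequence ends differently; the function name hydropathy_windows is made up for this illustration.

```shell
#!/bin/bash
# Sliding-window average of Kyte-Doolittle hydropathy values.
# usage: hydropathy_windows SEQUENCE WINDOW_LENGTH
# Prints one line per window: 1-based start position, a tab, mean hydropathy.
hydropathy_windows() {
  awk -v seq="$1" -v win="$2" 'BEGIN {
    # Kyte-Doolittle hydropathy index for the 20 standard amino acids.
    scale = "A 1.8 R -4.5 N -3.5 D -3.5 C 2.5 Q -3.5 E -3.5 G -0.4 H -3.2 I 4.5"
    scale = scale " L 3.8 K -3.9 M 1.9 F 2.8 P -1.6 S -0.8 T -0.7 W -0.9 Y -1.3 V 4.2"
    n = split(scale, kv, " ")
    for (i = 1; i < n; i += 2) kd[kv[i]] = kv[i + 1]

    len = length(seq)
    # Slide the window one residue at a time; unknown letters count as 0.
    for (start = 1; start + win - 1 <= len; start++) {
      sum = 0
      for (j = start; j < start + win; j++) sum += kd[substr(seq, j, 1)]
      printf "%d\t%.2f\n", start, sum / win
    }
  }'
}

hydropathy_windows "AILKDE" 3   # four windows, swinging from 3.37 down to -3.63
hydropathy_windows "AILKDE" 6   # one window, averaging out to -0.13
```

Each value is an average over the whole window, so a wider window damps the residue-to-residue swings and leaves only the broader hydrophobic stretches visible.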

So, on at least one parameter (window length) it would be helpful to allow user control. With the above we know:

  • Input format accepted is fasta. Sequence must be protein.
  • One user-defined parameter (window length) is needed.
  • Output is a single graphic file which must be displayed.

Armed with this, let us plug-in the application.

Step 2: Create the architecture

  • Create a directory for all the files related to this plug-in. By convention, the name of this directory starts with GqApp. I create a directory called GqAppHydrophobicity
mkdir GqAppHydrophobicity
cd GqAppHydrophobicity
  • Create the PHP script to tie in the overall steps. The best way to do this is to copy the file over from a plug-in that already exists (and works). For example, I copy over the PHP script from the IPstats plug-in directory (assumed here to sit next to the new directory; adjust the path for your system):
cp ../GqAppIPstats/GqAppIPstats.php GqAppHydrophobicity.php
  • The contents of the PHP file (shown below) display all the functions that need to be modified.
<?php
class GqAppIPstats extends ExportsAbstract {
 //
 /***************************************** Data generation step *****************************************/
 public function getStep1DataGenerationType()  {return DataFlowUtil::STEP1_DATA_GENERATION_PRINTF;}
 public function getStep1DataGenerationParam() {return '%DBPATHD/%RESFILE\n%PRE%OID,';}
 public function getStep1DataGenerationLimit() {return 300000;}
 //
 /***************************************** Processing step *****************************************/ 
 public function getStep2ProcessingType()      {return DataFlowUtil::STEP2_PROCESSING_SHELL;}
 public function getStep2ProcessingParam()     {return 'IPscript.sh';}
 //
 /***************************************** Rendering step *****************************************/
 public function getStep3RenderingType()       {return DataFlowUtil::STEP3_RENDERING_SMARTY;}
 public function getStep3RenderingParam()      {return 'IPstats.tpl';}
 //
 /***************************************** other functions *****************************************/  
 public function getContext()                  {return BiofacetContext::RESULT;}
 public function isEnabled()                   {return TRUE;}
 public function getTitle()                    {return 'Patent Statistics';}
 public function getCategory()                 {return 'Applications';}
 public function getDescription()              {return 'Patent Statistics';}
 public function getAdvancedOptions()          {return ;}
 public function getMimeType()                 {return ;}
 public function getAttachmentName()           {return ;}
}
?>
  • Name: The first step is to change the name of the class from GqAppIPstats to GqAppHydrophobicity.
  • Generate the data in the right format: Creating FASTA-formatted sequences is trivial in GenomeQuest using the printf utility. So, all we need to change is the parameter to send to the printf function. >%H#ID\n%S\n prints a header line with the identifier (H#ID), then a new line (\n) and then the sequence (S).

After these changes are made, the data generation step looks like:

<?php
class GqAppHydrophobicity extends ExportsAbstract {
 //
 /***************************************** Data generation step *****************************************/
 public function getStep1DataGenerationType()  {return DataFlowUtil::STEP1_DATA_GENERATION_PRINTF;}
 public function getStep1DataGenerationParam() {return '>%H#ID\n%S\n';}
 public function getStep1DataGenerationLimit() {return 1;}

Note that (in the last line above) we set the data generation limit to 1. This means that at most one sequence is sent to the application (pepwindow renders only the first sequence anyway). However, you can imagine that for other applications (say an assembler), this limit could be much higher. You can also return -1 from this function if there is no limit at all.
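
As a rough illustration of what the processing step receives, the sketch below emulates the record layout and the record limit in plain shell. It uses the shell's own printf and awk, not the GenomeQuest printf utility, and the helper names fasta_record and limit_records are hypothetical.

```shell
#!/bin/bash
# Emulate the record layout produced by the format '>%H#ID\n%S\n'
# using the shell's own printf (not the GenomeQuest printf utility).
fasta_record() {    # usage: fasta_record ID SEQUENCE
  printf '>%s\n%s\n' "$1" "$2"
}

# Keep at most LIMIT records from a FASTA stream; a LIMIT of -1 keeps all.
limit_records() {   # usage: limit_records LIMIT < input.fa
  awk -v limit="$1" '
    /^>/ { count++ }                       # each header starts a new record
    limit < 0 || count <= limit { print }  # print lines of permitted records
  '
}

# Two records go in, but a limit of 1 lets only the first one through.
{ fasta_record "seq1" "MTPQSLLQTT"; fasta_record "seq2" "LFLLSLLFLV"; } | limit_records 1
```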

  • Specify the Processing script. Since we are going to run a shell command to execute the application, it makes sense to specify a shell script with an appropriate name. After these changes, the processing step looks like:
 /***************************************** Processing step *****************************************/
 public function getStep2ProcessingType()      {return DataFlowUtil::STEP2_PROCESSING_SHELL;}
 public function getStep2ProcessingParam()     {return 'Hydrophobicity.sh';}

Note that the shell script Hydrophobicity.sh is yet to be written (we will do this in the next section).

  • Specify the Rendering step.
 /***************************************** Rendering step *****************************************/
 public function getStep3RenderingType()       {return DataFlowUtil::STEP3_RENDERING_SMARTY;}
 public function getStep3RenderingParam()      {return 'Hydrophobicity.tpl';}

Note again that the Smarty template for rendering the plot, i.e. Hydrophobicity.tpl, is yet to be written (we will do this below).

  • To conclude the PHP script, we modify the "Other" functions to look like:
 /***************************************** other functions *****************************************/
 public function getContext()                  {return BiofacetContext::SEQUENCE;}
 public function isEnabled()              {
   /* enable this plugin only if this is protein sequences */
   $instance_info = $this->getEnvironment('instance_info'); 
   if ($this->getEnvironment('biofacet_context') == BiofacetContext::SEQUENCE &&
        $instance_info['seq_type'] == 'PROTEIN')
          { return  TRUE;  }
   else
          { return  FALSE;  }
 }
 public function getTitle()                    {return 'Hydrophobicity Plot';}
 public function getCategory()                 {return 'Applications';}
 public function getDescription()              {return 'Kyte-Doolittle hydrophobicity plot';}
 public function getAdvancedOptions()     {
   $options = 'Window length: <input type="text" name="PEPWINDOW_LENGTH" value="7" />';
   return $options;
 }
 public function getMimeType()                 {return ;}
 public function getAttachmentName()           {return ;}
}
?>

The non-trivial changes we have made are to

  • specify where the plug-in is enabled (only when viewing protein sequences) and
  • allow an advanced parameter, namely the window length for graphing the hydrophobicity plot. The input for advanced parameters can be specified using full HTML formatting. The names given to the parameters here (PEPWINDOW_LENGTH) are available as variables in the analysis shell script, as we will see below.
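
Because the window length arrives from a free-text form field, the processing script may want to check it before passing it to a command line. The following is a minimal sketch of such a check; the function name safe_window_length and the upper bound of 200 are assumptions made for this example, while the fallback of 7 mirrors the default in the form field above.

```shell
#!/bin/bash
# Return a safe window length: the user's value if it is a positive
# integer within bounds, otherwise a fallback default.
# The bound of 200 is an arbitrary assumption for this sketch.
safe_window_length() {   # usage: safe_window_length "$PEPWINDOW_LENGTH"
  local value=$1 default=7 max=200
  case $value in
    ''|*[!0-9]*|????*)
      echo "$default" ;;        # empty, non-numeric, or four or more characters
    *)
      if [ "$value" -ge 1 ] && [ "$value" -le "$max" ]; then
        echo "$value"
      else
        echo "$default"
      fi ;;
  esac
}

safe_window_length "16"          # a sensible value is passed through
safe_window_length "16; ls"      # anything else falls back to the default
```

A script could then use something like LENGTH=$(safe_window_length "$PEPWINDOW_LENGTH") and pass "$LENGTH" to the application instead of the raw value.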

This concludes the overall PHP script which ties the application into GenomeQuest. We still need to write

  • the shell script for the analysis and
  • the Smarty template for rendering the results.

Step 3: Create a shell script to run the app

This is the heart of the analysis in the plug-in. It takes the input from GenomeQuest, runs the analysis and creates the output. For Hydrophobicity, I created a simple shell script called Hydrophobicity.sh (name specified above in the PHP code):

#!/bin/bash
#
# Path to the pepwindow script
PEPWINDOW="/usr/local/bin/pepwindow"
#
# Get a Random name for the graph PNG file to prevent file name collisions
TMP_PNG=$RANDOM
#
# Note 1: the PNG file will be named ${TMP_PNG}.1.png
# Note 2: $INFILE has the protein sequence(s) in fasta format - comes from the Data Generation step
launch "PEPWINDOW RUN" $PEPWINDOW $INFILE -graph png -length $PEPWINDOW_LENGTH -goutfile ~gulu/public_html/Hydropathy_Plots/$TMP_PNG
#
# Put the name of temp file with the PNG picture into $OUTFILE.
# This is what will be known to the rendering step.
echo "http://laurent.genomequest.com/~gulu/Hydropathy_Plots/$TMP_PNG.1.png" > $OUTFILE

The above shell script is largely self-explanatory. After specifying the path to pepwindow,

  1. We created a random name for the PNG file to reduce the chance that multiple users running the utility at the same time run into file name collisions.
  2. We launched pepwindow
    1. using the FASTA-formatted protein sequence in the special file $INFILE,
    2. with a parameter that sets the window length specified in the advanced options, and
    3. with the output PNG file placed in a directory which is accessible to the web browser (in this case my public_html directory).
  3. The URL of the PNG file is printed into a special file called $OUTFILE so the rendering step can display it.
    1. In principle there is no bar to adding lots more information into this file. For example, I could do cat $INFILE >> $OUTFILE so that the entire input is available later in the rendering step. However, we choose to keep it simple for now.
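
In bash, $RANDOM only yields values from 0 to 32767, so two runs can occasionally pick the same name. One possible alternative, sketched here under the assumption that mktemp is available on the server, is to let mktemp reserve a unique base name in the plot directory. The function name unique_plot_base and the plot_ prefix are made up for this example, and a temporary directory stands in for the web-visible one.

```shell
#!/bin/bash
# Reserve a unique base name inside PLOT_DIR and print it.
# mktemp creates the placeholder file atomically, so two concurrent
# runs are never handed the same name.
unique_plot_base() {   # usage: unique_plot_base PLOT_DIR
  local placeholder
  placeholder=$(mktemp "$1/plot_XXXXXX") || return 1
  basename "$placeholder"
}

plot_dir=$(mktemp -d)            # stand-in for the web-visible directory
base_a=$(unique_plot_base "$plot_dir")
base_b=$(unique_plot_base "$plot_dir")
echo "first run would write:  $plot_dir/$base_a.1.png"
echo "second run would write: $plot_dir/$base_b.1.png"
```

The empty placeholder file stays behind next to the PNG and can be removed once the plot has been written.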


Remember to make the shell script executable by all users (for example, chmod a+x Hydrophobicity.sh).

Step 4: Create a template to render the results

The next step is to write the Hydrophobicity.tpl file to render the PNG file. It is a simple 3-line snippet of code:

{include file="header.tpl" title="GQ - Kyte-Doolittle hydrophobicity plot"}
<img src="{$application}" title="Hydrophobicity plot">
{include file="footer.tpl"}
  1. The first line includes the GenomeQuest header (along with all the useful links at the top right).
  2. The second line uses the contents of $OUTFILE from the previous step, which are now available as $application. Since that file holds the URL of the PNG file, we put it into the src attribute of an HTML img tag.
    1. Note that any other HTML-formatted output, such as notes and a title, can be included here along with the picture, but here we choose to keep it simple.
  3. The third line includes the GenomeQuest footer and concludes the page.
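
Since whatever the processing script wrote to $OUTFILE ends up inside the src attribute, it can be worth checking that the file really holds a single image URL. The check below is only a sketch: the function name outfile_looks_valid is hypothetical, and the example.org URL is a placeholder rather than a real plot location.

```shell
#!/bin/bash
# Succeed only if the file holds exactly one line that looks like a PNG URL.
outfile_looks_valid() {   # usage: outfile_looks_valid OUTFILE
  [ -s "$1" ] || return 1                               # must exist and be non-empty
  [ "$(wc -l < "$1" | tr -d ' ')" -eq 1 ] || return 1   # exactly one line
  grep -Eq '^https?://[^[:space:]"<>]+\.png$' "$1"      # URL shape, no markup characters
}

demo_outfile=$(mktemp)
echo "http://example.org/plots/12345.1.png" > "$demo_outfile"
if outfile_looks_valid "$demo_outfile"; then
  echo "OUTFILE is ready for the template"
else
  echo "OUTFILE does not contain a single PNG URL" >&2
fi
```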

Next we just need to make GenomeQuest aware of the plug-in.

Step 5: Link into GenomeQuest

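Link the GqAppHydrophobicity directory into GenomeQuest. The best way to do this is to create a symbolic link (using ln -s) in the system plugins exports directory on your system. Give ln the absolute path of the plug-in directory, because a relative target is resolved from the directory that holds the link and would leave the link dangling. On my sandbox test system, run from the directory that contains GqAppHydrophobicity, this command looks like:

ln -s "$PWD/GqAppHydrophobicity" /path/to/GenomeQuest/web/GQ/system_plugins/Exports/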
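
A minimal sketch of the linking step, tried out in a throwaway sandbox rather than a real GenomeQuest installation. It resolves the plug-in directory to an absolute path before linking, since a relative link target is interpreted relative to the directory containing the link, and then confirms that the link resolves. The function name link_plugin and the sandbox layout are assumptions for this example.

```shell
#!/bin/bash
# Link a plug-in directory into an exports directory using an absolute
# target, then confirm that the link resolves to a directory.
link_plugin() {   # usage: link_plugin PLUGIN_DIR EXPORTS_DIR
  local plugin_abs
  plugin_abs=$(cd "$1" && pwd) || return 1     # absolute path of the plug-in
  ln -s "$plugin_abs" "$2/" || return 1
  [ -d "$2/$(basename "$plugin_abs")" ]        # -d follows the link; fails if dangling
}

# Sandbox standing in for the real directories.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/work/GqAppHydrophobicity" "$sandbox/Exports"
( cd "$sandbox/work" && link_plugin GqAppHydrophobicity "$sandbox/Exports" ) \
  && echo "link resolves"
```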

And we are done! Whenever the user is browsing a sequence database, they will see the Hydrophobicity application under the Applications menu at the top.

How to Plug-in a Client Side Application

When you plug in a client-side application, your users will be able to choose a few sequences in their GenomeQuest browser and export them in a format that the application can open. Currently this is possible in GenomeQuest for Geneious and Vector NTI.

Plugging in a client-side application is simply a matter of

  • exporting the selected sequences in the proper format (like FASTA, EMBL, GFF or any structured format such as XML, as expected by the client). Printf provides a way to generate any kind of format.
  • giving the file the right extension in the function getAttachmentName(), for example:
public function getAttachmentName() {
 return 'gq_records.gff';
}
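
To give a feel for the kind of structured output a client might expect, here is a sketch that writes one GFF3 feature line (nine tab-separated columns) with the shell's printf. It does not use the GenomeQuest printf utility, and the function name gff_line as well as the sequence name, source and coordinates are invented for the example.

```shell
#!/bin/bash
# Print one GFF3 feature line: nine tab-separated columns
# (seqid, source, type, start, end, score, strand, phase, attributes).
gff_line() {   # usage: gff_line SEQID SOURCE TYPE START END ATTRIBUTES
  # Score, strand and phase are written as "." (not applicable) in this sketch.
  printf '%s\t%s\t%s\t%s\t%s\t.\t.\t.\t%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

echo "##gff-version 3"
gff_line "demo_protein" "GenomeQuest" "polypeptide" 1 120 "ID=demo_protein"
```

Saved under a name such as gq_records.gff, a file like this would carry the extension returned by getAttachmentName() above.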
