Welcome to the GenomeQuest Documentation Wiki
How to plug a workflow into GenomeQuest
Before you begin, please read System Concepts in detail. This documentation assumes you have read and understood the GQ System Concepts.
Getting Started
Ok, you're ready to write your own workflow. Before you begin, you should have the following:
- You should be familiar with the GenomeQuest System Concepts. Really. Read this first.
- You may want to familiarize yourself with the GQ Engine Primer, although you can get through this document without it.
- You should have command line access to the head node that has GenomeQuest installed on it.
- You should have an application in mind.
- You should have tried to run an available workflow inside the GQ system already.
Why the fifth point? Because if you experience how workflows work inside of GQ, you will understand the Components of a Workflow that you need to build to fit into the system.
GenomeQuest workflows use the bash shell, PHP, and Smarty (a templating language that works closely with PHP) to interact with GenomeQuest. The workflow itself can be written in any language you prefer. Smarty, PHP, and bash are documented elsewhere and are not covered in this document. For more information about Smarty, see http://www.smarty.net. For more information about PHP, see http://www.php.net. For more information about bash, see http://www.gnu.org/software/bash/bash.html.
We also use the Dojo Toolkit in our HTML to make our lives easier. This isn't necessary for you, but it's nice. You'll not need to worry too much about this, but you can learn more about Dojo at http://www.dojotoolkit.org/.
Familiarity with the basic concepts of web applications is also needed. For more information about these concepts, see http://www.webanddesigners.com/introduction-to-php-part-1 or http://devzone.zend.com/node/view/id/626.
The Components of a Workflow
Having familiarized yourself with the way GenomeQuest works, you know that from the users' experience, they go through a series of steps:
- They go to a launch page for your workflow
- They enter in some data describing which databases and which options they want
- They press the submit button
- The workflow gets launched
- They return to the MyGQ page and see a progress bar
- If they click on their workflow they can see more information about the input parameters they selected
- When the workflow is over, they can look at a report
- The report has links to other data and databases
For you, as the developer, you must therefore build the following files and functions:
- A submit page for your workflow: submit.tpl
- The workflow itself: script.comparison.body.tpl
- The progress bar updater - calling a function inside of script.comparison.body.tpl
- Information about the input parameters they selected: info.tpl
- The report page: report.tpl
- Whatever other databases and "file droppings" you want to provide to the user
- You tie these all together with a master PHP file that plugs this into the system and makes GenomeQuest aware of your workflow: GqWfYourWorkflow.php
Sample Workflow
To show you how this is done, we're going to walk you through the implementation of the Velvet workflow in this document.
You can also see all of the code pertaining to a Sample Blast Workflow that we built as a way to get started. There is no commentary here - just code - but it broadens the picture for you and gives you some code to copy.
We suggest for now that you continue here and then review the Sample Blast Workflow later.
Setting Up
Ok, we're going to create an example workflow together - we'll make the Velvet sequence assembly algorithm available to end users.
First step, I'll make a directory where I want to do my coding. I'll do this in my home directory and then I'll link this directory into the right place so GenomeQuest knows about it. (Note that in all examples we assume that $GQ_INSTALL is the root directory of the GenomeQuest installation. Also, I happen to be logged in as the user runner.)
% cd
% mkdir GqWfVelvetExample
% ln -s /home/runner/GqWfVelvetExample $GQ_INSTALL/web/GQ/plugins/Workflows/GqWfVelvetExample
% ls -ls $GQ_INSTALL/web/GQ/plugins/Workflows/
total 0
0 lrwxrwxrwx 1 runner geneit 45 Feb 10 13:41 GqWfChipseq -> /home/runner/heush/Services/trunk/GqWfChipseq
0 lrwxrwxrwx 1 runner geneit 30 Feb 10 13:42 GqWfVelvetExample -> /home/runner/GqWfVelvetExample
Great. As you can see, another workflow, GqWfChipseq is already here. We're ready to start our workflow.
The Foundation of a Workflow
Every workflow implements a GenomeQuest framework PHP abstract class in order to tell the system how it works. Some rules about this:
- The name of your class must begin with GqWf, for example, GqWfVelvetExample
- The class must be implemented in a PHP file with the same name, for example: GqWfVelvetExample.php
- The class must extend the GenomeQuest class called ScriptWorkflowsAbstract
Ok, let's get started. I edit a new file called GqWfVelvetExample.php and insert the following:
<?php
class GqWfVelvetExample extends ScriptWorkflowsAbstract {
}
Required Functions
Inside the PHP class, you can do whatever you want, but a few functions must be implemented. In a moment I'll share them with you, along with links to more detail on each, but you can just keep reading this narrative because we'll implement each of these in line. The required functions are:
- action_getLaunchPage: This renders the launch page for the workflow - the page that the user interacts with when trying to launch your workflow. That page will be built in a moment using standard HTML and the Smarty templating system. You'll need the templating system because some of the data you'll want to show the user is specific to them. Smarty is the mechanism you'll use to provide that data, and you'll do this inside of this function. For more information, see Creating A Launch Page.
- generateScriptBody: Once the user submits their information to launch the workflow, you have to receive that data, transform it however you want, and pass it to your workflow. You can write your workflow in any language (we'll write Velvet in PERL) but the workflow needs to be wrapped up in a BASH program. We'll again use Smarty templates in that BASH program to receive information. This function, generateScriptBody, is used to "fill in" your BASH template with the data that the user entered. For more information, see Creating the Workflow itself.
- action_getReport: When the workflow is done, the user will want his report. You can build it in any arbitrary format using HTML. Again, you'll use Smarty templates inside of the HTML to pass data between the system and your HTML report. For more information, see Creating Reports.
- getCurrentVersion: This returns the current version of the workflow. This is useful if the workflow evolves and you need to track what version of the workflow generated a particular result.
- getOutputs: Workflows produce "file droppings." For instance, you may want to produce an Excel file, or a GenomeQuest Engine Sequence Database, or anything in between. These should ideally be linked off the report you'll create for the user. The function getOutputs describes all of the outputs of your workflow - their names, MIME types, etc. For more information, see the detailed documentation on the getOutputs function.
- getDefaultOutputName: This specifies the default output that is displayed when the user views the result of a workflow. The most common OutputName is simply the string "report" - a dynamic report. We may augment this in the future. For more information, see the function documentation for getDefaultOutputName.
- getTotalNbResults: This specifies the total number of returned results.
So let's add these functions to our code.
<?php
class GqWfVelvetExample extends ScriptWorkflowsAbstract {
public function action_getLaunchPage(array $params) {
}
public function generateScriptBody() {
}
public function action_getReport(array $params) {
}
public function getCurrentVersion() {
return '1.0';
}
public function getOutputs() {
}
public function getDefaultOutputName() {
return 'report';
}
public function getTotalNbResults() {
return '1';
}
}
Registering Your Workflow
Now that you've got the basic class, you will already be able to see your workflow in the system. Two small steps to get there:
- update a global config file telling the system a bit about your workflow
- register your workflow via the GQ user interface
Here's how.
The Global Config File
GenomeQuest stores a global configuration file for all workflows. You need to update the configuration file to specify the title of your workflow. All of the parameters that you'd like to be able to toggle without editing code can also be placed in this configuration file. For instance, you may want to limit the number of sequences allowed to send through Velvet at any one time. Why put that in your code when you can put it in a configuration file?
The config file is global across the entire system. It is located at $GQ_INSTALL/web/GQ/config/plugin_config.txt.
Every new workflow must have an entry here with at the very least the title of the workflow defined. We add the following lines to the bottom of this file:
[gqwfvelvetexample:workflow]
title = Velvet Example
description = An Example Workflow to explain the GQ Workflow API
The syntax [gqwfvelvetexample:workflow] is critical: the tag gqwfvelvetexample must match the name of your class exactly, written entirely in lower case. The :workflow tag inherits other configurations that are generic to workflows. Indeed, at the very top of the configuration file you can see these defined:
[workflow]
require_quota = true
queue_name = all.q
sge_app_queue = xapp
So our workflow will also have the ability to access all of these values. (For now, don't worry what they do.)
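To make the inheritance idea concrete, here is a minimal sketch of how a "[child:parent]" section could be resolved. It is not the GenomeQuest loader: resolveSection() is a hypothetical helper, and the sketch simply assumes the config file can be read with PHP's standard parse_ini_string().

```php
<?php
// Sketch only: resolveSection() is a hypothetical helper, not a GenomeQuest API.
// It shows one way "[child:parent]" sections could inherit parent values.

$ini = <<<'INI'
[workflow]
require_quota = true
queue_name = all.q
sge_app_queue = xapp

[gqwfvelvetexample:workflow]
title = Velvet Example
description = An Example Workflow to explain the GQ Workflow API
INI;

function resolveSection(array $sections, string $name): array
{
    foreach ($sections as $header => $values) {
        // Split "child:parent" into its two parts; the parent may be absent.
        $parts = array_map('trim', explode(':', $header, 2));
        if ($parts[0] !== $name) {
            continue;
        }
        $parent = isset($parts[1]) ? resolveSection($sections, $parts[1]) : [];
        // Values from the child section override inherited ones.
        return array_merge($parent, $values);
    }
    return [];
}

// The section tag is the class name in lower case.
$tag = strtolower('GqWfVelvetExample');

// INI_SCANNER_RAW keeps every value as the literal string from the file.
$sections = parse_ini_string($ini, true, INI_SCANNER_RAW);
$config = resolveSection($sections, $tag);

echo $config['title'], "\n";      // defined in our own section
echo $config['queue_name'], "\n"; // inherited from [workflow]
```

The point of the sketch is only the lookup order: our own section first, then whatever [workflow] provides.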
Registering Your Workflow
You've done everything you need to do to register your workflow. Log into GenomeQuest as a user with administrative rights, click the Administration Panel, then Workflows, and then click Register on your workflow.
If you refresh the page, you'll see your workflow available. Go ahead and try to launch a new VelvetExample!
Creating a Launch Page
If you tried to launch a new VelvetExample workflow, you were taken to a completely blank page. Why is this?
Remember, we have to implement the function action_getLaunchPage and right now it returns nothing. So the GenomeQuest platform calls this function, gets nothing, and sticks nothing inside of a legal HTML document.
Let's take the next step - trying to make a launch page that the user interacts with. For this Velvet example, we'll want to present the user with a list of databases that they can use in their assembly, as well as some parameters for the assembly.
There are two steps here:
- create a Smarty template to specify the presentation to the user.
- implement the PHP function action_getLaunchPage to "fill in" this template and return it.
Now - if you don't like the idea of Smarty and templates, you could embed all of the HTML right in the action_getLaunchPage function and just return it. But that's bad web coding. Let's do it right.
submit.tpl
The bulk of the HTML that you'll ultimately return from action_getLaunchPage will come from another file - submit.tpl. It's mostly HTML with some fancy templating going on. GenomeQuest automatically adds headers and footers so you just have to put the HTML code for the body of your form.
Here's my Smarty / HTML, in a few sections so I can explain to you what I'm doing.
This uses our GQ Form Tag Library to make form generation easier.
You're strongly advised just to copy this code in its entirety and put it in your submit.tpl file.
{extends file="full_width_layout.tpl"}
{block name="title"}{$title}{/block}
{block name="aftercss"}
<style type="text/css">
/*Additional Styles*/
#mainbody { width: 65% !important; }
</style>
{/block}
{block name="afterjs"}
{* Any additional javascript to be injected into the final page *}
{/block}
{block name="body"}
{breadcrumb}
<h1>{$title}</h1>
{gq_form}
{gq_form_fieldset name="Inputs"}
{gq_form_layout}
{gq_form_input label="Result Name" foo="bar" name="title" required="true"}{/gq_form_input}
{gq_form_textarea name="qdb" label="Query Nucleotide Sequence"}{/gq_form_textarea}
{gq_form_input label="Number" name="number" constraints="{ min:0,max:120 }"}{/gq_form_input}
{gq_form_input type="checkbox" label="First check" name="dfpeter"}{/gq_form_input}
{gq_form_select name="selectme" label="Select Me!" options=["key1" => "key1 value", "key2" => "key2 value"]}
{/gq_form_select}
{* This example shows when you want to do your own custom work in the field area, good for groups of elements *}
{gq_form_row label="whatever"}
{gq_form_input type="radio" label="First check" name="dfpeter" doLayout="false"}{/gq_form_input}
{gq_form_input type="checkbox" label="Second check" name="dfpeter" doLayout="false"}{/gq_form_input}
{gq_form_input type="checkbox" label="Third check" name="dfpeter" doLayout="false"}{/gq_form_input}
{gq_form_input type="checkbox" label="Fourth check" name="dfpeter" doLayout="false"}{/gq_form_input}
{/gq_form_row}
{/gq_form_layout}
{/gq_form_fieldset}
{/gq_form}
{/block}
Implementing action_getLaunchPage
Now that we've got a submit page ready, we need to "fill in the template" inside of action_getLaunchPage.
For now, let's not pass in any variables, let's just fill it in and see what happens. Inside of our PHP class, GqWfVelvetExample.php, we change the action_getLaunchPage function as follows:
public function action_getLaunchPage(array $params) {
return $this->fillDisplayTemplate(dirname(__FILE__)."/submit.tpl");
}
In this context, $this is our PHP class, which is implementing the abstract class ScriptWorkflowsAbstract. GenomeQuest provides the function fillDisplayTemplate inside of that class, which knows how to fill in a Smarty template. Since the submit.tpl file and this class are in the same directory, we simply point the function at our newly created template file.
How does it look?
Not bad, except that none of the default fields are filled, and there are no databases to choose from. That's because we need to fill in those variables.
public function action_getLaunchPage(array $params) {
// physical sequence database list
$dbs = PluginSeqdbUtil::getPhysicalDbList(array(SeqdbFilterExp::OP_AND,
array(SeqdbFilterExp::COL_SEQDB_LOCALE, SeqdbFilterExp::COM_EQ,'L'),
array(SeqdbFilterExp::COL_SEQ_TYPE, SeqdbFilterExp::COM_EQ, SeqType::NUC)),
FALSE);
if (is_array($dbs)) {
// create_function() was removed in PHP 8; an anonymous function does the same job
uasort($dbs, function ($a, $b) { return strcmp($a["title"], $b["title"]); });
}
$this->addEnvironment('nuc_db_list', $dbs);
// virtual db list
$vdbs = PluginSeqdbUtil::getVirtualDbList(array(SeqdbFilterExp::COL_SEQ_TYPE, SeqdbFilterExp::COM_EQ,SeqType::NUC),FALSE);
if (is_array($vdbs)) {
// create_function() was removed in PHP 8; an anonymous function does the same job
uasort($vdbs, function ($a, $b) { return strcmp($a["title"], $b["title"]); });
}
$this->addEnvironment('nuc_vdb_list', $vdbs);
// Other params
$this->addEnvironment('HASH_LENGTH', 21);
$this->addEnvironment('HASH_LENGTH_MIN', 5);
$this->addEnvironment('HASH_LENGTH_MAX', 31);
$this->addEnvironment('MIN_COVERAGE', '');
$this->addEnvironment('MAX_COVERAGE', '');
$this->addEnvironment('MIN_CONTIG_LENGTH', '');
return $this->fillDisplayTemplate(dirname(__FILE__)."/submit.tpl");
}
What's going on here? We're asking the GenomeQuest framework for a list of all of the physical and virtual databases that are available to the user through the following two functions:
- PluginSeqdbUtil::getPhysicalDbList
- PluginSeqdbUtil::getVirtualDbList
The parameters to these functions are extensive and documented here. We provide you significant control over the lists of databases you will allow the user to see, for instance based on average, minimum, and maximum sequence lengths, which type of sequencing machine, etc. For now, suffice it to say that we have provided the Smarty template with all physical and virtual nucleotide databases in this function.
The other piece of work we're doing is sending in values for the other variables we defined in the Smarty template.
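The sorting step can be tried in isolation. The sketch below uses made-up database entries (the ids and titles are invented for illustration) and assumes only the "id" and "title" keys that the template reads.

```php
<?php
// Sketch with made-up data: the real lists come from PluginSeqdbUtil.
// Only the "id" and "title" keys are assumed here.
$dbs = [
    5800 => ['id' => 5800, 'title' => 'Soil sample reads'],
    5712 => ['id' => 5712, 'title' => 'E. coli reads'],
    5901 => ['id' => 5901, 'title' => 'Phage lambda reads'],
];

// uasort() orders the entries by the callback's result and keeps the
// array keys, so each entry stays attached to its original key.
uasort($dbs, function ($a, $b) {
    return strcmp($a['title'], $b['title']);
});

foreach ($dbs as $db) {
    echo $db['id'], "\t", $db['title'], "\n";
}
```

Because the keys survive the sort, anything that later looks a database up by key still finds it, while the drop-down shows the titles in alphabetical order.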
Implementing action_getLaunchPage using the config file
We can do better. We've hard coded all these values into our function, but there is a config file built just for this, so an administrator can easily make changes.
Recall, the global configuration file is available at $GQ_INSTALL/web/GQ/config/plugin_config.txt. Let's put these values in there:
[gqwfvelvetexample:workflow]
title = Velvet Example
description = An Example Workflow to explain the GQ Workflow API
hash_length = 21
hash_length_min = 5
hash_length_max = 31
min_coverage =
max_coverage =
min_contig_length =
Now we have to know how to pull these values from inside the main PHP class using the function $this->getConfig():
// Other params
$this->addEnvironment('HASH_LENGTH', $this->getConfig('hash_length'));
$this->addEnvironment('HASH_LENGTH_MIN', $this->getConfig('hash_length_min'));
$this->addEnvironment('HASH_LENGTH_MAX', $this->getConfig('hash_length_max'));
$this->addEnvironment('MIN_COVERAGE', $this->getConfig('min_coverage'));
$this->addEnvironment('MAX_COVERAGE', $this->getConfig('max_coverage'));
$this->addEnvironment('MIN_CONTIG_LENGTH', $this->getConfig('min_contig_length'));
So the overall action_getLaunchPage function is as follows:
public function action_getLaunchPage(array $params) {
// physical sequence database list
$dbs = PluginSeqdbUtil::getPhysicalDbList(array(SeqdbFilterExp::OP_AND,
array(SeqdbFilterExp::COL_SEQDB_LOCALE, SeqdbFilterExp::COM_EQ,'L'),
array(SeqdbFilterExp::COL_SEQ_TYPE, SeqdbFilterExp::COM_EQ, SeqType::NUC)),
FALSE);
if (is_array($dbs)) {
// create_function() was removed in PHP 8; an anonymous function does the same job
uasort($dbs, function ($a, $b) { return strcmp($a["title"], $b["title"]); });
}
$this->addEnvironment('nuc_db_list', $dbs);
// virtual db list
$vdbs = PluginSeqdbUtil::getVirtualDbList(array(SeqdbFilterExp::COL_SEQ_TYPE, SeqdbFilterExp::COM_EQ,SeqType::NUC),FALSE);
if (is_array($vdbs)) {
// create_function() was removed in PHP 8; an anonymous function does the same job
uasort($vdbs, function ($a, $b) { return strcmp($a["title"], $b["title"]); });
}
$this->addEnvironment('nuc_vdb_list', $vdbs);
// Other params
$this->addEnvironment('HASH_LENGTH', $this->getConfig('hash_length'));
$this->addEnvironment('HASH_LENGTH_MIN', $this->getConfig('hash_length_min'));
$this->addEnvironment('HASH_LENGTH_MAX', $this->getConfig('hash_length_max'));
$this->addEnvironment('MIN_COVERAGE', $this->getConfig('min_coverage'));
$this->addEnvironment('MAX_COVERAGE', $this->getConfig('max_coverage'));
$this->addEnvironment('MIN_CONTIG_LENGTH', $this->getConfig('min_contig_length'));
return $this->fillDisplayTemplate(dirname(__FILE__)."/submit.tpl");
}
That's it! Your launch page is complete.
Tips and Tricks on Launch Pages
So you get the basics. You have to make an HTML template, and you have to implement a function to fill it in with whatever values you want.
But there's some magic you might not know about.
Creating a Title for the Workflow Run
When the user presses the submit button, you'll want to map the name of the run that he or she gave to the title of their workflow run. Whichever form element in submit.tpl you provide the user for the title of the run, be sure to give it the HTML ID of title. You'll notice that's what we did for the VelvetExample workflow:
{* Let's get the run name and insert a default run name using the Smarty date function *}
<tr>
<td><label>Run name:</label></td>
<td><input dojoType="dijit.form.TextBox" name="title" id="title" value="Velvet {$smarty.now|date_format:"%d/%b/%Y"}"></td>
</tr>
Just by naming it title, the GenomeQuest framework will automatically name the workflow result for you. It's special handling for this input tag.
This title will be accessible later for you via a very special suite of functions embedded inside of your $this object, namely $this->aDbmsWorkflow->getTextLabel(). More on that soon.
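If you ever need the same default run name on the PHP side, a rough equivalent of the template expression looks like this. defaultRunName() is a hypothetical helper; Smarty's "%d/%b/%Y" corresponds to the PHP format "d/M/Y", assuming English month abbreviations.

```php
<?php
// Sketch: PHP counterpart of the template's default run name.
// defaultRunName() is a made-up helper, not part of GenomeQuest.
function defaultRunName(DateTimeInterface $when): string
{
    // d = two-digit day, M = three-letter month name, Y = four-digit year
    return 'Velvet ' . $when->format('d/M/Y');
}

echo defaultRunName(new DateTimeImmutable('2010-02-11')), "\n"; // Velvet 11/Feb/2010
```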
Magic database names
Just like the title field, if you specify the input name for a query database as either qdb_id or qdb_def_id, the database will be automatically prepared in the working directory of the workflow. Similarly, if you specify the input name for a subject database as either sdb_id or sdb_def_id, the database will also be automatically prepared in the working directory of the workflow. You can immediately use these databases in your script, without any additional preparation.
Indeed, you can see in the code we wrote to show the list of databases, we named the SELECT box qdb_id.
<select dojoType="gq.form.Select" id="qdb_id" name="qdb_id">
<optgroup label="Uploaded Nucleotide">
{* Here we use Smarty's foreach construct to iterate through these $nuc_db_list and $nuc_vdb_list variables.
Remember, we'll populate these variables inside the PHP class. *}
{foreach from=$nuc_db_list item=item key=key}
<option value="{$item.id}">{$item.title}</option>
{/foreach}
</optgroup>
<optgroup label="Virtual Nucleotide">
{foreach from=$nuc_vdb_list item=item key=key}
<option value="{$item.id}">{$item.title}</option>
{/foreach}
</optgroup>
</select><br/>
If you don't totally get this yet, it's ok. More on it as we talk about the next step - actually building and launching your workflow.
Throwing Errors
If you don't like some parameter that the user has entered, you can throw an error back at them in one of two functions inside of your PHP class:
- checkSeqdbInputs
- checkLaunchParams
Both of these functions are called just after the user has clicked the submit button. Inside of them you have access to the users' selections. So for instance, I could define a function in VelvetExample as follows:
public function checkLaunchParams($params) {
$pec = new ParameterCheckerErrorContainer();
if ($this->params->min_contig_length < 5) {
$pec->addError(new ParameterCheckerError('min_contig_length', "The minimum contig length should be at least 5.", ParameterCheckerError::SEVERITY_ERR));
}
return $pec;
}
A few things about this:
- $this->params has all of the parameters you built in your form, by their HTML id.
- If you want to throw an error nicely, use the above syntax. Make a new ParameterCheckerErrorContainer(), add an error (you can add many), and then return it. The system will not only present the error nicely to the user on the launch page, but it will also highlight the fields that need more help.
This same game can be played with checkSeqdbInputs.
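The "collect every error, then return the container" pattern can be sketched outside GenomeQuest. StubError and StubErrorContainer below are stand-ins invented for this example; they only mimic the shape of the framework classes used above, whose real interfaces may differ. checkVelvetParams() is likewise hypothetical.

```php
<?php
// Stand-in classes so the sketch runs on its own. They are NOT the
// framework's ParameterCheckerError / ParameterCheckerErrorContainer.
class StubError
{
    public $field;
    public $message;

    public function __construct($field, $message)
    {
        $this->field = $field;
        $this->message = $message;
    }
}

class StubErrorContainer
{
    private $errors = [];

    public function addError(StubError $error)
    {
        $this->errors[] = $error;
    }

    public function getErrors()
    {
        return $this->errors;
    }
}

// Hypothetical checker: reports every problem instead of stopping at the first.
function checkVelvetParams(array $params, array $limits)
{
    $pec = new StubErrorContainer();
    $hash = (int) $params['hash_length'];
    if ($hash < $limits['hash_length_min'] || $hash > $limits['hash_length_max']) {
        $pec->addError(new StubError('hash_length', sprintf(
            'The hash length should be between %d and %d.',
            $limits['hash_length_min'],
            $limits['hash_length_max']
        )));
    }
    // An empty string means the user left the optional field blank.
    if ($params['min_contig_length'] !== '' && (int) $params['min_contig_length'] < 5) {
        $pec->addError(new StubError(
            'min_contig_length',
            'The minimum contig length should be at least 5.'
        ));
    }
    return $pec;
}

$limits = ['hash_length_min' => 5, 'hash_length_max' => 31];
$result = checkVelvetParams(['hash_length' => '41', 'min_contig_length' => '2'], $limits);
foreach ($result->getErrors() as $error) {
    echo $error->field, ': ', $error->message, "\n";
}
```

Each error carries the name of the field it belongs to, which is what lets a launch page highlight the right inputs.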
Creating a Title for the Launch Page
All HTML pages have titles - the things at the top of the browser window, typically above the URL. We didn't set a title in our example above, but you can do so via a function called setTemplateTitle. For instance, somewhere in the action_getLaunchPage function:
$this->setTemplateTitle("Velvet Example");
Getting Information about the Current User
You can get lots of information inside of your action_getLaunchPage function - or anywhere in your class, for that matter. Try the function $this->getEnvironment(), or $this->getAllEnvironment().
For instance:
- getEnvironment('current_user') returns an array that contains information about the current user, including such things as name, email address, groups, and so on.
Your workflows and the URL API
Did you know that your workflow is now totally accessible through the GenomeQuest URL API? This URL displays the launch page for my VelvetExample:
query?do=gqplugin&plugin=Workflows.GqWfVelvetExample&plugin_action=getLaunchPage
And if no action is specified, the default is &plugin_action=getLaunchPage, so this URL is equivalent:
query?do=gqplugin&plugin=Workflows.GqWfVelvetExample
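If you generate these links from code, a small helper keeps the query string consistent. pluginUrl() is a hypothetical function written for this example; it only reassembles the two URLs shown above.

```php
<?php
// Sketch: pluginUrl() is a made-up helper, not part of GenomeQuest.
// It builds the query strings shown above from their parts.
function pluginUrl($plugin, $action = null)
{
    $query = ['do' => 'gqplugin', 'plugin' => $plugin];
    if ($action !== null) {
        $query['plugin_action'] = $action;
    }
    // Pass "&" explicitly so the separator does not depend on php.ini.
    return 'query?' . http_build_query($query, '', '&');
}

echo pluginUrl('Workflows.GqWfVelvetExample', 'getLaunchPage'), "\n";
echo pluginUrl('Workflows.GqWfVelvetExample'), "\n";
```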
Creating The Workflow Itself
We've built a launch page, and now it's time to build the underlying workflow. Remember in the Required Functions section, we described the functions that your PHP class has to implement. The function that drives the actual running of the workflow itself is called generateScriptBody.
Just as our action_getLaunchPage function passed values into a Smarty template to render HTML, you'll use a Smarty template again to pass values, but this time, into a BASH script which will encapsulate your workflow.
As standard practice, we create a BASH script and name it script.comparison.body.tpl. Let's begin!
Creating script.comparison.body.tpl
In my code directory, I create a file called script.comparison.body.tpl and edit it:
# Velvet de novo assembly
[[$comp_environment.perl_bin_path]] -I[[$plugin_dir]] [[$plugin_dir]]/run_velvet.pl [[$working_dir]]
You'll notice a few things. First, Smarty syntax here is different than in the launch page development. In submit.tpl, you substituted variables using { } separators. Here, the same effect is achieved using [[ ]] separators.
Second, you might infer that my BASH script is really just a launching pad for a PERL program I'll write that does all of the pipelining stuff on the back end. That's just me - I like PERL. You can use whatever language you want to write your workflows, but it does have to start as a BASH script calling your program.
Let's break this line of code down a bit.
[[$comp_environment.perl_bin_path]]
This is the full path to wherever PERL is installed on the system.
-I[[$plugin_dir]]
This is the directory where our workflow code actually lives. If you know PERL, you can see that I'll be building a PERL library that I'll want to include in my run_velvet.pl program, which is why I use the -I[[$plugin_dir]] option for PERL.
[[$plugin_dir]]/run_velvet.pl
Our program itself - this will run the workflow.
[[$working_dir]]
Remember, every time your workflow is run, a new directory in Userdata is created and all of your workflow's file droppings and activities will happen in here. This is dynamic and provided by the GenomeQuest framework, so you have to provide this location to your workflow inside the BASH script.
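To get a feel for what "filling in" the [[ ]] placeholders means, here is a toy renderer. It is neither Smarty nor the GenomeQuest implementation: fillBracketTemplate() is a made-up function that only handles [[$name]] and dotted [[$name.key]] placeholders, and the values are example paths.

```php
<?php
// Toy renderer, NOT Smarty and not the framework's code: it only replaces
// [[$name]] and [[$name.key]] placeholders to illustrate the idea.
function fillBracketTemplate($template, array $vars)
{
    return preg_replace_callback(
        '/\[\[\$([A-Za-z_][A-Za-z0-9_.]*)\]\]/',
        function ($match) use ($vars) {
            // Walk dotted paths such as comp_environment.perl_bin_path.
            $value = $vars;
            foreach (explode('.', $match[1]) as $key) {
                if (!is_array($value) || !array_key_exists($key, $value)) {
                    return $match[0]; // leave unknown placeholders untouched
                }
                $value = $value[$key];
            }
            return is_scalar($value) ? (string) $value : $match[0];
        },
        $template
    );
}

$template = '[[$comp_environment.perl_bin_path]] -I[[$plugin_dir]] '
          . '[[$plugin_dir]]/run_velvet.pl [[$working_dir]]';

// Example values only; the framework supplies the real ones at run time.
$vars = [
    'comp_environment' => ['perl_bin_path' => '/usr/bin/perl'],
    'plugin_dir'       => '/home/runner/GqWfVelvetExample/',
    'working_dir'      => '/disk/GQtest/data/GQdata/userdata/09080407051444a7815/workflow/1779/',
];

$filled = fillBracketTemplate($template, $vars);
echo $filled, "\n";
```

Note that $plugin_dir already ends in a slash, so the filled-in path to run_velvet.pl contains a double slash. That is harmless on the file system.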
Defining generateScriptBody
Now, back to our Master PHP class. Let's define generateScriptBody as follows:
public function generateScriptBody() {
return $this->fillScriptTemplate(dirname(__FILE__) . "/script.comparison.body.tpl");
}
Just like with our action_getLaunchPage function, we simply use the templating system to "fill in" the script.comparison.body.tpl file. The system will fill it in, and then automatically execute the filled-in template. So our workflow can launch!
But wait.
In our script.comparison.body.tpl, we referred to a few variables:
- $comp_environment.perl_bin_path
- $plugin_dir
- $working_dir
Why didn't we use a similar syntax as with action_getLaunchPage and actually pass these variables in via statements like:
$this->addEnvironment('comp_environment.perl_bin_path', '/usr/local/bin/perl');
The reason is that the system automatically provides a whole series of variables to the BASH script without you having to pass any of them through.
To see what they are, let's actually try to run our workflow right now. We know it'll break because the first and only thing the workflow does is call run_velvet.pl which we haven't written yet. But taking a walk through a $working_dir will illuminate the dark.
Running the workflow - even though it'll break
So, we go into the GenomeQuest user interface, choose our Velvet Example workflow, and launch a new workflow. The workflow seems to launch successfully, and then I am returned to this screen.
Notice that my run is listed as failed. Nice. The system picked it up. Now how can I go to the directory where it tried to run and failed? Click the view log link and here is what you see:
#################### log ###################
Copyright (c) GenomeQuest 2010 - All rights reserved
user login id: "resnick@genomequest.com"
workflow working directory: "/disk/GQtest/data/GQdata/userdata/09080407051444a7815/workflow/1779/"
workflow type: "GqWfVelvetExample"
[ulimit -n 16384]
[CREATION TIME] 2010/02/11 07:38:41
[start SCRIPT] 2010/02/11 07:38:41
[pid SCRIPT] 10070
[start SCRIPT BODY] 2010/02/11 07:38:41
[pid SCRIPT BODY] 10078
[FAILED SCRIPT BODY] 2
[start PHP POSTPROCESSING] 2010/02/11 07:38:41
[pid PHP POSTPROCESSING] 10083
[stop PHP POSTPROCESSING] 2010/02/11 07:38:41
[stop SCRIPT] 2010/02/11 07:38:41
#################### stdout ###################
#################### stderr ###################
Can't open perl script "/home/runner/GqWfVelvetExample//run_velvet.pl": No such file or directory
#################### array lastjob ###################
#################### script body ###################
# Velvet de novo assembly
/usr/bin/perl -I/home/runner/GqWfVelvetExample/ /home/runner/GqWfVelvetExample//run_velvet.pl /disk/GQtest/data/GQdata/userdata/09080407051444a7815/workflow/1779/
Notice! The log shows us exactly what the working directory is. Let's go in there and take a look at what the GenomeQuest framework left behind.
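If you find yourself doing this often, the directory can be pulled out of the log text with a regular expression. This is a sketch: workingDirFromLog() is a hypothetical helper, and the line format is inferred from this single example log, so treat it as an assumption.

```php
<?php
// Sketch: extract the working directory from the log text.
// The 'workflow working directory: "..."' format is taken from one
// example log and may not hold for every GenomeQuest version.
function workingDirFromLog($log)
{
    if (preg_match('/workflow working directory: "([^"]+)"/', $log, $m)) {
        return $m[1];
    }
    return null; // no such line in the log
}

$log = "workflow working directory: \"/disk/GQtest/data/GQdata/userdata/09080407051444a7815/workflow/1779/\"\n"
     . "workflow type: \"GqWfVelvetExample\"\n";

echo workingDirFromLog($log), "\n";
```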
Files Provided by GenomeQuest Framework in the Working Directory
Our workflow failed as expected, but a $working_dir was still created and we know what it is through the GenomeQuest user interface log. Let's have a look in this directory:
% cd /disk/GQtest/data/GQdata/userdata/09080407051444a7815/workflow/1779/
% ls -ls
total 44
4 -rw-r--r-- 1 runner geneit 190 Feb 11 07:38 body.script.comparison.sh
0 -rw-r--r-- 1 runner geneit 0 Feb 11 07:38 comparison.has.failed
4 -rw-r--r-- 1 runner geneit 586 Feb 11 07:38 comparison.log
4 -rw-r--r-- 1 runner geneit 664 Feb 11 07:38 query.db.ind
8 -rw-r--r-- 1 runner geneit 4586 Feb 11 07:38 script.comparison.sh
8 -rw-r--r-- 1 runner geneit 6345 Feb 11 07:38 script.comparison.sh.params
8 -rw-r--r-- 1 runner geneit 4846 Feb 11 07:38 script.comparison.sh.params.json
4 -rw-r--r-- 1 runner geneit 6 Feb 11 07:38 script.comparison.sh.pid
4 -rw-r--r-- 1 runner geneit 98 Feb 11 07:38 script.comparison.sh.stderr
0 -rw-r--r-- 1 runner geneit 0 Feb 11 07:38 script.comparison.sh.stdout
body.script.comparison.sh
Inside of this file we see our filled-in BASH template:
# Velvet de novo assembly
/usr/bin/perl -I/home/runner/GqWfVelvetExample/ /home/runner/GqWfVelvetExample//run_velvet.pl /disk/GQtest/data/GQdata/userdata/09080407051444a7815/workflow/1779/
script.comparison.sh
This is a wrapper shell script which calls our body.script.comparison.sh file. It is provided by the GQ framework. It provides some helpful functions to you if you want to write your entire workflow inside of script.comparison.body.tpl. For instance, it gives you access to a function that allows you to automatically update the progress bar inside of GQ. It also tells the GenomeQuest framework when your workflow is finished and whether it worked well. You can see a sample of this file here.
script.comparison.sh.params
This is a very important file, combined with its counterpart, script.comparison.sh.params.json. This file contains all of the information that the GenomeQuest framework knows and can provide to your workflow. The entire file is available here but some important snippets are shown below:
[comp_environment] => Array
(
[gq_path] => /disk/GQtest/
[nb_threads] => 1
[biofacet_bin_path] => /disk/GQtest/data/GQdata/biofacet/GENE_IT/bin/
[biofacet_matrix_path] => /disk/GQtest/data/GQdata/biofacet/GENE_IT/mat/
[biofacet_license_path] => /disk/GQtest/data/GQdata/biofacet/lic/
[biofacet_lib_path] => /disk/GQtest/data/GQdata/biofacet/GENE_IT/lib/
[awk_bin_path] => gawk
[join_bin_path] => join
[php_bin_path] => /disk/GQtest/data/GQdata/sapi/php_cli
[du_bin_path] => du
[perl_bin_path] => /usr/bin/perl
[application_home_path] => /disk/GQtest/web/GQ/
[gqadmindb_path] => /disk/GQtest/data/GQdata/external_apps/gqadmindb/
[content_hotdrive_path] => /disk/GQtest/data/GQdata/content/hotdrive/
[content_local_path] => /disk/GQtest/data/GQdata/content/local/
[user_tmp_dir] => /disk/GQtest/data/GQdata/userdata/09080407051444a7815//.tmp/
[path_sge_bin_dir] => /opt/sge-6_2u2/bin/lx24-amd64
[sge_queue] => all.q
[sge_root] => /opt/sge-6_2u2/bin/lx24-amd64/../..
[sge_cell] =>
[queue_settings_sh] =>
)
Remember how in our BASH script we referred to [[$comp_environment.perl_bin_path]]? Here it is. Indeed, paths to all of the important places you need to refer to in your workflow are available here.
We also referred to the variable [[$plugin_dir]] and the variable [[$working_dir]]. These are also available in this file:
[plugin_dir] => /home/runner/GqWfVelvetExample/
[working_dir] => /disk/GQtest/data/GQdata/userdata/09080407051444a7815/workflow/1779/
Even better, all of the parameters that came from the user's launch page are available:
[params] => Array
(
[dataset_type] => short
[paired_end] => 0
[ins_length] =>
[hash_length] => 21
[min_coverage] =>
[max_coverage] =>
[min_contig_lgth] =>
[qdb] => id:5800
[qdb_seq_type] => nucleotide
[title] => Test of Velvet Example
[description] =>
[email] =>
)
So you have everything you need in this file to write your workflow - all paths, parameters, and more.
The problem is that only inside the script.comparison.body.tpl template can you refer to these variables directly using the [[ ]] syntax. We're writing a PERL program, so we can't get at them that easily. That's why the framework also provides these variables in a parsable format: script.comparison.sh.params.json.
script.comparison.sh.params.json
This file puts all of the parameters available in script.comparison.sh.params in a parsable JSON format. For more information on this format, see http://www.json.org. PERL has a great module to handle JSON format and turn it into PERL objects - it's called JSON.pm and is available at CPAN.org. Virtually every meaningful programming language can handle this format.
You can see the contents of this file here.
script.comparison.sh.pid
This file simply stores the process id of the body.script.comparison.sh shell process that gets kicked off to run your workflow. Its contents in this case?
% cat script.comparison.sh.pid
10070
script.comparison.sh.stdout
Any output that your workflow produces on STDOUT will be placed in this file. It will also be displayed in the log inside the GQ user interface.
script.comparison.sh.stderr
Any output that your workflow produces on STDERR will be placed in this file. It will also be displayed in the log inside the GQ user interface. In our case, you won't be surprised to see the contents of this file:
% cat script.comparison.sh.stderr
Can't open perl script "/home/runner/GqWfVelvetExample//run_velvet.pl": No such file or directory
Remember, we kicked off our workflow and the only thing our BASH script does is call a currently unbuilt PERL workflow.
comparison.log
This file is a log file that captures the output of the calling program, script.comparison.sh. Since we'll be writing our workflow in PERL, there won't be a lot of output in here. If you were to write your entire workflow in BASH via the script.comparison.body.tpl template and use the functions available in script.comparison.sh, you would see more logging happening in this file.
comparison.X.X
A 0 byte file will always exist inside of your $working_dir with one of three possible names:
comparison.has.failed
comparison.has.finished
comparison.is.running
In our case, we see the file comparison.has.failed because, indeed, our program returned a non-zero exit status.
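Since the state of a run boils down to which marker file is present, it is easy to check from the command line. Here is a minimal shell sketch under that assumption. The function name workflow_state is made up for illustration and is not part of the GQ framework.

```bash
# Hypothetical helper (not provided by GQ): report a workflow's state from
# the zero-byte marker file found in its working directory.
workflow_state() {
    local dir="$1"
    if [ -e "$dir/comparison.has.failed" ]; then
        echo "failed"
    elif [ -e "$dir/comparison.has.finished" ]; then
        echo "finished"
    elif [ -e "$dir/comparison.is.running" ]; then
        echo "running"
    else
        # None of the three marker files exists (yet).
        echo "unknown"
    fi
}

# Usage: workflow_state "/path/to/your/working_dir"
```

Called on the working directory from our failed run, this would print "failed".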
query.db.ind
This is the GenomeQuest Engine (biofacet) query database that the system magically provided for us because we set up one of our parameters in the launch page as qdb_id. For more on this, see Magic Database Names. This will be very useful because this GenomeQuest engine database has all of the sequences that we'll run through Velvet. More on that soon.
Writing Actual Workflow Code
Now we have everything we need to get started writing run_velvet.pl. We have all of the parameters from the system, we know where our working directory is, and we can get started.
Velvet Binaries
In our case, we need the Velvet binaries to be installed. I've done that:
% ls -ls
total 40
4 -rw-r--r-- 1 runner geneit 1910 Feb 11 07:33 GqWfVelvetExample.php
4 -rw-r--r-- 1 runner geneit 128 Feb 11 07:22 script.comparison.body.tpl
12 -rw-r--r-- 1 runner geneit 8571 Feb 10 17:06 submit.tpl
4 drwxr-xr-x 3 runner geneit 4096 Feb 11 09:19 velvetbin
% ls -ls velvetbin/
total 992
312 -rwxr-xr-x 1 runner geneit 312503 Feb 11 09:19 velvetg
140 -rwxr-xr-x 1 runner geneit 136872 Feb 11 09:19 velveth
GQWorkflowFramework.pm
Next, I'll build a PERL library to load in JSON files and handle other obvious things. I'll call it GQWorkflowFramework.pm. Feel free to steal this.
#!/usr/bin/perl -w
use strict;
package GQWorkflowFramework;
use vars qw(@ISA @EXPORT);
use vars qw($WF_PARAM_FILE $PLUGIN_DIR $WORKING_DIR $PROGRESS_FILE $LOG
$LSPDB $LSPRES);
use JSON;
use Date::Manip;
require Exporter;
@ISA = qw(Exporter);
@EXPORT = qw(parse_json_params update_progress initstep endstep log
$PLUGIN_DIR $LOG $LSPDB $LSPRES); # symbols exported by default
$WF_PARAM_FILE = "script.comparison.sh.params.json";
sub parse_json_params {
my($array_as_string) = @_;
my($array);
$array = from_json($array_as_string);
# Save a few things for later.
$PLUGIN_DIR = $array->{plugin_dir};
$WORKING_DIR = $array->{working_dir};
$PROGRESS_FILE = "$WORKING_DIR/" . $array->{progress};
$LOG = "$WORKING_DIR/" . $array->{log};
$LSPDB = $array->{comp_environment}->{biofacet_bin_path} . "/lspdb";
$LSPRES = $array->{comp_environment}->{biofacet_bin_path} . "/lspres";
$ENV{BIOFACET_BIN_PATH} = $array->{comp_environment}->{biofacet_bin_path};
$ENV{LSPMAT} = $array->{comp_environment}->{biofacet_matrix_path};
$ENV{GENE_IT_LICENSE_FILE} = $array->{comp_environment}->{biofacet_license_path};
if (defined $ENV{LD_LIBRARY_PATH}) {
$ENV{LD_LIBRARY_PATH} = $array->{comp_environment}->{biofacet_lib_path} . ":" . $ENV{LD_LIBRARY_PATH};
}
else {
$ENV{LD_LIBRARY_PATH} = $array->{comp_environment}->{biofacet_lib_path};
}
return $array;
}
sub update_progress {
my($overall,$incremental) = @_;
$incremental="" unless defined($incremental);
open(FH,">$PROGRESS_FILE") || die $!;
print FH "[$incremental] $overall\n";
close(FH);
}
sub initstep {
my($step) = @_;
open(FH,">>$LOG") || die $!;
print FH "[start $step] ", UnixDate("now","%Y/%m/%d %H:%M:%S"), "\n";
close FH;
}
# The "check" function isn't necessary like it is in shell because our processes
# here will be enclosed by Perl. Call this function just for nice logging.
sub endstep {
my($step) = @_;
open(FH,">>$LOG") || die $!;
print FH "[stop $step] ", UnixDate("now","%Y/%m/%d %H:%M:%S"), "\n\n";
close FH;
}
sub log {
my($info) = @_;
open(FH,">>$LOG") || die $!;
print FH $info;
close FH;
}
1;
A note on this library. I've built a few functions that I'll use in run_velvet.pl. They are:
initstep
I'll call this function just before I do any "step" of the workflow. It'll print some nice things out into the LOG file for me. Notice that the LOG file is the same one that the system wants me to use since I parsed the JSON file and set up the LOG to be:
$array = from_json($array_as_string);
$WORKING_DIR = $array->{working_dir};
$LOG = "$WORKING_DIR/" . $array->{log};
endstep
This is just like initstep except I'll call it at the end of a "step", however I define a step to be inside run_velvet.pl.
update_progress
GenomeQuest provides a progress bar in the graphical user interface that you can control. We provide a function in script.comparison.sh that you can use, but inside of PERL we don't have access to it. So I rewrote it. It's very simple. For more on updating the progress bar, see Creating a Progress Bar.
run_velvet.pl
The actual workflow itself is written in PERL. You can take a look at it: run_velvet.pl, 269 lines of PERL code.
Even if you don't want to look at this code, there are a few major steps of the workflow to draw your attention to:
Basic Setup
initstep("BASIC SETUP");
# Figure out whether we have a bfql statement to deal with (virtual databases)
$BFQL = "";
if (defined $PARAMS->{qdb_bfql} and ($PARAMS->{qdb_bfql} =~ /\S/)) {
$BFQL = "-bfql " . $PARAMS->{qdb_bfql};
}
log("BFQL string is: '$BFQL'\n");
update_progress(2);
QueryStats(); #side effect - assigns vals to global variables $READ_COUNT, $RESIDUE_COUNT, and $MAX_READ_LENGTH;
log(sprintf("READ_COUNT: %d\nRESIDUE_COUNT: %d\nMAX_READ_LENGTH: %d\n",$READ_COUNT,$RESIDUE_COUNT,$MAX_READ_LENGTH));
# Make the directory where most of the velvet work is done.
mkdir "$working_dir/$OUTPUT_DIR";
update_progress(3);
endstep("BASIC SETUP");
A few notes - you can see we are using those functions initstep, endstep and update_progress freely in here.
Also, there is this funny thing going on with this $BFQL variable. You should know that BFQL is the GenomeQuest Engine Query Language (BioFacet Query Language) and there's plenty of documentation available on it here. The reason why we are using this here is because - you may recall - we allowed the user to select either a Physical or a Virtual database. Well, a Virtual database is essentially a database with a BFQL filter on it (for instance, Genbank ESTs where gene name = AACK). So if the user specified a Virtual database as their input to our workflow, we need to get the BFQL string from the platform and use it inside our workflow to assemble only the right sequences.
Indeed, to this point - the BFQL string is one of those parameters that isn't available inside of script.comparison.sh.params.json. So here's an example of a variable that we need to fill in via the PHP master class. Let's do that now:
public function generateScriptBody() {
$qBfql = $this->aDbmsWorkflow->getDbBfql('query');
$this->addEnvironment('qdb_bfql', $qBfql);
return $this->fillScriptTemplate(dirname(__FILE__) . "/script.comparison.body.tpl");
}
Now that we've done this, the variable qdb_bfql will be available in the script.comparison.sh.params.json file.
Create FASTA input
The next step is to create FASTA input:
sub CreateFastaInput {
system("$LSPDB $working_dir/query.db $BFQL -printf '>%H#ID\n%S\n%VOID' > $working_dir/$FASTA_INPUT") == 0
or die $?;
}
Fancy stuff - we're using the GenomeQuest Engine on our query database, which is provided to us in a GenomeQuest Engine format because we set up the launch page to call the user's selection qdb_id. Remember that magic.
So we run the lspdb command on our database, with whatever appropriate BFQL filter is necessary. This command dumps out the database on the command line, and we take that opportunity to reformat it with:
-printf '>%H#ID\n%S\n%VOID'
This prints the > followed by the ID of the sequence (%H#ID) followed by a newline (\n) followed by the sequence (%S) followed by a newline (\n). The %VOID ensures that there is a final newline at the end of the entire dump. Check the docs for lspdb for more on this. There is also a significant discussion of -printf in the -printf section of the GQ Engine Primer.
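Because every record in the dump starts with a > line, a quick sanity check on the generated file is to count those header lines and compare the result with the number of reads you expect. Here is a minimal shell sketch. The function name count_fasta_records is made up for illustration, and the input.fa file name in the usage comment is an assumption based on the working directory listing shown later.

```bash
# Hypothetical sanity check: count FASTA records by counting header lines.
count_fasta_records() {
    local fasta="$1"
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so ignore the exit status and keep the printed count.
    grep -c '^>' "$fasta" || true
}

# Usage (file name assumed): count_fasta_records "$working_dir/input.fa"
```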
Run Velveth
Let's kick off Velveth on our new FASTA file:
sub VelvetH {
my($VELVETH);
my($readtype);
$readtype = $PARAMS->{params}->{dataset_type}; # If not solid, we know that dataset_type is either "short" or "long"
if ($PARAMS->{params}->{paired_end}) {
$readtype .= "Paired";
}
$VELVETH = "$PLUGIN_DIR/velvetbin/velveth $working_dir/$OUTPUT_DIR $PARAMS->{params}->{hash_length} -fasta -$readtype $working_dir/$FASTA_INPUT";
# Off we go...
log("$VELVETH\n");
# Send stdout to the log first, then point stderr at it, so both streams are captured.
system("$VELVETH >> $LOG 2>&1");
}
Notice how we use the dataset_type field which has the exact same name in the launch page. Check back to submit.tpl to confirm this. We are directly using the user's input here to determine what the parameters of the velveth run should be.
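The read-type decision in VelvetH is small enough to try out on its own. Here is a minimal shell sketch of the same logic. The function name velvet_readtype_flag is made up for illustration, and the sketch assumes paired_end arrives as 0, 1, or empty, as in the params dump shown earlier.

```bash
# Hypothetical helper: build the velveth read-type flag from the launch page
# values dataset_type ("short" or "long") and paired_end (0, 1, or empty).
velvet_readtype_flag() {
    local dataset_type="$1" paired_end="$2"
    local flag="-$dataset_type"
    # Treat empty and 0 as "not paired", mirroring Perl's truthiness test.
    if [ -n "$paired_end" ] && [ "$paired_end" != "0" ]; then
        flag="${flag}Paired"
    fi
    printf '%s\n' "$flag"
}

# Usage: velvet_readtype_flag short 1
```

With the parameters from our example run (dataset_type short, paired_end 0) this yields -short.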
Run Velvetg
The next step in the Velvet workflow is to run velvetg:
sub VelvetG {
my($VELVETG);
$VELVETG = sprintf("%s/velvetbin/velvetg %s/%s -read_trkg yes -cov_cutoff '%s' -max_coverage '%s' -min_contig_lgth '%s' %s",
$PLUGIN_DIR,
$working_dir,
$OUTPUT_DIR,
defined $PARAMS->{params}->{min_coverage} ? $PARAMS->{params}->{min_coverage} : '',
defined $PARAMS->{params}->{max_coverage} ? $PARAMS->{params}->{max_coverage} : '',
defined $PARAMS->{params}->{min_contig_lgth} ? $PARAMS->{params}->{min_contig_lgth} : '',
$PARAMS->{params}->{paired_end} ? "-ins_length $PARAMS->{params}->{ins_length}" : '');
# Off we go...
log("$VELVETG\n");
# Send stdout to the log first, then point stderr at it, so both streams are captured.
system("$VELVETG >> $LOG 2>&1");
}
Again, notice how we directly use the parameters defined by the user in the launch page.
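The Perl code above passes an empty quoted value whenever the user leaves a field blank. One alternative, sketched here in shell, is to leave an option out entirely when its value is empty. This is only an illustration: build_velvetg_args is a made-up name, and this document does not establish whether velvetg treats an empty value and an omitted option the same way.

```bash
# Hypothetical helper: assemble velvetg options, skipping any option whose
# launch page value was left empty.
build_velvetg_args() {
    local min_cov="$1" max_cov="$2" min_contig="$3" ins_length="$4"
    local args="-read_trkg yes"
    if [ -n "$min_cov" ]; then args="$args -cov_cutoff $min_cov"; fi
    if [ -n "$max_cov" ]; then args="$args -max_coverage $max_cov"; fi
    if [ -n "$min_contig" ]; then args="$args -min_contig_lgth $min_contig"; fi
    if [ -n "$ins_length" ]; then args="$args -ins_length $ins_length"; fi
    printf '%s\n' "$args"
}

# Usage: build_velvetg_args "$min_coverage" "$max_coverage" "$min_contig_lgth" "$ins_length"
```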
Load the Contigs into GenomeQuest
Velvetg produces a file of contigs. Normally it does this in a FASTA format but we changed the code a tiny bit so it would output the contigs in an EMBL+ format with some additional annotation about each contig. So now we can use GenomeQuest's Content Manager to convert this EMBL file into the GenomeQuest format and place it inside of the GenomeQuest application. See Publishing a Sequence Database to the GenomeQuest Front End for more.
Later on we'll link to this database from the Velvet report page.
sub AdminDB {
my $ADMIN_DB = sprintf("%s/admin_db.pl --action add --gq_base_dir %s --db_file %s/%s/contigs.fa --db_id VELVET_%s_CONTIGS --db_format EMBL+ --db_type NUC --db_name '%s' --release 1 --gq_fields ':ALL' --index_fields ':ALL' --owner '%s' --access 'container.workflow.%s'",
$PARAMS->{comp_environment}->{gqadmindb_path},
$PARAMS->{comp_environment}->{gq_path},
$working_dir, $OUTPUT_DIR,
$PARAMS->{dbms_workflow_id},
$PARAMS->{params}->{title},
$PARAMS->{current_user}->{login_name},
$PARAMS->{dbms_workflow_id});
log("$ADMIN_DB\n");
# Send stdout to the log first, then point stderr at it, so both streams are captured.
system("$ADMIN_DB >> $LOG 2>&1");
}
You should have some good idea of how admin_db.pl works - it's the key tool to load GenomeQuest sequence databases up into the metadata layer so they can become available to users. (If you don't, read the Content Manager Reference Manual.) So here I'll point out only two things:
- We're loading the file contigs.fa into the system, as you can see.
- Normally you would associate an access control of "private" with this file so the user could have the database in their account. But this time, because the database is actually a by-product of this workflow, we must be more careful. We set the access to container.workflow.%s and substitute in the workflow id for %s. This means that the database is "contained by" the workflow, and that has the benefit that when a user shares the workflow result with someone, the recipient also gets access to this database.
Generate Statistics
Later on we're going to build a report on this run. To do that, we have to have some statistics readily available to provide to the user. We'll use a standard Zend INI format (http://framework.zend.com/manual/en/zend.config.adapters.ini.html) which is basically key-value pairs separated by newlines.
sub GenerateStats {
my($line);
my($nb_contigs,$nb_assembled_reads,$n50,$longest_contig,$total);
# Count the number of contigs found.
open(FH,"$working_dir/$OUTPUT_DIR/contigs.fa") || die $!;
$nb_contigs = 0;
while(<FH>) {
next unless (/^ID/);
$nb_contigs++;
}
close FH;
# Get the last line of the Log, we need it for a bunch of things. It looks like:
# Final graph has 16 nodes and n50 of 24184, max 44966, total 100080, using 135855/142858 reads
open(FH,"$working_dir/$OUTPUT_DIR/Log") || die $!;
while(<FH>) {
$line = $_;
}
close FH;
($n50,$longest_contig,$total,$nb_assembled_reads) = ($line =~ /n50 of\s*(\d+).*max\s*(\d+).*total\s*(\d+).*using\s*(\d+)/);
# Write stats out.
mkdir "$working_dir/$STATS_DIR";
open(STATS,">$working_dir/$STATS_DIR/$STATS_FILE") || die $!;
print STATS "[params]\n";
print STATS "input=$PARAMS->{params}->{qdb}\n";
print STATS "hash_length=$PARAMS->{params}->{hash_length}\n";
print STATS "dataset_type=$PARAMS->{params}->{dataset_type}\n";
print STATS "result_name=$PARAMS->{params}->{title}\n";
print STATS "min_coverage=$PARAMS->{params}->{min_coverage}\n";
print STATS "max_coverage=$PARAMS->{params}->{max_coverage}\n";
print STATS "workflow_id=$PARAMS->{dbms_workflow_id}\n";
print STATS "[general_stats]\n";
print STATS "nb_reads=$READ_COUNT\n";
print STATS "nb_residues=$RESIDUE_COUNT\n";
print STATS "max_read_len=$MAX_READ_LENGTH\n";
print STATS "nb_contigs=$nb_contigs\n";
print STATS "nb_assembled_reads=$nb_assembled_reads\n";
print STATS "n50=$n50\n";
print STATS "longest_contig=$longest_contig\n";
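# Guard against a run that produced no contigs, which would otherwise divide by zero.
print STATS "average_contig=", ($nb_contigs ? $total/$nb_contigs : 0), "\n";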
close STATS;
open(NBCONTIGS,">$working_dir/$STATS_DIR/nb_contigs.txt") || die $!;
print NBCONTIGS "$nb_contigs\n";
close NBCONTIGS;
}
Later on we'll use this data to create a report.
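The only fiddly part of GenerateStats is pulling the numbers out of the last line of Velvet's Log file. Here is a minimal shell sketch of that extraction, using the sample line quoted in the code comment above. The function name parse_velvet_summary is made up for illustration, and the pattern is stricter than the Perl one: it assumes the exact comma-separated wording of that sample line.

```bash
# Hypothetical helper: extract n50, longest contig, total length, and the
# assembled-read count from the final line of Velvet's Log file.
parse_velvet_summary() {
    local log_file="$1"
    local pattern='s/.*n50 of \([0-9][0-9]*\), max \([0-9][0-9]*\), total \([0-9][0-9]*\), using \([0-9][0-9]*\)\/[0-9][0-9]* reads.*/n50=\1 longest_contig=\2 total=\3 nb_assembled_reads=\4/p'
    # Prints nothing when the last line does not match, so the caller can detect that.
    tail -n 1 "$log_file" | sed -n "$pattern"
}

# Usage: parse_velvet_summary "$working_dir/output/Log"
```

For the sample line, this prints n50=24184 longest_contig=44966 total=100080 nb_assembled_reads=135855.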
Testing Your Code
You can test at any time just by launching a new workflow via the GUI and watching what happens. After a few bugs, we got the software to work. Take a look:
There are just two problems. One, although we confirm that 13 contigs were generated, the image shows that the system is reporting empty on the Total Number of Results. And two, if we click the link on our successful workflow, we get a horrible error. This is because we haven't gotten around to creating the report yet.
Nevertheless, the underlying workflow worked great. If we go take a look at the $working_dir for this workflow:
% ls -ls
total 8764
4 -rw-r--r-- 1 runner geneit 190 Feb 11 10:15 body.script.comparison.sh
0 -rw-r--r-- 1 runner geneit 0 Feb 11 10:15 comparison.has.finished
12 -rw-r--r-- 1 runner geneit 8504 Feb 11 10:15 comparison.log
8700 -rw-r--r-- 1 runner geneit 8888944 Feb 11 10:15 input.fa
4 drwxr-xr-x 2 runner geneit 4096 Feb 11 10:15 output
4 -rw-r--r-- 1 runner geneit 6 Feb 11 10:15 overall.progress
4 -rw-r--r-- 1 runner geneit 640 Feb 11 10:15 query.db.ind
8 -rw-r--r-- 1 runner geneit 4586 Feb 11 10:15 script.comparison.sh
8 -rw-r--r-- 1 runner geneit 6405 Feb 11 10:15 script.comparison.sh.params
8 -rw-r--r-- 1 runner geneit 4901 Feb 11 10:15 script.comparison.sh.params.json
4 -rw-r--r-- 1 runner geneit 5 Feb 11 10:15 script.comparison.sh.pid
0 -rw-r--r-- 1 runner geneit 0 Feb 11 10:15 script.comparison.sh.stderr
4 -rw-r--r-- 1 runner geneit 60 Feb 11 10:15 script.comparison.sh.stdout
4 drwxr-xr-x 2 runner geneit 4096 Feb 11 10:15 Stats
Looks nice! Not only do we have the original GQ Framework files, but our workflow seems to have done its work. The file comparison.has.finished indicates that the processing is over and that there weren't any non-zero exit codes. And the run_velvet.pl code itself created a Stats directory. Inside that are:
% ls -ls Stats/
total 8
4 -rw-r--r-- 1 runner geneit 3 Feb 11 10:15 nb_contigs.txt
4 -rw-r--r-- 1 runner geneit 321 Feb 11 10:15 Statistics.ini
% cat Stats/nb_contigs.txt
13
% cat Stats/Statistics.ini
[params]
input=id:87
hash_length=21
dataset_type=short
result_name=Test of Velvet With Complete Workflow
min_coverage=
max_coverage=
workflow_id=1785
[general_stats]
nb_reads=142858
nb_residues=5000030
max_read_len=35
nb_contigs=13
nb_assembled_reads=135855
n50=24184
longest_contig=44966
average_contig=7698.46153846154
So we are ready to start to create the report.
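Before building the report, it can be handy to spot-check single values in Statistics.ini from the shell. Here is a minimal sketch that looks up one key in one section using awk. The function name ini_get is made up for illustration, and the sketch only handles the simple section and key=value layout shown above, not the full INI syntax.

```bash
# Hypothetical helper: print the value of KEY inside [SECTION] of a simple INI file.
ini_get() {
    local file="$1" section="$2" key="$3"
    awk -v section="[$section]" -v key="$key" '
        # A line starting with "[" opens a new section; remember whether it is ours.
        /^\[/ { in_section = ($0 == section); next }
        in_section {
            eq = index($0, "=")
            if (eq > 0 && substr($0, 1, eq - 1) == key) {
                print substr($0, eq + 1)
                exit
            }
        }
    ' "$file"
}

# Usage: ini_get Stats/Statistics.ini general_stats nb_contigs
```

On the file shown above, the usage line prints 13.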
Creating a Progress Bar
We touched on this earlier - as a workflow goes through its various processing steps, it's useful for the user to know how much is completed and how much still needs to be done. Use a progress bar for this purpose. This is an example of a GenomeQuest progress bar:
As we said, if you are implementing your workflow all inside of your BASH template script.comparison.body.tpl you have access to an update_progress function that does this for you.
If you are writing your workflow outside of the BASH calling program, you need to know how to do this yourself.
Overall Progress
In your $working_dir, GenomeQuest automatically checks for a file called overall.progress. This file should have one line in it:
% cat overall.progress
[] 0
This tells the GenomeQuest user interface that you are 0% done.
To update it whenever you like, simply overwrite it with a new value from anywhere in your workflow, e.g.:
[] 47
says that your workflow is 47% done.
You can use the update_progress function to do this automatically if your workflow is written inside your BASH template, script.comparison.body.tpl, e.g.:
update_progress 47
Indeed, in our example workflow for Velvet, we re-implemented this function inside of our PERL library to do the same thing. Feel free to steal that.
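If your workflow is a standalone script rather than the BASH template, writing the file yourself takes only a couple of lines. Here is a minimal shell sketch. The function name write_progress is made up for illustration (it is not the framework's own update_progress), and writing through a temporary file is my own precaution rather than something GenomeQuest requires.

```bash
# Hypothetical helper: record overall progress by overwriting overall.progress
# in the workflow's working directory.
write_progress() {
    local working_dir="$1" percent="$2"
    local tmp="$working_dir/overall.progress.tmp.$$"
    # Write to a temporary file and rename it, so a reader never sees a half-written line.
    printf '[] %s\n' "$percent" > "$tmp" && mv "$tmp" "$working_dir/overall.progress"
}

# Usage: write_progress "$working_dir" 47
```

After the usage line runs, overall.progress contains the single line [] 47.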
Sub-progress
Often your workflows will use the GenomeQuest Engine to perform a very large computation. Typically this means running the GQ Engine command-line tool lspcalc.THA, which distributes massive sequence comparisons on the compute array. This command has an option called -progress which outputs its own progress information in a particular format. For instance:
lspcalc.THA -extend -splitrole query -qs SGE -qn $SGE_QUEUE -jobname $JOBNAME -- \
-M mapnc,kerr -O "[ -hitmap -r -extend -errs $NBERRS -fltThreshold $FLTTHRESHOLD ]" \
-db $SBJCT -bfql "$BFQL" -db $QUERY -o allres \
-best '1,single,hitcnt,rmdup,{RS}' $ADDPARAMS -progress 1>>lspcalc.progress 2>> lspcalc.stderr &
The progress of this command is being accumulated into the file lspcalc.progress in the current directory. You may already be at 30% progress when you launch this command, and you want the command itself to run the progress bar from 30% to 60%. The way to do this is as follows:
If you are inside of your BASH script.comparison.body.tpl
update_progress "30 60 lspcalc.progress"
If you are outside of your BASH script.comparison.body.tpl, for instance, in PERL or somewhere else, you need to overwrite the overall.progress file as follows:
[60] 30 lspcalc.progress
In this example, the progress bar shows that the workflow is between 30% complete and 60% complete. Because the step uses the GenomeQuest engine for processing, GenomeQuest automatically reads the progress file, calculates the percentage complete for the step, calculates the percentage complete for the workflow, and increments the progress bar.
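Writing the sub-progress line from a standalone script can be sketched in shell as follows. The function name write_sub_progress is made up for illustration. The line layout is copied from the example above (upper bound in brackets, then the starting percentage, then the progress file name), and I have not checked it against any other specification.

```bash
# Hypothetical helper: hand a slice of the progress bar (START to END percent)
# over to a progress file written by a GQ Engine command.
write_sub_progress() {
    local working_dir="$1" start="$2" end="$3" progress_file="$4"
    printf '[%s] %s %s\n' "$end" "$start" "$progress_file" > "$working_dir/overall.progress"
}

# Usage: write_sub_progress "$working_dir" 30 60 lspcalc.progress
```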
Creating The Report Page
Just as with the launch page, where we had to build an HTML template and then use our master PHP class to fill it in, we'll do the same with our report. Whereas with the launch page we had to implement the function action_getLaunchPage, with the report we'll have to implement a function called action_getReport.
There are four steps to make this work:
- implement the PHP function getOutputs to describe the various outputs of this workflow. For instance, the velvet workflow will define a "report" output and a sequence database of contigs that Velvet assembled. (You'll recall we already used admin_db.pl to load this database into the system.)
- implement the PHP function getDefaultOutputName to indicate, of all of the outputs you defined in getOutputs, which of them is the default.
- implement the PHP function action_getReport to "fill in" a report.tpl template.
- create a Smarty template - report.tpl - to specify the presentation of the report to the user.
Let's get started.
getOutputs
Our workflow will have two outputs. The report itself, and the GenomeQuest Engine sequence database of contigs that we produced. So in our GqWfVelvetExample.php master class, we define getOutputs as follows:
public function getOutputs() {
$server_url = $this->getEnvironment('server_url');
$workflow_id = $this->getEnvironment('dbms_workflow_id');
return array(
'report' => array('name' => 'Report',
'description' => 'Statistics of the workflow',
'filename' => $server_url . '/query?do=gqplugin&plugin=Workflows.'.__CLASS__.'&plugin_action=getReport&plugin_instance_id=' . $workflow_id,
'mimetype' => GqMime::TYPE_URL),
'contigs' => array('name' => 'Contigs',
'display' => 'show',
'description' => 'Contigs',
'filename' => $server_url . '/query?do=gqfetch&db=LOCAL_VELVET_' . $workflow_id . '_CONTIGS',
'mimetype' => GqMime::TYPE_URL),
);
}
The first output we define is the report itself. It has a mimetype of GqMime::TYPE_URL which means we need to provide a URL. The URL we've provided is the standard URL you should use to get the report. Indeed, you can see from the URL:
$server_url . '/query?do=gqplugin&plugin=Workflows.'.__CLASS__.'&plugin_action=getReport&plugin_instance_id=' . $workflow_id
that we are using the URL API function gqplugin on our class (__CLASS__ will interpolate to GqWfVelvetExample) and requesting the action getReport on this instance of the workflow. Indeed, the PHP function action_getReport is exactly what we'll need to define below - it'll fill in a template and display the report.
So now GQ knows that there is an output called report. Indeed, you may remember that one of the other required functions to implement in GqWfVelvetExample.php is called getDefaultOutputName. Remember how we did that?
The second output we defined was a Sequence Database that we already loaded into the system during the workflow's execution. To render Sequence Databases, we use the Sequence Database browser, which is accessible via the URL API command gqfetch:
$server_url . '/query?do=gqfetch&db=LOCAL_VELVET_' . $workflow_id . '_CONTIGS'
We know that the database name is something like LOCAL_VELVET_XXX_CONTIGS because we set it up this way in our workflow. Check back at how we made the insertion of this database in the workflow step to see.
Generally Speaking
Each output you define must have a unique name within this workflow type, and 5 properties: filename, mimetype, name, description, and display. Specifying the types for each of the outputs allows GenomeQuest to render them properly.
filename
This is the file name of the output (including the path relative to the working directory). If the mimetype is GqMime::TYPE_URL (see below), the filename is the URL.
mimetype
GqMime::TYPE_RESDB for a GQ Engine Result database
GqMime::TYPE_URL for URL redirect.
GqMime::TYPE_HTML for an HTML document.
GqMime::TYPE_MSWORD for an MS Word document.
GqMime::TYPE_MSEXCEL for an MS Excel document.
GqMime::TYPE_TEXT for a text file to download.
Remember, if you want to display a Sequence Database, use GqMime::TYPE_URL and use the query?do=gqfetch syntax as we did above.
name
This is a short string that describes this output.
description
This is a long description of this output.
display
If this is set to 'hide', then that particular result isn't displayed in the workflow result list page. If this is set to 'show', then that particular result is displayed and clickable.
getDefaultOutputName
Now that we've defined all possible outputs, let's set up the default output:
public function getDefaultOutputName() {
return 'report';
}
That string 'report' exactly matches the key in our array in getOutputs. So the GenomeQuest framework already knows that the default output of this workflow is a report, and it knows how to find it because we told it how to via that URL in getOutputs.
action_getReport
Next, we have to implement the action_getReport function to fill in a Smarty template that we haven't yet developed.
public function action_getReport(array $params) {
// Add the stats that we created during the workflow into our environment
$statsFileFullpath = $this->getEnvironment('working_dir').'/Stats/Statistics.ini';
if (! is_readable($statsFileFullpath)) {
throw new GqException("Workflow output file '$statsFileFullpath' is not valid or is unreadable. Please check your workflow to make sure it's been created.");
}
$this->addEnvironment('velvet_stats',new Zend_Config_Ini($statsFileFullpath));
//Set report title
$this->setTemplateTitle(' ' . $this->aDbmsWorkflow->getTextLabel() . ' - Report');
echo $this->fillDisplayTemplate(dirname(__FILE__).'/report.tpl');
}
First, we want to get that Statistics.ini file that our Velvet workflow generated and pass it through to our report. So the first few lines load this file and then add it to the Smarty template's environment. Notice how you can throw exceptions when things don't work out. This will display nicely for the user.
Next, we set the title of the report page (the part of the web page above the URL, at the title bar of the browser) to something that makes sense. Again we find ourselves using the $this->aDbmsWorkflow member of our class.
Finally, we fill in the template defined in report.tpl. So let's go build that file now.
report.tpl
The final step is to specify the general presentation of the report. It's HTML again, combined with Smarty. You can see the entire report.tpl file here, but let's review a couple of things.
{* Breadcrumbs for the top of the page *}
<table>
<tr>
<td id="breadcrumb">
<img src="img/grey_box_bullet.png" style="margin-bottom:1px">
{if $current_user.gq_version_preference >= 6}
<a title="Back to My Workflows" href="query?do=mygq#gqworkflows.GqWfVelvet">My Velvet Workflows</a>
{else}
<a title="Back to My Results" href="resultmanagement">My Results</a>
{/if}
>
{* Later on we're going to build an info.tpl file which will be used to show these results *}
<a href="javascript: void(0);" onClick="dijit.byId('resultinfo').show();" title="Click here to have detailed information about this run">{$instance_info.text_label}</a>
(<a href="query?do=gqplugin&plugin=Workflows.{$plugin_def_id}" title="Click here to launch a new Velvet Assembly">Launch new</a>)
</td>
</tr>
</table>
This isn't the very top of the template, but it shows the "breadcrumbs" - good practice to give folks a way back from where they came. Also note here that we are enabling a "show" of the resultinfo DIV which will be defined below - this DIV will provide more information from the run via the info.tpl template - something we'll cover below.
Accessing The Statistics You Produced in Statistics.ini
{*
Notice the use of Smarty's mathematical operations in the below section to use the data that's been
provided by the framework to compute for the report
*}
<fieldset class="section" id="StatisticsSection">
<legend>Statistics</legend>
<br />
{if $velvet_stats->general_stats->nb_reads != 0}
{math assign="perc_assembled_reads" equation="x*100/y" x=$velvet_stats->general_stats->nb_assembled_reads y=$velvet_stats->general_stats->nb_reads format="%.2f"}
{math assign="perc_unassembled_reads" equation="100 - (x*100/y)" x=$velvet_stats->general_stats->nb_assembled_reads y=$velvet_stats->general_stats->nb_reads format="%.2f"}
<ul>
<li>total number of sequences: {$velvet_stats->general_stats->nb_reads|number_format} ({$velvet_stats->general_stats->nb_residues|number_format} bp)</li>
<li>total number of contigs: {$velvet_stats->general_stats->nb_contigs|number_format}</li>
<li>total number of assembled reads: {$velvet_stats->general_stats->nb_assembled_reads|number_format} ({$perc_assembled_reads}%)</li>
<li>average contig size: {$velvet_stats->general_stats->average_contig|number_format}</li>
<li>longest contig size: {$velvet_stats->general_stats->longest_contig|number_format}</li>
<li>n50 contig size: {$velvet_stats->general_stats->n50|number_format}</li>
</ul>
<br />
Here we are presenting certain statistics from the workflow. Notice how we access the statistical data, through a variable called $velvet_stats. Recall, we passed this through to the template in the action_getReport() function, and that function got this data by parsing the Statistics.ini file that our workflow created. Again, that file can be seen here:
[params]
input=id:87
hash_length=21
dataset_type=short
result_name=Test of Velvet With Complete Workflow
min_coverage=
max_coverage=
workflow_id=1785
[general_stats]
nb_reads=142858
nb_residues=5000030
max_read_len=35
nb_contigs=13
nb_assembled_reads=135855
n50=24184
longest_contig=44966
average_contig=7698.46153846154
So in our report.tpl, we can get access to stats in the params section via:
$velvet_stats->params->[your_param_of_interest]
and from the general_stats section via:
$velvet_stats->general_stats->[your_param_of_interest]
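The percentage that the {math} call computes for the report (x*100/y, formatted with %.2f) is easy to check by hand against the numbers in Statistics.ini. Here is a minimal shell sketch of the same arithmetic using awk. The function name percent_of is made up for illustration, and returning 0.00 for a zero denominator is my own choice, mirroring the nb_reads != 0 guard in the template.

```bash
# Hypothetical helper: compute PART as a percentage of WHOLE, to two decimals.
percent_of() {
    local part="$1" whole="$2"
    awk -v part="$part" -v whole="$whole" 'BEGIN {
        # Avoid dividing by zero when there were no reads at all.
        if (whole == 0) { print "0.00"; exit }
        printf "%.2f\n", part * 100 / whole
    }'
}

# Usage: percent_of 135855 142858
```

For our run (135855 assembled reads out of 142858) this gives 95.10, so the unassembled share is 4.90.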
Accessing Other Outputs in the Report
Next, let's take a look at how we show other outputs of the workflow, for instance, the database of contigs:
{if $velvet_stats->general_stats->nb_contigs != 0}
<a target="_blank" href="{$server_url}/query?do=gqworkflow.show_result&workflow=id:{$instance_id}&workflow_output_name=contigs" title="Browse the database of {$velvet_stats->general_stats->nb_contigs} contigs">Browse database of {$velvet_stats->general_stats->nb_contigs} contigs</a><br /><br />
This database is also available in the <a target="_blank" href="{$server_url}/query?do=mygq#gq_all_physical_dbs">Other Sequence Databases/All Physical</a> section of your My GenomeQuest page.<br />
Here we are providing access to the database of contigs right in the report. Remember, we made two outputs in the getOutputs() function of our master class. One was for this report, and the other for the database of contigs, which we called "contigs".
Notice that the URL to this output is structured as "&workflow_output_name=contigs" - directly referring to the key we supplied in getOutputs().
Using Charts in Reports
Google's Chart API is great for this stuff:
{*
Here you'll see some more fancy Smarty computations as well as a call out to Google's Chart
API for a nice pie chart of the total number of assembled sequences.
*}
<fieldset class="section" id="SecondStatisticsSection">
<legend>Assembled sequences</legend>
Percentage of sequences assembled by Velvet.<br />
{if $perc_assembled_reads > 0}
{assign var="assembled_chl" value="|Assembled ("|cat:$perc_assembled_reads|cat:"%)"}
{/if}
<center><img height="150" src="http://chart.apis.google.com/chart?{$chco}&cht=p3&chd=t:{$perc_unassembled_reads},{$perc_assembled_reads}&chs=650x150&chl=Unassembled ({$perc_unassembled_reads}%){$assembled_chl}"/></center>
<br />
</fieldset>
We leave the docs on their API to them.
Summarizing
Now that all of this is in place, our report should work. Let's take a look!
Looks perfect. The links all work too.
Creating The Info Page
There's still a problem though. As a user, I'm looking through my recent VelvetExample runs and I see one that I want more information on. I click on it in the table and get this error:
That's not so good. We need to define an info.tpl template that tells GenomeQuest how to render the information available in this workflow run.
Generally, there's a standard file you can use. Here it is:
Standard info.tpl with Workflow Parameters
{if $export}{assign var='colspan' value='colspan="2"'}{else}{assign var=table_width value='width="750"'}{/if}
{strip}
<table {$table_width} {if !$noClass}class="rs_resinfo"{/if}>
{* include some basic info for the current workflow *}
{include file="$common_data_dir/info.tpl"}
<tr>
<td class="rs_resinfo_title" {$colspan}>Workflow Parameters</td>
<td class="rs_resinfo_content"> <table>
{foreach from=$params item=v key=k}
<tr><td>{$k}</td><td>{$v}</td></tr>
{/foreach}
</table>
</td>
</tr>
</table>
{/strip}
This code, placed in the info.tpl file in the same directory as the rest of your workflow, is all you need to change the picture to something like this:
Standard info.tpl without Workflow Parameters
If you are satisfied with both the default information and the default layout, the info.tpl file in your class only needs to include the standard info.tpl file in the parent class. Use this to include the info.tpl file in the parent class:
{if $export}{assign var='colspan' value='colspan="2"'}{else}{assign var=table_width value='width="750"'}{/if}
{strip}
<table {$table_width} {if !$noClass}class="rs_resinfo"{/if}>
{* include some basic info for the current workflow *}
{include file="$common_data_dir/info.tpl"}
</table>
{/strip}
The URL API and info.tpl
Of course, since everything is accessible via the URL API, the gqplugin URL API function allows you to ask for the plugin_action of getInfo. Use this URL to display the information that you have defined in the info.tpl file:
query?do=gqplugin&plugin=Workflows.[GqWfSomething]&plugin_action=getInfo&plugin_instance_id=[Workflow_id]
[GqWfSomething] is the class name.
[Workflow_id] is assigned by GenomeQuest when launching the workflow.
This can be called from anywhere in the application - or out of the application! - including from a report.
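As a rough sketch, the getInfo URL above could be assembled in PHP like this. The helper name buildInfoUrl(), the server URL, and the class name GqWfVelvetExample are placeholders chosen for the example, not part of the GenomeQuest API.

```php
<?php
// Hypothetical helper: build the getInfo URL for one workflow run.
function buildInfoUrl(string $serverUrl, string $className, int $workflowId): string
{
    // Pass '&' explicitly so the result does not depend on php.ini settings
    $query = http_build_query([
        'do'                 => 'gqplugin',
        'plugin'             => 'Workflows.' . $className,
        'plugin_action'      => 'getInfo',
        'plugin_instance_id' => $workflowId,
    ], '', '&');

    return rtrim($serverUrl, '/') . '/query?' . $query;
}

// Placeholder values for illustration only
echo buildInfoUrl('https://gq.example.com', 'GqWfVelvetExample', 1785), "\n";
```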
Setting the Number of Results
There's one more problem with this workflow. We found 13 contigs, but the table on the My GenomeQuest page says that the number of results is "empty". That's just not true. Let's fix this.
$this->finishRun()
There is a function called finishRun() which is automatically called by the workflow framework when the run is over. What we'll do is override this function and set the number of results ourselves. So, in our PHP class:
protected function finishRun() {
    // Update the result count shown on the My GenomeQuest page.
    $path = $this->getEnvironment('working_dir') . '/Stats/nb_contigs.txt';
    if (file_exists($path)) {
        // trim() drops the trailing newline so is_numeric() accepts the value
        $nbcontigs = trim(file_get_contents($path));
        if (is_numeric($nbcontigs)) {
            $this->aDbmsWorkflow->setTotalNbResults((int) $nbcontigs);
        }
    }
    // Let the framework do its own end-of-run processing.
    parent::finishRun();
}
We open up the file that our Perl-based workflow produced in the Stats directory called nb_contigs.txt. By the way, this file looks like this:
% cat Stats/nb_contigs.txt
13
Then, we load that value in, and use the $this->aDbmsWorkflow member to set the number of results.
Finally, we call the parent's finishRun(); so whatever else that function needs to do gets done.
And voila!
Summary and Source Code
You've seen the whole story, from an empty page to a completed workflow. Here's a link to the source code, including the velvet binaries, if you want to take this as a starting point and go on from here.
Remember, you can always contact us at support@genomequest.com for any problems or questions with our APIs.
Reference
Data Structures
While handling a web request, GenomeQuest uses two data structures to store information about the workflow and its environment. During processing, the workflow script and GenomeQuest both store data in the data structures. The data is then retrieved for use by the templates.
The "this" Data Structure
In the context of a workflow, $this is a ScriptsWorkflowAbstract.
This table shows many of the functions, and objects, that can be accessed inside the main workflow PHP class (e.g. GqWf*.php), with the "$this" notation. (The "this" object refers to the current class, along with every class that it inherits from, and every class that inherits from it.)
| FUNCTION | WHAT IT DOES |
|---|---|
| aDbmsWorkflow | An object (data structure) that contains information about a workflow run, such as the title and the input parameters. This is used after the workflow finishes. |
| addAllEnvironment | Adds an array to the Environment data structure. |
| addEnvironment | Adds a parameter to the Environment data structure. |
| checkLaunchParams | Checks that the search parameters entered by the user are valid. |
| checkSeqdbInputs | Checks that the search parameters entered by the user are valid. |
| deleteAuxiliaryData | When deleting a workflow, deletes workflow data that is outside the workingDir. (Data in the workingDir is deleted automatically.) Although GenomeQuest does not recommend producing data outside of the working directory, you can call this when deleting a workflow and include logic to delete the data. |
| fillDisplayTemplate | Pushes the data from the Environment to the template. Can be used with the launch page template, the report template, or any other HTML template required by the workflow. Cannot be used with the script template because it often adds extra HTML for a header or footer. The HTML templates use curly brackets for variables: {$variable} |
| fillScriptTemplate | Pushes the data from the Environment to the template. Can be used with the script template. It adds some predefined functions, such as update_progress, and therefore cannot be used with the launch page template, the report template, or any other HTML template required by the workflow. The curly brackets used for variables in the HTML templates often have another use in shell scripts, so use double square brackets for variables: [[$variable]] |
| finishRun | Specifies any processing that is required after the workflow ends, such as sending email. |
| generateScriptBody | The main function that a workflow needs to implement. Returns the final bash script. GenomeQuest also adds some additional processing before and after this bash script to do various housekeeping tasks. |
| getAllEnvironment | Gets an array of all the parameters in the Environment data structure. |
| getConfig | Gets the value of a parameter defined in the plugin_config.txt file. |
| getCurrentVersion | Gets the current version number of the workflow. |
| getDefaultOutputName | Returns the output type that should be used as the default. |
| getDescription | Gets the description that is in the plugin_config.txt file. |
| getEnvironment | Gets a parameter from the Environment data structure. |
| getLogFileName | Gets the name of the log file from the Environment data structure. The value is immutable. |
| getOutputs | Specifies the types of output that the workflow creates so that they can be rendered properly. |
| getStatus | Gets the status of processing, such as STATUS_RUNNING, STATUS_FINISHED, or STATUS_FAILED. |
| getStepProgress | Parses the progress file to return the correct percentage complete for one step of the workflow. |
| getTextLabel | Gets the name of the workflow run. |
| getTitle | Gets the title from the configuration file, plugin_config.txt. |
| getWorkflowResultObj | Gets the aDbmsWorkflow object for this workflow. |
| getWorkflowStatus | Gets the status of processing, such as STATUS_RUNNING, STATUS_FINISHED, or STATUS_FAILED. |
| getWorkflowType | Gets the name of the workflow type. For example, "GqWfSeqSearch" or "GqWfDigix" |
| requireQuota | Used internally by GenomeQuest for billing. Unless a workflow will be billable on GenomeQuest Live, put this line in every workflow: public function requireQuota() {return FALSE;} |
| setTemplateTitle | Creates a title to appear at the top of the browser. |
| translateFilterExpression | Filters a subset of existing workflow results. |
| translateFilterExpressionBinOp | Filters a subset of existing workflow results. |
| translateFilterExpressionCom | Filters a subset of existing workflow results. |
| workingDir | The path to the working directory for this workflow. GenomeQuest assigns the directory and creates files in it. |
The Environment Data Structure
The environment data structure is inherited by ScriptsWorkflowAbstract through its parent, PluginAbstract::$environment.
The Environment data structure holds general system-wide information and is used while the workflow is running. The data in the Environment data structure is input for the workflow. The data is used by the launch page template, the script template, and any other HTML templates that the workflow requires while it is running. Some parameters are added automatically, such as the paths to various binary files and information about the current user. You can also add any additional information that is required by the workflow. Data in the Environment data structure does not persist after the workflow finishes.
All of this data is written out into the workflow's $working_dir in a file called script.comparison.body.params and in JSON format in script.comparison.body.params.json.
You move data from your master PHP class into the various Smarty templates via one of two functions:
$this->fillDisplayTemplate(<full-path-to-template>)
Sends data to templates designed with HTML such as the submit.tpl or the report.tpl
$this->fillScriptTemplate(<full-path-to-template>)
Sends data to templates designed with BASH such as the script.comparison.body.tpl
The entire environment is sent as a data structure to these templates. To add to the environment, use the following function:
$this->addEnvironment(<key>,<val>)
Where key/val is any key-value pair you want to add to the environment.
To get parameters from the Environment data structure, use getEnvironment or getAllEnvironment.
$val = $this->getEnvironment(<key>)
Where key is the key of the environment you wish to see the $val for.
$array = $this->getAllEnvironment()
Returns a PHP array with the entire environment.
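To make the key/value behavior concrete, here is a small stand-in class that mimics the four environment functions named in this document. It is only a sketch: the class name EnvironmentSketch is invented, the real methods live in the GenomeQuest framework classes, and details such as returning null for a missing key are assumptions.

```php
<?php
// Stand-in for the framework's environment handling, for illustration only.
class EnvironmentSketch
{
    private array $environment = [];

    // Add or overwrite a single key
    public function addEnvironment(string $key, $val): void
    {
        $this->environment[$key] = $val;
    }

    // Add a whole array of key/value pairs; existing keys are overwritten
    public function addAllEnvironment(array $values): void
    {
        $this->environment = array_merge($this->environment, $values);
    }

    // Assumption: a missing key yields null
    public function getEnvironment(string $key)
    {
        return $this->environment[$key] ?? null;
    }

    public function getAllEnvironment(): array
    {
        return $this->environment;
    }
}

$env = new EnvironmentSketch();
$env->addEnvironment('hash_length', 21);
$env->addAllEnvironment(['dataset_type' => 'short', 'hash_length' => 31]);

echo $env->getEnvironment('hash_length'), "\n";
print_r($env->getAllEnvironment());
```

In a real workflow the same calls are made on $this inside the master PHP class, before fillDisplayTemplate or fillScriptTemplate pushes the environment to a template.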
The aDbmsWorkflow Data Structure
The aDbmsWorkflow data structure holds information for a specific instance of the workflow and is used after the workflow finishes. Think of this as the row in the metadata database that describes the high level metadata about a workflow run.
The data is typically passed to the report template after the workflow finishes. GenomeQuest automatically adds parameters to the aDbmsWorkflow data structure, such as how many results the workflow generated, and the parameters that the user specified on the launch page. You are not able to add any additional information to the aDbmsWorkflow data structure. Data in the aDbmsWorkflow data structure persists after the workflow finishes.
This table shows the functions that can be used with the aDbmsWorkflow object.
| FUNCTION | WHAT IT DOES |
|---|---|
| getDbBfql | Returns the filtering parameter for either the query or subject input sequence databases |
| getDescription | Returns the description of this workflow run |
| getEmail | Returns the email addresses associated with this workflow run (for sending notification emails) |
| getFullPath | Returns the full path to the result file. The owner ID of this workflow determines the location |
| getFullPathQueryDb | Returns the full path to the query database of the workflow |
| getFullPathSubjectDb | Returns the full path to the subject database of the workflow |
| getId | Returns the workflow ID |
| getOwner | Returns the owner object of this workflow |
| getOwnerObject | Returns the owner object of this workflow |
| getTextLabel | Returns the name of the workflow run |
| getTotalNbResults | Returns the number of results for all query sequences in the workflow |
| getVersion | Returns the version of this workflow run |
| setDescription | Sets the description of this workflow run |
| setEmail | Sets the email addresses associated with this workflow run (for sending notification emails) |
| setTextLabel | Sets the name of the workflow run |
| setTotalNbResults | Sets the number of results for all query sequences in the workflow |
| setVersion | Sets the version of this workflow run |
| toTemplateData | Dumps essential information about this workflow run into an array. |
| update | Saves this workflow record into the database |
GqMime Types
The available GqMime types are shown in the table below.
| Output Type | GqMime Type |
|---|---|
| Alignment database | TYPE_RESDB |
| Excel spreadsheet | TYPE_MSEXCEL |
| Word document | TYPE_MSWORD |
| Static HTML page | TYPE_HTML |
| Text file for download | TYPE_TEXT |
| URL redirect (including a redirect to a report) | TYPE_URL |
Creating Links to Sequence Databases
A basic link to a sequence database has this form:
[server URL]/query?do=gqfetch&db=[Definition ID]
[Server URL] and [Definition ID] are required. Definition ID is assigned with the Content Manager. Many other options are also available, such as specifying a subset of the output to display or specifying specific columns to display. For more information, see the URL API call query?do=gqfetch.help.
Creating Links to Other Types of Output
Links to most types of output all have the same format. This includes links to alignment databases, Excel spreadsheets, Word documents, HTML pages, reports, and other types of URL redirects.
A link has this form:
[server URL]/query?do=gqworkflow.show_result&workflow=id:[ID]&workflow_output_name=[output_name]
[ID] is the workflow ID assigned by GenomeQuest when launching the workflow. Each workflow run is assigned a different ID, which is accessible inside a template as $dbms_workflow_id.
[output_name] is the name that you assigned in the getOutputs function.
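The link format above can be sketched as a small PHP helper. The function name buildOutputUrl() and the example server URL are hypothetical; only the query string layout comes from this document.

```php
<?php
// Hypothetical helper: build a link to one named output of a workflow run.
function buildOutputUrl(string $serverUrl, int $workflowId, string $outputName): string
{
    return rtrim($serverUrl, '/')
        . '/query?do=gqworkflow.show_result'
        . '&workflow=id:' . $workflowId
        // Encode the output name in case it contains spaces or symbols
        . '&workflow_output_name=' . rawurlencode($outputName);
}

// "contigs" is the output key used in the Velvet example; the host is a placeholder
echo buildOutputUrl('https://gq.example.com', 1785, 'contigs'), "\n";
```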
Reserved Words
These names are reserved and should not be used by any workflow as parameter names.
- plugin
- plugin_instance_id
- format
- template
- username
- password
- apitokenttl
- apitoken
Special Words
The following parameters have special meaning when used in the launch page for the workflow. GenomeQuest performs special processing on them. They should be used only in this context. They should not be used as a parameter name by the workflow for any other purpose.
- description
- parent_id
- qdb
- qdb_def_id
- qdb_file
- qdb_id
- qdb_seq
- qdb_seq_format
- qdb_seq_type
- qdb_upload
- sdb
- sdb_def_id
- sdb_file
- sdb_id
- sdb_seq
- sdb_seq_format
- sdb_seq_type
- sdb_upload
- title
- workflow_params
- workflow_type
URL API
GenomeQuest provides special handling so that PHP functions in your PHP class can be invoked directly from the URL. Some specific functions are provided by GenomeQuest. You can create additional ones if you need them. The functions provided by GenomeQuest are:
- plugin_action=getLaunchPage. This function triggers the appropriate PHP file to display the launch page for a workflow. For more information, see Creating the Launch Page.
- plugin_action=getReport. This function runs action_getReport, which collects data for a report. Use this with getOutputs to display a report. For more information, see Creating the Report Page.
- plugin_action=getInfo. This function runs action_getInfo, which gets the data specified for the info.tpl file. For more information, see Creating the Info Page.
Any function you write in your PHP class, provided it is defined as a public function and starts with the tag action_, can be called via the URL API.
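The naming rule can be illustrated with a toy dispatcher. This is not GenomeQuest's implementation: the class GqWfSketch, the method names, and the function dispatchAction() are all invented for the example, which only shows how a plugin_action value might map onto a public action_ method.

```php
<?php
// Stand-in workflow class; a real one would extend the framework's base class.
class GqWfSketch
{
    // Public and prefixed with action_: callable as plugin_action=getContigCount
    public function action_getContigCount(): string
    {
        return '13';
    }

    // Prefixed but not public: must not be callable from the URL
    protected function action_internalOnly(): string
    {
        return 'hidden';
    }
}

// Hypothetical dispatcher: maps a plugin_action value to an action_ method.
function dispatchAction(object $plugin, string $pluginAction)
{
    $method = 'action_' . $pluginAction;
    // From outside the class, is_callable() is true only for public methods
    if (!is_callable([$plugin, $method])) {
        throw new InvalidArgumentException("Unknown plugin_action: $pluginAction");
    }
    return $plugin->$method();
}

echo dispatchAction(new GqWfSketch(), 'getContigCount'), "\n";
```

Under this sketch, a request for plugin_action=internalOnly is rejected because the method is not public.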
How the Various Files Interact During Processing
This table shows how the various files interact during processing. The first column shows actions done by the user. The next column shows processing done by the PHP file. The next column shows processing done by the various HTML template files. The next column shows processing done by the script template. The last column shows processing done by GenomeQuest. Reading from top to bottom, it is chronological - starting with the user clicking a link to launch the workflow.
| USER | PHP FILE | HTML TEMPLATE FILES | SCRIPT TEMPLATE FILE | SUPPLIED BY GENOMEQUEST | |
|---|---|---|---|---|---|
| 1 | User clicks a link for a URL that contains: query?do=gqplugin&plugin=Workflows.GqWf<something>, which triggers the appropriate PHP file for the workflow | ||||
| 2 | Server receives the request, and runs GqWf<something>::action_getLaunchPage | ||||
| 3 | Gets whatever data is specified for the launch page | ||||
| 4 | Runs fillDisplayTemplate and pushes the data to the launch page template (which is typically named submit.tpl) | ||||
| 5 | Displays the launch page to the user with HTML from the launch page template and the merged data | ||||
| 6 | User enters search parameters and clicks the Submit button | ||||
| 7 | Calls checkLaunchParams and checkSeqdbInputs to check that the search parameters entered by the user are valid | ||||
| 8 | If the search parameters are valid, calls generateScriptBody. If the search parameters are not valid, displays an error message. | ||||
| 9 | Gets whatever data is specified for the script template | ||||
| 10 | Runs fillScriptTemplate and pushes the data to the script template (which is typically named script.comparison.body.tpl) | ||||
| 11 | Does preliminary processing, such as preparing the sequence databases requested by the user | ||||
| 12 | Merges the script template with the data to produce the main workflow script, which does whatever processing the workflow requires. Also runs the progress bar to show how much of the processing is completed, and generates statistics if any are specified. | ||||
| 13 | Kicks off the script and puts it in the background. Redirects user back to My GenomeQuest page where the progress bar is shown. Does final processing. Additional workflow-specific processing can also be added to the PHP file. | ||||
| 14 | User clicks a default link to show the launch parameters (such as the Detail tab in the results summary) | ||||
| 15 | URL contains plugin_action=getInfo. Runs action_getInfo & displays the default information about the launch parameters, in the default location, with the default format. The default info.tpl file must also be included in the default location. | ||||
| 16 | User clicks the link for the workflow run | ||||
| 17 | Runs getDefaultOutputName. If the default output is a report (as is usual), the URL specified in the PHP file contains plugin_action=getReport | ||||
| 18 | Runs action_getReport | ||||
| 19 | Gets whatever data is specified for the report | ||||
| 20 | Runs fillDisplayTemplate and pushes the data to the report template (which is typically named report.tpl) | ||||
| 21 | Displays the HTML and merged data to the user, typically this contains statistics and links to other output | ||||
| 22 | User clicks a link on the report to some other output | ||||
| 23 | Runs getOutputs | ||||
| 24 | Displays the other output to the user | ||||
| 25 | User clicks any workflow-specific link that shows the launch parameters, URL contains plugin_action=getInfo | ||||
| 26 | Runs action_getInfo | ||||
| 27 | Gets whatever data is specified | ||||
| 28 | Runs fillDisplayTemplate and pushes the data to the info template (which is typically named info.tpl) | ||||
| 29 | Displays the HTML and merged data to the user | ||||
More Docs
Now that you've read the Workflow docs, it's probably time to learn more about the GQ Engine...
- Go Review: Review the overall System Concepts for the entire GQ platform, from command line to web
- Go Deep: Learn about the GQ Engine with the GQ Engine Primer.
- Go Deeper: Learn more about BFQL with the BFQL Primer.
- Go Under: Read the BFQL Reference Manual: File:BFQLReference.pdf
As always, we're here to help. Reach out to support@genomequest.com for any question, any time.



