Welcome to the GenomeQuest Documentation Wiki

URL API

Introduction

This layer provides calls, formed as URLs, that can be used in three ways:

  1. Directly from the address bar of a web browser. GQ URLs let you change the GQ GUI's default display and output stream.
  2. From external Web applications. A GQ URL can be embedded as a simple, single-link call, which makes it quick to connect in-house applications to and from the GQ platform.
  3. Programmatically, as a more advanced use, from regular programming languages.

The GQ URL API follows standard conventions: it sends HTTP-encoded requests to the GenomeQuest Web server. In principle and in use, it is very similar to other approaches such as the NCBI one. However, while it can perform the same types of operations, the GQ URL API provides extended capabilities compared to legacy tools.

The general form of a URL is:

http://www.gqserver.com/query?
      do=<operation> &
      param1=value1 &
      param2=value2
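
The general form above can be assembled in code. The following is a minimal Python sketch, not an official client: build_gq_url is a hypothetical helper name, and it assumes the spaces around '&' in the form above are only there for readability.

```python
from urllib.parse import urlencode

def build_gq_url(base_url, operation, **params):
    """Return base_url + '?do=<operation>&key=value...' with URL-encoded values."""
    query = {"do": operation}   # do= is the main operation dispatcher
    query.update(params)        # remaining parameters follow in call order
    return base_url + "?" + urlencode(query)

url = build_gq_url("http://www.gqserver.com/query", "gqfetch", db="GB_VRL")
print(url)
# http://www.gqserver.com/query?do=gqfetch&db=GB_VRL
```

Using urlencode rather than string concatenation means that reserved characters in parameter values are escaped automatically.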

do= is the main operation dispatcher. Four major operations are provided: gqfetch, gqworkflow, gqresult, and gqplugin. Many other operations are available, but these are the most common.

Help Online

The URL API's documentation is built into the GenomeQuest system. Try:

https://my.genomequest.com/query?do=gqhelp

Each of the major dispatchers listed there (e.g., gqfetch, gqresult) has its own help system:

https://my.genomequest.com/query?do=gqfetch.help
https://my.genomequest.com/query?do=gqresult.help

Use Cases

Get a Token

Log in to the system programmatically.

[Request]
http://server/query?do=gquser.get_token&username=[username]&password=[password]&apitokenttl=[3600]
[Response]
token = 4:oenfcwuWe09e
token_creation_date = server Unix timestamp for token’s creation time
token_expiration_date = server Unix timestamp for token’s expiration time
expired = 0 | 1  (0 for not expired, 1 for expired) 
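
If the response body arrives as plain "key = value" lines like those shown above (an assumption; check the actual output of your server), it can be turned into a dictionary with a few lines of Python. The function name and the timestamp values in the sample are hypothetical.

```python
def parse_key_value_response(text):
    """Parse 'key = value' lines into a dict of strings, skipping other lines."""
    result = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        # Split on the first '=' only, so a value may itself contain '='.
        key, _, value = line.partition("=")
        result[key.strip()] = value.strip()
    return result

# Sample text in the documented layout; the two timestamps are made-up values.
sample = """token = 4:oenfcwuWe09e
token_creation_date = 1700000000
token_expiration_date = 1700003600
expired = 0"""

info = parse_key_value_response(sample)
token_is_valid = info["expired"] == "0"   # 0 means not expired
```

The same parser should also fit the responses of check_token and delete_token, since they are described as similar to the response of get_token.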

Fetch a Sequence Database

Assumes you already have an API token or are logged in via a browser.

[Request]
http://server/query?do=gqfetch&db=GB_VRL&template=printf:%H%23ID\n&apitoken=xxx

Note that '%23' in the URL above is the URL-encoded form of the '#' character. template=printf:%H#ID\n specifies the data to be returned; here '%H#ID\n' means return the IDs of all sequences in the GenBank Viral division, one per line. By default only the first 50 records are returned; add &start=x&stop=y to control which range of records is returned.
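
The request above, with a start/stop range added, could be built as in the Python sketch below. Two things are assumptions here: the start and stop values are arbitrary examples (this page does not say whether numbering begins at 0 or 1), and urlencode escapes every reserved character, including '%' itself as '%25', which is stricter than the hand-written URL above. Whether the server treats both forms identically should be verified.

```python
from urllib.parse import urlencode

params = {
    "do": "gqfetch",
    "db": "GB_VRL",
    # Literal backslash followed by 'n', as written in the documented URL.
    "template": "printf:%H#ID\\n",
    "start": 1,      # example value (assumption)
    "stop": 100,     # example value (assumption)
    "apitoken": "xxx",
}

url = "http://server/query?" + urlencode(params)
print(url)
```

After encoding, the '#' appears as '%23', so the browser or HTTP client no longer treats it as the start of a URL fragment.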

Run an IP Sequence Search

[Request]

http://server/query?do=gqworkflow &
workflow_type = GqWfIpSearch &
qdb_seq_type = nucleotide|protein &    // query sequence type
qdb_seq = ATGCATGC &                   // actual query sequence
sdb_def_id = GQPAT_NUC,GQPAT_PRT &     // the subject sequence databases to search against, comma separated list. Or PROTDBS=GQPAT_PRT&NUCDBS=GQPAT_NUC
strat_name = blast|kerr|fragment &     // search strategy
best_hit_keep_max = 500 &              // number of results to keep
title=my ip search &                   // run name
email=joe@example.com &                // the email address to notify when the run is finished
seqlenrange_low=6 &                    // the length of shortest sequence to search inside subject db
seqlenrange_high=100000                // the length of longest sequence to search inside subject db 
... (other strategy specific params, see below)

Some parameter details

strat_name

blast:

strat_blast_best_hit_definition = EVAL|SCORE &  // whether the best hit is defined as lowest EVAL or highest score
strat_blast_word_size_nuc = 11 &                // word size. Or strat_blast_word_size_pro = 5 in case there is protein qdb or sdb involved (although the strat_name is still blast, no need to say blastx)
strat_blast_scoring_matrix_nuc=NUC.3.1 &        // matrix. Or strat_blast_scoring_matrix_pro=BLOSUM62 in case of protein
strat_blast_eval_cutoff = 10 &                  // e value cutoff

kerr:

strat_genepast_perc_id = 80 & // percent identity cutoff
strat_genepast_perc_id_over = QUERY|SUBJECT|SHORTER & // percent identity over query or subject or shorter of both.
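
As an illustration, the common parameters and the strategy-specific ones can be kept apart and merged just before sending. This Python sketch uses only parameter names listed above; the helper name ip_search_params is hypothetical, and sending the encoded string as a POST body is an assumption about how you choose to submit the request.

```python
from urllib.parse import urlencode

def ip_search_params(query_seq, seq_type, strategy, strategy_params, title):
    """Merge the common IP search parameters with strategy-specific ones."""
    params = {
        "do": "gqworkflow",
        "workflow_type": "GqWfIpSearch",
        "qdb_seq_type": seq_type,             # nucleotide or protein
        "qdb_seq": query_seq,
        "sdb_def_id": "GQPAT_NUC,GQPAT_PRT",  # comma-separated subject databases
        "strat_name": strategy,
        "best_hit_keep_max": 500,
        "title": title,
    }
    params.update(strategy_params)
    return params

# Strategy-specific settings for a nucleotide blast search, values as documented.
blast_params = {
    "strat_blast_best_hit_definition": "EVAL",
    "strat_blast_word_size_nuc": 11,
    "strat_blast_scoring_matrix_nuc": "NUC.3.1",
    "strat_blast_eval_cutoff": 10,
}

params = ip_search_params("ATGCATGC", "nucleotide", "blast", blast_params, "my ip search")
body = urlencode(params)   # usable as a query string or as a POST body
```

Keeping the strategy parameters in their own dictionary makes it easy to swap in another strategy without touching the common part.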

qdb_seq

For a single sequence, supply just the sequence itself, for example TTTTAAAA.

For multiple sequences, you can use FASTA format, for example:

>seq_1
GCTAGCTAGCTA
>seq_2
CGATCGATGCTAGT

If your request is sent via GET, make sure to URL-encode the FASTA text so that the line breaks are preserved.
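
A short Python sketch of that encoding step, using the two example sequences above. The helper name to_fasta is hypothetical; quote is the standard library function from urllib.parse.

```python
from urllib.parse import quote

def to_fasta(records):
    """Join (name, sequence) pairs into FASTA text, one header and one sequence line each."""
    return "\n".join(f">{name}\n{seq}" for name, seq in records)

fasta = to_fasta([("seq_1", "GCTAGCTAGCTA"), ("seq_2", "CGATCGATGCTAGT")])

# safe="" forces every reserved character to be escaped, including '>' and the newline.
encoded = quote(fasta, safe="")
print(encoded)
# %3Eseq_1%0AGCTAGCTAGCTA%0A%3Eseq_2%0ACGATCGATGCTAGT
```

Each line break becomes '%0A' and each '>' becomes '%3E', so the encoded string can be passed as the value of qdb_seq in a GET request.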

Run a Sequence Search

[Request]
http://server/query?do=gqworkflow &
workflow_type = GqWfSeqSearch &
qdb_seq_type = nucleotide|protein &
qdb_seq = ATGCATGC &
sdb_def_id = GB_PRI &
s = OS="Mus musculus" &        // a filter applied to the subject sequence database: the OS field matches "Mus musculus". For a list of fields, log in to GQ and request this URL: server/query?do=gqfetch.get_db_field_list
strat_name = blast|genepast|fragment|motif|hs3 &
strat_{$strat_name}_best_hit_keep_max = 500 &
title=my blast of gb_pri &
... (other strategy specific params, see below)

If strat_name is:

blast:

strat_blast_best_hit_definition = EVAL|SCORE & 
strat_blast_word_size_nuc = 11 &  // or strat_blast_word_size_pro = 5 in case there is protein qdb or sdb involved.
strat_blast_eval_cutoff = 10 &

genepast:

strat_genepast_perc_id = 80 &
strat_genepast_perc_id_over = QUERY|SUBJECT|SHORTER &

fragment:

strat_fragment_best_hit_definition = QUERY|SUBJECT|ALIGNMENT|SCORE &  // best hit definition: highest % ID over QUERY|SUBJECT|ALIGNMENT or highest score
strat_fragment_window_length_nuc = 50 & // find result within 50 bases window.
strat_fragment_perc_id_nuc = 96 &       // 96 % identity over the window size.

For a fragment search against a protein subject:

strat_fragment_best_hit_definition = QUERY|SUBJECT|ALIGNMENT|SCORE &  // best hit definition: highest % ID over QUERY|SUBJECT|ALIGNMENT or highest score
strat_fragment_window_length_pro = 50 & // find result within 50 bases window.
strat_fragment_perc_id_pro = 96 &       // 96 % identity over the window size.
strat_fragment_genetic_code = 1 &       // use Standard genetic Code
strat_fragment_frames_pro_p1 = on &     // include translation of frame +1
strat_fragment_frames_pro_p2 = on &     // include translation of frame +2
strat_fragment_frames_pro_p3 = on &     // include translation of frame +3
strat_fragment_frames_pro_m1 = on &     // include translation of frame -1
strat_fragment_frames_pro_m2 = on &     // include translation of frame -2
strat_fragment_frames_pro_m3 = on       // include translation of frame -3

Adding format=json returns the result as a parsable JSON string. The response includes a workflow_id that you can use in future queries against the result of this search.
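
A minimal Python sketch of picking up the workflow_id from such a response. The exact JSON layout is not given on this page, so the sample payload below, with workflow_id as a top-level key, is an assumption to be checked against a real response.

```python
import json

def extract_workflow_id(response_text):
    """Return the workflow_id from a JSON response, or raise if it is missing."""
    data = json.loads(response_text)
    if not isinstance(data, dict) or "workflow_id" not in data:
        raise ValueError("no workflow_id in response: " + response_text)
    return data["workflow_id"]

# Hypothetical payload; the real response may carry additional keys.
sample_response = '{"workflow_id": 1234}'
workflow_id = extract_workflow_id(sample_response)

# The id can then be placed into later requests, such as the status check.
status_url = (
    "http://server/query?do=gqworkflow.get_status"
    f"&workflow=id:{workflow_id}&format=json"
)
```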

Check Workflow run status

Assumes you launched a workflow and received the $workflow_id in return.

[Request]
http://server/query?do=gqworkflow.get_status&workflow=id:{$workflow_id}&format=json

After JSON decoding, you will get a data structure similar to this:

[status] => FINISHED
[progress] => 100
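
Since a search can take a while, a client would typically repeat the status request until the status is FINISHED. The Python sketch below shows one way to do that. It takes the function that performs the HTTP request as an argument, so the polling logic can be tried without a server. The status value RUNNING in the canned responses is a made-up placeholder; only FINISHED appears on this page, and failure states are not handled here.

```python
import json
import time

def wait_until_finished(fetch_status_text, interval_seconds=30, max_polls=120):
    """Poll until the decoded status is FINISHED, then return the decoded dict.

    fetch_status_text is any callable that returns the JSON text of a
    gqworkflow.get_status response.
    """
    for _ in range(max_polls):
        status = json.loads(fetch_status_text())
        if status.get("status") == "FINISHED":
            return status
        time.sleep(interval_seconds)
    raise TimeoutError("workflow did not finish within the polling budget")

# Canned responses standing in for the server; RUNNING is a hypothetical value.
responses = iter([
    '{"status": "RUNNING", "progress": 40}',
    '{"status": "FINISHED", "progress": 100}',
])
final = wait_until_finished(lambda: next(responses), interval_seconds=0)
```

In real use, fetch_status_text would issue the get_status request shown above and return the response body.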

Examining Workflow Result (General)

Assumes you launched a workflow and received the $workflow_id in return.

Each workflow can produce different outputs. Assume that one of the outputs is named $output_name; see below for how to list all outputs of each workflow type.

[Request]
http://server/query?do=gqworkflow.show_result&workflow=id:{$workflow_id}&workflow_output_name={$output_name}

You can continue to append other output specific params to the URL to change the return value.

For example, if you know that the output is an alignment database, you can append a template parameter to ask the system to return the result in a specified format:

- template=printf:%RE\t%RS\t%HD#ID\t%HD#PN\t%HD#PA\n

The printf template specifies which data you want to retrieve from a result, and in what format. This one says: "give me all subject hits, one per line, each line containing these tab-separated fields: e-value (%RE), score (%RS), subject identifier (%HD#ID), subject patent number (%HD#PN), and subject patent assignee (%HD#PA)."

For a list of the 2-letter fields that you can retrieve (e.g. ID, PA, PN above), please refer to this link

- template=printf:%RESCNT%PRE.

This returns the total number of results. %PRE tells the system to skip looping over all results.

The response to this request (gqworkflow.show_result) is often an HTTP redirect header that leads to another page, which actually handles the display of the specific result type. Make sure your client follows the redirect properly, or open the link in your browser, note the final URL, and use that instead.

When you launch the URL, make sure you use the URL-encoded version of the string, since '%' and '#' have special meaning and could be interpreted by the browser.

Examining Workflow Result (Sequence Search Workflow)

As explained above, the actual URL that delivers the Sequence Search alignment result is this:

query?do=gqresult&db=wf:1234.resdb&template=printf:%RE\t%RS\t%HD#ID\t%HD#PN\t%HD#PA\n

Replace 1234 with the actual workflow id of your run.
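
With the template above, each hit should come back as one line of tab-separated values. The following Python sketch splits such a response into records. The dictionary key names are chosen here for readability and are not part of the API, and the sample line contains invented placeholder values, not real results.

```python
# Keys in the order of the template fields: %RE, %RS, %HD#ID, %HD#PN, %HD#PA.
FIELDS = ("evalue", "score", "subject_id", "patent_number", "assignee")

def parse_hits(body):
    """Split a tab-separated result body into a list of dicts, one per hit."""
    hits = []
    for line in body.splitlines():
        if not line.strip():
            continue                      # ignore blank lines
        values = line.split("\t")
        hits.append(dict(zip(FIELDS, values)))
    return hits

# Invented placeholder line in the expected layout.
sample_body = "1e-50\t200\tAB123\tUS1234567\tExample Corp\n"
hits = parse_hits(sample_body)
```

All values are kept as strings; convert the e-value and score to numbers only once you have confirmed how the server formats them.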

List all the outputs of a workflow type

Each workflow type can produce different outputs, including multiple alignment databases, sequence databases, spreadsheets, etc.

Use this URL to find out all the outputs produced by a Seq Search workflow:

[Request]
http://server/query?do=gqworkflow.get_outputs&workflow_type=GqWfSeqSearch

And for an IP Search workflow:

[Request]
http://server/query?do=gqworkflow.get_outputs&workflow_type=GqWfIpSearch

Check the token status

You received a token and you want to make sure it is still active.

[Request]
https://server/query?do=gquser.check_token&apitoken=4:oenfcwuWe09e 
[Response] 
Similar to the response of get_token.

Delete a token

Drop your token - log out.

[Request] 
https://server/query?do=gquser.delete_token&apitoken=4:oenfcwuWe09e 
[Response] 
Similar to the response of get_token.