Welcome to the GenomeQuest Documentation Wiki
URL API
Introduction
This layer provides calls formed with URLs that can be used three ways:
- Directly from the navigation toolbar of a web browser. GQ URLs let you change the default display and output stream of the GQ GUI.
- From external Web applications. GQ URLs can be embedded as simple, single-link calls, which makes it quick to connect in-house applications to and from the GQ platform.
- Programmatically, as a more advanced use, from any regular programming language.
The GQ URL API follows the standard pattern for this kind of interface: it sends HTTP-encoded requests to the GenomeQuest Web server. In principle and in use it is very similar to other approaches such as the NCBI one. However, while it can perform the same types of operations, the GQ URL API provides extended capabilities compared to legacy tools.
The general form of a URL is:
http://www.gqserver.com/query?
do=<operation> &
param1=value1 &
param2=value2
do= is the main operation dispatcher. Four major operations are provided: gqfetch, gqworkflow, gqresult, and gqplugin. Many other operations are available, but these four are the most common.
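As a minimal sketch of the general form above, a request URL can be assembled in Python with the standard library. The helper name build_gq_url is hypothetical, and the host and token are the placeholders used on this page; urlencode percent-encodes the parameter values.

```python
from urllib.parse import urlencode


def build_gq_url(base_url, operation, **params):
    """Assemble base_url?do=<operation>&param1=value1&... (hypothetical helper)."""
    # do= goes first because it selects the operation dispatcher.
    query = {"do": operation, **params}
    return f"{base_url}?{urlencode(query)}"


if __name__ == "__main__":
    # Placeholder host and token, as in the examples on this page.
    print(build_gq_url("http://www.gqserver.com/query", "gqfetch",
                       db="GB_VRL", apitoken="xxx"))
```

Keeping do= as an ordinary parameter means the same helper can be reused for any dispatcher (gqfetch, gqworkflow, gqresult, gqplugin).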
Help Online
The URL API's documentation is built into the GenomeQuest system. Try:
Each of the major dispatchers listed there (e.g., gqfetch, gqresult) has its own help system:
Use Cases
Get a Token
Log into the system programmatically.
[Request] http://server/query?do=gquser.get_token&username=[username]&password=[password]&apitokenttl=[3600]
[Response]
token = 4:oenfcwuWe09e
token_creation_date = server Unix timestamp for token’s creation time
token_expiration_date = server Unix timestamp for token’s expiration time
expired = 0 | 1 (0 for not expired, 1 for expired)
Fetch a Sequence Database
Assumes you already have an API token or are logged in via a browser.
[Request] http://server/query?do=gqfetch&db=GB_VRL&template=printf:%H%23ID\n&apitoken=xxx
Note that '%23' in the URL above is the '#' character in encoded form. template=printf:%H#ID\n specifies the data to be returned; in this case '%H#ID\n' means return the IDs of all sequences in the GenBank Viral division, one per line. By default only the first 50 records are returned; add &start=x&stop=y to control which records are returned.
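The encoding step described above can be sketched in Python. The helper name gqfetch_ids_url is hypothetical. Unlike the hand-written URL, this sketch also encodes '%' as '%25' and the backslash as '%5C', which assumes the server applies standard percent-decoding before reading the template; it also assumes the template ends with the two literal characters backslash and n, as typed in the example.

```python
from urllib.parse import quote


def gqfetch_ids_url(server, db, apitoken, start=None, stop=None):
    """Build a gqfetch URL that asks for sequence IDs (hypothetical helper)."""
    # Raw string: the template ends with backslash + n, exactly as typed in
    # the example URL (an assumption about what the server expects).
    template = quote(r"printf:%H#ID\n", safe=":")
    url = f"{server}/query?do=gqfetch&db={quote(db)}&template={template}"
    # start/stop are optional; without them the first 50 records come back.
    if start is not None and stop is not None:
        url += f"&start={start}&stop={stop}"
    return url + f"&apitoken={quote(apitoken, safe=':')}"


if __name__ == "__main__":
    print(gqfetch_ids_url("http://server", "GB_VRL", "xxx", start=1, stop=100))
```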
Run an IP Sequence Search
[Request]
http://server/query?do=gqworkflow &
workflow_type = GqWfIpSearch &
qdb_seq_type = nucleotide|protein & // query sequence type
qdb_seq = ATGCATGC & // actual query sequence
sdb_def_id = GQPAT_NUC,GQPAT_PRT & // the subject sequence databases to search against, comma separated list. Or PROTDBS=GQPAT_PRT&NUCDBS=GQPAT_NUC
strat_name = blast|kerr|fragment & // search strategy
best_hit_keep_max = 500 & // number of results to keep
title=my ip search & // run name
email=joe@example.com & // the email address for when run is finished
seqlenrange_low=6 & // the length of shortest sequence to search inside subject db
seqlenrange_high=100000 // the length of longest sequence to search inside subject db
... (other strategy specific params, see below)
Some parameter details
strat_name
blast:
strat_blast_best_hit_definition = EVAL|SCORE & // whether the best hit is defined as lowest EVAL or highest score
strat_blast_word_size_nuc = 11 & // word size. Or strat_blast_word_size_pro = 5 in case there is protein qdb or sdb involved (although the strat_name is still blast, no need to say blastx)
strat_blast_scoring_matrix_nuc=NUC.3.1 & // matrix. Or strat_blast_scoring_matrix_pro=BLOSUM62 in case of protein
strat_blast_eval_cutoff = 10 & // e value cutoff
kerr:
strat_genepast_perc_id = 80 & // percent identity cutoff
strat_genepast_perc_id_over = QUERY|SUBJECT|SHORTER & // percent identity over query or subject or shorter of both.
qdb_seq
For a single sequence, just supply the sequence bases. For example: TTTTAAAA
For multiple sequences, you can use FASTA format. For example:
>seq_1
GCTAGCTAGCTA
>seq_2
CGATCGATGCTAGT
If your request is sent via GET, make sure to URL-encode the FASTA sequence to preserve the line breaks.
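A short Python sketch of that encoding step, using the two example sequences above. The helper name encode_qdb_seq is hypothetical; quote and unquote are from the standard library.

```python
from urllib.parse import quote, unquote


def encode_qdb_seq(fasta_text):
    """Percent-encode FASTA text for use as the qdb_seq value in a GET URL."""
    # safe="" encodes every reserved character, including '>' and newlines.
    return quote(fasta_text, safe="")


fasta = ">seq_1\nGCTAGCTAGCTA\n>seq_2\nCGATCGATGCTAGT"
encoded = encode_qdb_seq(fasta)

# Each newline becomes %0A, so decoding restores the original line breaks.
assert unquote(encoded) == fasta

if __name__ == "__main__":
    print(encoded)
```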
Run a Sequence Search
[Request]
http://server/query?do=gqworkflow &
workflow_type = GqWfSeqSearch &
qdb_seq_type = nucleotide|protein &
qdb_seq = ATGCATGC &
sdb_def_id = GB_PRI &
s = OS="Mus musculus" & // a filter applied to the subject sequence database: the OS field matches "Mus musculus". For a list of fields, log in to GQ and issue this URL: server/query?do=gqfetch.get_db_field_list
strat_name = blast|genepast|fragment|motif|hs3 &
strat_{$strat_name}_best_hit_keep_max = 500 &
title=my blast of gb_pri &
... (other strategy specific params, see below)
if strat_name is
blast:
strat_blast_best_hit_definition = EVAL|SCORE &
strat_blast_word_size_nuc = 11 & // or strat_blast_word_size_pro = 5 in case there is protein qdb or sdb involved.
strat_blast_eval_cutoff = 10 &
genepast:
strat_genepast_perc_id = 80 &
strat_genepast_perc_id_over = QUERY|SUBJECT|SHORTER &
fragment:
strat_fragment_best_hit_definition = QUERY|SUBJECT|ALIGNMENT|SCORE & // best hit definition: highest % ID over QUERY|SUBJECT|ALIGNMENT or highest score
strat_fragment_window_length_nuc = 50 & // find result within 50 bases window.
strat_fragment_perc_id_nuc = 96 & // 96 % identity over the window size.
For a fragment search against a protein subject:
strat_fragment_best_hit_definition = QUERY|SUBJECT|ALIGNMENT|SCORE & // best hit definition: highest % ID over QUERY|SUBJECT|ALIGNMENT or highest score
strat_fragment_window_length_pro = 50 & // find result within 50 bases window.
strat_fragment_perc_id_pro = 96 & // 96 % identity over the window size.
strat_fragment_genetic_code = 1 & // use Standard genetic Code
strat_fragment_frames_pro_p1 = on & // include translation of frame +1
strat_fragment_frames_pro_p2 = on & // include translation of frame +2
strat_fragment_frames_pro_p3 = on & // include translation of frame +3
strat_fragment_frames_pro_m1 = on & // include translation of frame -1
strat_fragment_frames_pro_m2 = on & // include translation of frame -2
strat_fragment_frames_pro_m3 = on // include translation of frame -3
Adding format=json to the request returns the result as a parsable JSON string. The response includes a workflow_id that you can use in later queries against the result of this search.
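A sketch of how a Sequence Search request might be assembled and its JSON response read in Python. Both helper names are hypothetical. The layout of the JSON response is not documented here, so the sketch assumes workflow_id is a top-level key; adjust the lookup if the server nests it differently.

```python
import json
from urllib.parse import urlencode


def build_seq_search_url(server, query_seq, subject_db, seq_type="nucleotide",
                         strategy="blast", **strategy_params):
    """Build a GqWfSeqSearch launch URL (hypothetical helper)."""
    params = {
        "do": "gqworkflow",
        "workflow_type": "GqWfSeqSearch",
        "qdb_seq_type": seq_type,
        "qdb_seq": query_seq,
        "sdb_def_id": subject_db,
        "strat_name": strategy,
        "format": "json",  # ask for a parsable JSON reply
    }
    # Strategy-specific parameters, e.g. strat_blast_eval_cutoff=10.
    params.update(strategy_params)
    return f"{server}/query?{urlencode(params)}"


def extract_workflow_id(response_text):
    """Read workflow_id from the JSON reply."""
    # Assumption: the decoded JSON is an object with a top-level
    # "workflow_id" key; the real layout may differ.
    return json.loads(response_text)["workflow_id"]
```

For example, build_seq_search_url("http://server", "ATGCATGC", "GB_PRI", strat_blast_eval_cutoff=10) yields a URL with every value percent-encoded, including any spaces or quotes in a title or filter.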
Check Workflow run status
This assumes you launched a workflow and received a $workflow_id in return.
[Request]
http://server/query?do=gqworkflow.get_status&workflow=id:{$workflow_id}&format=json
After JSON decoding, you will get a data structure similar to this:
[status] => FINISHED
[progress] => 100
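A polling loop built on this request could look like the following Python sketch. The function names, polling interval, and retry limit are assumptions. Only the FINISHED status is documented above, so the loop treats every other value as "still running" and relies on max_polls to stop eventually.

```python
import json
import time
from urllib.request import urlopen


def fetch_status(server, workflow_id):
    """Call gqworkflow.get_status and decode the JSON reply."""
    url = (f"{server}/query?do=gqworkflow.get_status"
           f"&workflow=id:{workflow_id}&format=json")
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_until_finished(get_status, poll_seconds=30, max_polls=120,
                        sleep=time.sleep):
    """Poll get_status() until it reports FINISHED or max_polls is reached."""
    for _ in range(max_polls):
        status = get_status()
        # Assumption: "status" is a top-level key, as in the structure above.
        if status.get("status") == "FINISHED":
            return status
        sleep(poll_seconds)
    raise TimeoutError("workflow did not reach FINISHED in time")
```

A typical call would be wait_until_finished(lambda: fetch_status("http://server", workflow_id)). Passing the status getter as a callable keeps the loop independent of how the request is authenticated.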
Examining Workflow Result (General)
This assumes you launched a workflow and received a $workflow_id in return.
Each workflow can produce different outputs. Assume that one of the outputs is named $output_name; see below for how to list all outputs of each workflow type.
[Request]
http://server/query?do=gqworkflow.show_result&workflow=id:{$workflow_id}&workflow_output_name={$output_name}
You can append other output-specific parameters to the URL to change what is returned.
For example, if you know that the output is an alignment database, you can append a template parameter to ask the system to return the result in a specified format:
- template=printf:%RE\t%RS\t%HD#ID\t%HD#PN\t%HD#PA\n
The printf template specifies which data you want to retrieve from a result, and in what format. This one says "give me all subject hits, one per line, each line containing the following tab-separated fields: e-value (%RE), score (%RS), subject identifier (%HD#ID), subject patent number (%HD#PN), and subject patent assignee (%HD#PA)."
For a list of the two-letter fields that you can retrieve (e.g. ID, PA, PN above), please refer to this link
- template=printf:%RESCNT%PRE.
This returns the total number of results. %PRE asks the system to skip looping over all results.
The response to this request (gqworkflow.show_result) is often an HTTP redirect header that leads to another page, which actually handles the display of the different result types. Make sure your client handles the redirect properly, or try the link in your browser, note the final URL, and use that instead.
When you launch the URL, make sure you use the URL-encoded version of the string, since '%' and '#' can be interpreted by the browser as having special meaning.
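Both points, encoding the template and following the redirect, can be sketched in Python with the standard library. The helper names are hypothetical. urlopen follows HTTP redirects by default, and geturl() reports the final URL. The sketch assumes the server applies standard percent-decoding to the encoded template.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_show_result_url(server, workflow_id, output_name, template=None):
    """Build a gqworkflow.show_result URL with an encoded template."""
    params = {
        "do": "gqworkflow.show_result",
        "workflow": f"id:{workflow_id}",
        "workflow_output_name": output_name,
    }
    if template is not None:
        params["template"] = template
    # safe=":" keeps "id:1234" and "printf:" readable, while '%', '#'
    # and the backslash are percent-encoded.
    return f"{server}/query?{urlencode(params, safe=':')}"


def fetch_result(url):
    """Fetch a result, following redirects; return (final_url, body)."""
    with urlopen(url) as response:
        return response.geturl(), response.read().decode("utf-8")
```

The final URL returned by fetch_result can be stored and reused directly, which avoids the redirect on later requests.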
Examining Workflow Result (Sequence Search Workflow)
As explained above, gqworkflow.show_result redirects to the URL that actually delivers the result. For a Sequence Search alignment result, that URL is:
query?do=gqresult&db=wf:1234.resdb&template=printf:%RE\t%RS\t%HD#ID\t%HD#PN\t%HD#PA\n
Replace 1234 with the actual workflow ID of your run.
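The text returned for this template can be split into records with a few lines of Python. This is a sketch under assumptions: it supposes the server expands \t and \n in the template into real tab and newline characters, and the dictionary keys are hypothetical names chosen for this example. Values are kept as strings because the exact numeric formatting of the e-value and score is not documented here.

```python
def parse_hits(text):
    """Split tab-separated result lines into one dict per subject hit."""
    # Order matches the template: %RE, %RS, %HD#ID, %HD#PN, %HD#PA.
    fields = ("eval", "score", "id", "patent_number", "assignee")
    hits = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines, e.g. a trailing newline
        values = line.split("\t")
        if len(values) != len(fields):
            raise ValueError(f"expected {len(fields)} fields, got {len(values)}")
        hits.append(dict(zip(fields, values)))
    return hits
```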
See all the outputs listed for a workflow type
Each workflow type can produce different outputs, including multiple alignment databases, sequence databases, spreadsheets, etc.
Use this URL to list all the outputs produced by a Sequence Search workflow:
[Request] http://server/query?do=gqworkflow.get_outputs&workflow_type=GqWfSeqSearch
And for an IP Sequence Search workflow:
[Request] http://server/query?do=gqworkflow.get_outputs&workflow_type=GqWfIpSearch
Check the token status
You have received a token and want to make sure it is still active.
[Request] https://server/query?do=gquser.check_token&apitoken=4:oenfcwuWe09e
[Response] similar to the response of get_token.
Delete a token
Drop your token to log out.
[Request] https://server/query?do=gquser.delete_token&apitoken=4:oenfcwuWe09e
[Response] similar to the response of get_token.