Welcome to the GenomeQuest Documentation Wiki
LSPDB HELP
| Option/sub-option | Description | Examples |
|---|---|---|
| -version | Prints the version number | > lspdb -version lspdb: BIOFACET v6.0.1.D010 [2009.07.22] (c) GenomeQuest [Build: LINUX-x86_64-64bit-gcc-libc-2.5-OPT 16:01:26] |
| -h / -help | Prints a long list of all available options | > lspdb -help \| more |
| -verbose | Obsolete | |
| -nolog | Suppresses command log information (see ~/.lassaprc/). | > lspdb DB -nolog |
| -stdoutfile <FILE> | Redirects standard output to file <FILE> | > lspdb DB -stdoutfile STDOUT |
| -stderrfile <FILE> | Redirects standard error to file <FILE> | > lspdb DB -stderrfile STDERR |
| -gui | Obsolete | |
| -licwait | Obsolete | |
| -licqueue | Obsolete | |
| -sort <sort expression> | Old sorting implementation. Use -bfqlsort instead. | > lspdb DB -sort 'H#OS,-L' |
| -group <group expression> | Old grouping implementation. Use -bfqlgroup instead. | > lspdb DB -group '{H#OS},{-L},{H#OS}' |
| -bfqlsort <bfqlsort expression> | Used to sort. Multiple criteria sorting is allowed, as well as complex sorting expressions. Use - or + to specify sorting order; +, the default, is ascending order. | > lspdb DB -bfqlsort '[OS],-[L]' |
| -bfqlgroup <bfqlgroup expression> | Used for grouping sequences. Refer to the complete manual for details. | > lspdb DB -bfqlgroup '{[OS]},{[-L]},{[OS]}' |
| -fbfqlgroup <FILE> | Reads <FILE> and uses it as a bfql group expression. | > lspdb DB -fbfqlgroup filename |
| -pergroupid <NUMBER> | The maximum number of sequences to keep per group. | > lspdb DB -bfqlgroup '{[OS]},{[-L]},{[OS]}' -pergroupid 1 |
| -maxgroup <NUMBER> | The maximum number of groups to keep. | > lspdb DB -bfqlgroup '{[OS]},{[-L]},{[OS]}' -pergroupid 1 -maxgroup 5 |
| -maxkeep <NUMBER> | The maximum number of sequences to keep. | > lspdb DB -bfql 'OS="homo sapiens"' -maxkeep 5 |
| -select <SELECT STATEMENT> | Old selection implementation. Use -bfql instead. | |
| -range <RANGE EXPRESSION> | Applied before selection to restrict the sequences used. | > lspdb DB -range '1:30,42' |
| -frange <FILE> | Uses the ranges in <FILE> as above. First line of <FILE> must contain "Biofacet Range TXT v2.5". Second line contains the ranges. | > cat myrange Biofacet Range TXT v2.5 1,12,100:200 > lspdb DB -frange myrange |
| -memctrl <MEMCTRL EXPRESSION> | Internal optimisation. Advanced usage only. Takes a comma-separated list of key=value pairs: blk_sized=<val>, the memory block size limit in MB (default 64M); blk_seqnd=<val>, the number of sequences; blk_ctbd=<val>, the number of sequences in the CTB file. | > lspdb -memctrl 'blk_sized=128' |
| -dump <FILENAME> | Creates a biofacet sequence database. | > lspdb DB -bfql 'OS="homo sapiens"' -dump MyHumanDB |
| -vdump <FILENAME> | As -dump above, except that it keeps the same virtual db logic as the original. | > lspdb DB -bfql 'OS="homo sapiens"' -vdump MyHumanDB |
| -shadow <FILENAME> | As -dump, except that only sequences are dumped. This allows the creation of files for cluster nodes. | > lspdb DB -bfql 'OS="homo sapiens"' -shadow MyHumanDB_withannot |
| -split <NUMBER> | Splits the original db into <NUMBER> biofacet dbs. The file names are the name of the original file followed by _splt_ and a number from 0 to <NUMBER>-1. A virtual biofacet database, named after the original file with the suffix _splt.ind, is also created. | > lspdb DB -count db nbseqs = 1000 > lspdb DB -split 10 > ls DB_splt* |
| -chunk <NUMBER1> -overlap <NUMBER2> | Splits genome-size sequences into chunks of <NUMBER1> bases with <NUMBER2> overlapping bases. Should be piped into lspbank. | > lspdb DB -chunk 100000 -overlap 50 \| lspbank -nuc -F MY_NEW_DB |
| -format <FORMATNAME> | <FORMATNAME> can be FASTA or DB2. Will output records in FASTA or DB2 format. | > lspdb DB -format FASTA |
| -frame <FRAME EXPRESSION> | Can only be used with -format (above). Translates the sequence. <FRAME EXPRESSION> is one or more (comma-separated) of "for rev all top bot +1 +2 +3 -1 -2 -3", meaning forward, reverse, all 6 frames, top 3 frames, bottom 3 frames, and the individual frames +1 to -3. | > lspdb DB -format FASTA -frame 'for,rev' > lspdb DB -format FASTA -frame all \| lspbank -prot -F DB_TRANSLATED -T fasta |
| -crc | Obsolete. Use %CRC with printf (see below) | |
| -noseq | Does not output the sequence part | > lspdb DB -noseq |
| -motif | Obsolete. | |
| -lspid | Obsolete | |
| -printf <FORMAT STRING> | This is the most practical way to format biofacet output. A separate manual demonstrates its possibilities. The accessors and formatters are listed below, in the rows with a grey background. | > lspdb DB -printf '%H#OS\n%VOID' |
| Numerical formatters | C-like numerical formatters can be applied in a printf statement. They are written just after the printf variable, as .[%10d]. For instance, %N displays the sequence index; to display it using 10 characters, use -printf '%N.[%10d]\n%VOID', which will display: 1 2 ... 10 | |
| Extra variables (%1 %2 ...) | Some printf variables can use extra variables called %1, %2, ... They are described in the relevant rows below. To use these extra variables, enclose them in [] after the % sign, before and/or after the variable name. For instance, -printf '%[%1 ]S[%2 %3\n%VOID]', where %S is the variable. | |
| Modifiers | Modifiers, noted m1, m2, ..., are available for %S and %H. They allow truncation and "chunking" of the output and are written after the dot sign. For instance, to display 10 residues at a time (each followed by a newline) from residue 100 to residue 200, use %S.10.100.200[\n%VOID] | |
| %VOID | Empty reference. Since printf never writes the last constant string, use %VOID to change this behaviour. For instance the first command on the right does not write the last new line, the second command does. | > lspdb DB -printf '%H#ID %H#OS\n' > lspdb DB -printf '%H#ID %H#OS\n%VOID' |
| %PRE | Preamble separator. Anything before %PRE is done once at the very beginning. (Same as BEGIN in awk) | > lspdb DB -bfql 'L>1000' -printf 'Number of sequences selected=%NBSEQS\nout of a total of %ONBSEQS\n%PRE' |
| %POST | Postamble separator. Anything after %POST is done once at the very end. (Same as END in awk) | > lspdb DB -bfql 'L>1000' -printf '%H#ID\t%L\n%VOID%POSTNumber of sequences selected=%NBSEQS\nout of a total of %ONBSEQS\n' |
| %DATE | Current date | > lspdb DB -printf '%DATE\n%PRE' |
| %VER | Biofacet version | > lspdb DB -printf '%VER\n%PRE' |
| %CRC | CRC string computed for the sequence. If two sequences have the same length but different CRCs, the two sequences are different. If their CRCs are the same, there is still a small chance that the two sequences are in fact different. This can be used to remove redundancy in databases. | > lspdb DB -range 1:10 -printf '%H#ID %CRC %L\n%VOID' |
| %NGC | Genetic code index. This is an internal index used to identify which genetic code is used. 1 is the standard genetic code. | |
| %GC | Genetic code name. Usually "Standard". | |
| %NDBTYPE | sequence db type index. Internal index. | |
| %DBTYPE | sequence db type. NUC for nucleotide, NUCCS for color space, PRO for protein. | > lspdb DB -range 1 -printf '%DBTYPE\n%VOID' |
| %NBSEL | Advanced master/slave usage. This shows the number of sequences before "reduce" is applied | |
| %NBSEQS | The number of sequences after all filtering. | > lspdb DB -bfql 'L>1000' -printf 'Number of sequences selected=%NBSEQS\nout of a total of %ONBSEQS\n%PRE' |
| %ONBSEQS | The number of sequences before any filtering. | |
| %NBMASTERS | Advanced master/slave usage. master count | |
| %ONBMASTERS | Advanced master/slave usage. original master count | |
| %NBRESIDS | Number of residues before any filtering is applied. | > lspdb DB -bfql 'L>1000' -printf 'Number of sequences selected=%NBSEQS out of a total of %ONBSEQS and %NBRESIDS residues\n%PRE' |
| %MAXLEN | The max sequence length in the database before any filtering. | > lspdb DB -bfql 'L>1000' -printf 'Number of sequences selected=%NBSEQS\nout of a total of %ONBSEQS and %NBRESIDS residues\nThe longest sequence in the database is %MAXLEN residue long\n%PRE' |
| %NBANNOTS | The number of annotations in the database. | > lspdb DB -range 1 -printf '%NBANNOTS\n%VOID' |
| %ANNOTS %1 .. %5 | Displays the annotation names and some annotation attributes. %1 is the attribute (string), one of (normal, shared, master). %2 is the class (string), generally "native". %3 is the type (string), one of (int, string, control). %4 is the index type (string), one of (notindexed, btree, hash). %5 is an advanced parameter called compound annotation fields (list) and is often empty. | > lspdb DB -range 1 -printf '%ANNOTS[ %3 %4\n]\n%VOID' ID string hash AC string hash SV string hash GI string notindexed GN string hash SY string notindexed D6 int btree |
| %GNAME %1 .. %12 | Displays the sequence db generic name. Attributes %1 to %12 show, respectively: generic name virtual (string); generic name physical (string); current file number (unsigned int); number of files (unsigned int); number of sequences in virtual database (unsigned int); number of sequences in current physical database (unsigned int); original number of sequences in virtual database (unsigned int); original number of sequences in current physical database (unsigned int); file name of current physical database (string); crypto ident (string); file offset (unsigned int); c_len (unsigned int). | |
| %DBFILE | Displays the sequence db file name. Useful also with %DBPATH. See below and example. | > lspdb DB -printf '%DBPATH/%DBFILE\n%VOID%PRE' /opt/MyDBs/DB |
| %DBPATH | sequence db file path. See example above. | |
| %STATS | sequence db stats. Not yet implemented. | |
| %GROUP (also ^GROUP and $GROUP) %1 .. %7 | The group variable is special in that it can start with ^, % or $. ^GROUP applies at the beginning of a group, $GROUP at the end, and %GROUP to each element in the group. ^GROUP and $GROUP are useful for producing XML, for instance, where a tag is opened before a group and closed after it. Everything below applies to all three GROUP variants. By default %GROUP prints the group index (unsigned int); to suppress it, use %GROUP.[]. Everything that has to be printed must be placed between [], either after the %, after GROUP, or after GROUP.[]. The extra variables %1 to %7 print the following information: %1, the group index (unsigned int); %2, the group's first element (unsigned int); %3, the group's last element (unsigned int); %4, the number of elements (unsigned int); %5, the number of groups (unsigned int); %6, the number of members before filtering (unsigned int); %7, the number of groups before filtering (unsigned int). | > lspdb DB -bfqlgroup '{[OS]},{[-L]},{[OS]}' -pergroupid 3 -printf '^GROUP.[][--- %H#OS %4 of %6\n%VOID]%GROUP.[][ %H#ID %L\n%VOID]%POST--- ' --- Borrelia burgdorferi 3 of 56 NP_862626 429 NP_862652 406 YP_783878 371 --- Buchnera aphidicola 3 of 9 NP_047187 516 NP_047189 466 NP_047188 363 --- Leptospirillum ferrooxidans 3 of 10 YP_220399 133 YP_220386 127 YP_220385 100 --- |
| %NBGROUPS %1 %2 %3 %4 %5 %6 | Prints the number of groups by default. This can be switched off with .[] (as for %GROUP). %1 shows the number of groups (unsigned int); %2, the number of members (unsigned int); %3, the number of groups before filtering (unsigned int); %4, the number of members before filtering (unsigned int); %5, the maxgroup (unsigned int); and %6, the pergroupid (unsigned int). | > lspdb DB -bfqlgroup '{[OS]},{[-L]},{[OS]}' -pergroupid 3 -printf '%NBGROUPS.[][%1 groups, %2 members, %3 groups before filtering, %4 members before filtering, maxgroup=%5 pergroupid=%6\n%VOID]%PRE' 5 groups, 14 members, 5 groups before filtering, 100 members before filtering, maxgroup=0 pergroupid=3 |
| %N | Prints the sequence index. Warning: this is the index after filters are applied. Generally speaking, %ON (see below) should be used. Compare the same example run with %N (this row) and with %ON (next row). | > lspdb DB -range 200:220 -bfql 'L>100' -sort '-L' -printf '%N %L\n%VOID' 1 730 2 603 3 502 4 473 5 422 6 358 7 352 8 345 9 342 10 331 11 266 12 252 13 206 14 143 15 142 |
| %ON | Prints the original sequence index, before any kind of filtering is applied. This is a number. | > lspdb DB -range 200:220 -bfql 'L>100' -sort '-L' -printf '%ON %L\n%VOID' 206 730 200 603 217 502 207 473 211 422 216 358 214 352 220 345 210 342 212 331 203 266 202 252 204 206 201 143 215 142 |
| %H %1 .. %6 m1 and m2 | Displays the sequence annotations. Generally, %H is used with a # qualifier: %H#ID displays the ID field. %1 to %6, as well as m1 and m2, are very rarely used. %1 is the annotation name (string); %2, the annotation index (int); %3, the annotation name if it exists (string); %4, the annotation index if it exists (int); %5, master/slave (string); %6, the master ord (int). m1 and m2 are the start and stop positions, so to print the first two letters of the ID, use %H#ID.1.2 | > lspdb DB -printf '%H#ID %H#DE\n%VOID' |
| %L | Prints the sequence length. | > lspdb DB -printf '%L,%POST\n%VOID' 730,603,502,473,422,358 |
| %S %1 %2 %3 m1 m2 m3 | Displays the sequence. %1 is the first position in the chunk (unsigned int), %2 is the last position in the chunk (unsigned int) and %3 is the chunk length (unsigned int); these are used to display coordinates. Integer formatting can be used as in C. m1 is the chunk length limit: use %S.10 to display the sequence in chunks of 10 residues; 0 means one chunk. m2 and m3 are the start and stop positions. Sequence coordinates start at 1. Use negative numbers to start/stop from the end. %S.0.1.-1 is equivalent to %S | > lspdb DB -range 1 -printf '%S\n%VOID' > lspdb DB -range 1 -printf '%S.100[\n%VOID]' > lspdb DB -range 1 -printf '%[%1.[%5d] ]S.5.-10.-1[ %2.[%5d] (%3 base long)%VOID\n]\n%VOID' 448 TAAAA 452 (5 base long) 453 ATGGG 457 (5 base long) |
| %IDNAME | Value of first annotation. | > lspdb DB -range 1:10 -printf '%IDNAME\n%VOID' |
| -fprintf <FILE> | <FILE> contains a printf statement to use | > lspdb DB -fprintf myprintf.txt |
| -win_start <NUMBER> | Used for pagination. | |
| -win_stop <NUMBER> | See above | |
| -full_sel | Forces complete selection count for win_start/win_stop. Advanced use only. | |
| -xml | Outputs XML. | |
| -xml_start <NUMBER> | Same as win_start, for xml output | |
| -xml_stop <NUMBER> | Same as win_stop, for xml output | |
| -xml_full_sel | Same as full_sel, for xml output | |
| -xml_seqpos_start <NUMBER> | first residue to display in xml | |
| -xml_seqpos_stop <NUMBER> | last residue to display in xml | |
| -xml_all_but <STRING> | XML display options. Advanced use only. | |
| -count | Returns the number of sequences (usually used in combination with a bfql selection). Using -count is incompatible with other output switches such as -printf. | > lspdb DB -range 1:100 -count db nbseqs = 100 db nbres = 20570 db maxlen = 634 db fields = ID AC SV > lspdb DB -range 1:100 -bfql 'L < 500' -count db nbseqs = 98 db nbres = 19420 db maxlen = 487 db fields = ID AC SV |
| -ocoll <COLLECTION> | NEW. Not yet entirely defined. | |
| -esmdb <ESM DB> | Advanced use only. | |
| -dsptmp <TMPDATASPACE> | Advanced use only. | |
| expand|count> | Advanced use only. | |
| -indexinit, -indexcreate <annotation list>, -indexdelete <annotation list>, -indexinfo <annotation list>, -indexsinfo, -indexlist <annotation list>, -indexhisto <annotation list>, -wordlist <annotation list>, -indexstats <annotation list>, -indexselect <selection expression>, -indexcontrol <index control> | Indexing options. Described in another manual. | |
| -bfql | Usage is described in another manual. The BioFacet Query Language is used for creating queries. Use single quotes to enclose your bfql. | > lspdb DB -bfql 'id="NC_*"' |
| -fbfql <BFQLFILE> | Reads the <BFQLFILE> as a bfql query. | |
| -bfqlcontrol <CONTROL LIST> | Advanced usage. | |
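
The %CRC row above notes that CRC strings can be used to remove redundancy in databases. The following is a minimal Python sketch of that idea, not part of lspdb itself. It assumes the output was produced with -printf '%H#ID %CRC %L\n%VOID' (as in the %CRC example), that IDs contain no whitespace, and that the lines have already been captured as text. The function name, the sample IDs and the CRC strings are made up for illustration; the real CRC format may differ.

```python
from collections import defaultdict


def find_duplicate_candidates(lines):
    """Group sequence IDs that share both CRC string and length.

    Each input line is assumed to look like "<ID> <CRC> <LENGTH>",
    i.e. the output of: lspdb DB -printf '%H#ID %CRC %L\\n%VOID'
    """
    groups = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        seq_id, crc, length = parts
        groups[(crc, int(length))].append(seq_id)
    # Keep only (CRC, length) keys shared by more than one sequence.
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Hypothetical sample output; IDs and CRC values are invented.
sample_output = [
    "SEQ_A 1A2B3C4D 429",
    "SEQ_B 9F8E7D6C 406",
    "SEQ_C 1A2B3C4D 429",
    "SEQ_D 1A2B3C4D 371",  # same CRC, different length: not a candidate
]

candidates = find_duplicate_candidates(sample_output)
for (crc, length), ids in candidates.items():
    print(crc, length, ids)
```

Sequences grouped this way are only candidates for removal: as stated in the %CRC description, two sequences with the same CRC still have a small chance of being different, so a direct comparison of the sequences is needed before discarding any of them.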