Welcome to the GenomeQuest Documentation Wiki

Sequence Database Upload


Uploading sequence files allows you to run NGS analyses with them. By default all uploaded files are private. Only you can see them. You can share uploaded sequence databases with members of your group if you so choose, after the upload process.

Why upload sequences?

You can upload three kinds of data to GenomeQuest.

Next Generation Sequencing (NGS) reads

If you run Next Generation Sequencing (NGS) experiments, you can upload the reads from your NGS experiment prior to analyzing them.

Variants

If you called variants (for example, SNPs) from your NGS experiment, these variants can be uploaded.

Other Sequence Databases

You can also upload other sequence databases, such as:

  • Reference databases (genome, transcriptome, etc.) needed for your NGS analysis if they are not already present in GenomeQuest. These might include:
    • Organism-specific databases like AceDB and TAIR, or annotation-specific databases like a GPCR database.
    • Your proprietary databases.
  • Sequences of the genes of interest to your organization, so that you can search them.
  • Third-party databases that you have a license to access.

For uploading NGS Reads or Variants

Go to the upload page

Go to the upload page by clicking the Upload Data button on the landing page of your My GenomeQuest page. If your My GenomeQuest page is showing a list of results, you will not see the upload button. In that case, click the GenomeQuest logo at the top left to go to the landing page.

Click the Upload button on the My GenomeQuest page

IP Interface

If you are an IP user, you may have different buttons on your landing page. In that case, you can change your interface to show the upload button or you can reach the upload page this way:

  1. Click the Uploaded NGS reads -> Uploaded reads link in the left panel.
  2. Next click the Upload reads link at the top of the main panel.

Upload reads button

However, as an IP user you most likely need to upload other kinds of reference data sets rather than NGS reads. If that is the case, see the section below on uploading other sequence databases.

Choose NGS Reads or variants

On the upload page, first choose if you are uploading a database of NGS reads or variants.

NGS reads or Variants

Once you choose the type, a pop-up asks for permission to access your files for transfer. Click the Allow button.

Pop-up asking permission to access your files to transfer

This creates a new dialog area on the web page. Fill it in by

  1. entering a name for your upload (spaces are not allowed in the name) and
  2. choosing the file you wish to upload by clicking the Browse button, which opens a file picker.

Give the database a name and choose file to upload


Click Submit

When you click the Submit button, the file you chose is transferred to the GenomeQuest servers.

Note that this is a simple file transfer. You will have to process the reads (or variants) in order to make them usable in the GenomeQuest system.

Uploading Other Sequence Databases

Sometimes you want to upload reference databases or other sequence databases of interest to your organization. These are neither NGS reads nor variants, so GenomeQuest provides a different way to upload them.

  1. Go to the Import Annotated Sequences upload page.
    1. Click the Quick Launch menu in the upper right side of the page.
    2. This opens the upload page.
  2. Sequence type. On the upload page, first choose whether you are uploading a database of nucleotide sequences or protein sequences.
  3. Choose file format. Depending on your file, choose a format. GenomeQuest accepts sequence databases in FASTA, EMBL, and FASTQ formats.
  4. Give it a name. This is the name under which the database will be displayed in GenomeQuest, on the My GenomeQuest page as well as on the workflow pages.
  5. Use the file browser to select your sequence file. Click the Browse button to open a file picker. Choose the file that contains your sequence database.
  6. Click Submit.
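
If you are unsure which file format to pick in step 3, a quick look at the first line of the file usually settles it. The following is a minimal sketch, not part of GenomeQuest: the function name `guess_sequence_format` is hypothetical, and it relies only on the usual conventions that FASTA records begin with `>`, FASTQ records begin with `@`, and EMBL entries begin with an `ID` line. It assumes an uncompressed text file.

```python
def guess_sequence_format(path):
    """Guess 'fasta', 'fastq', or 'embl' from the first non-empty line.

    Hypothetical helper for illustration; returns 'unknown' when the
    first line matches none of the three conventions.
    """
    with open(path, "r", encoding="ascii", errors="replace") as handle:
        for line in handle:
            if not line.strip():
                continue  # skip leading blank lines
            if line.startswith(">"):
                return "fasta"   # FASTA header line
            if line.startswith("@"):
                return "fastq"   # FASTQ read identifier line
            if line.startswith("ID "):
                return "embl"    # EMBL entry identification line
            return "unknown"
    return "unknown"  # empty file
```

The check is deliberately shallow: it inspects one line and does not validate the rest of the file.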

Steps in the upload page

Modes of upload

Depending on the amount of data you have available to transfer, this web-based transfer can get pretty tedious. GenomeQuest allows four modes of electronic data transfer. Choose the one that best fits your needs.

  1. Web browser-based data transfer, using the procedure described above, is very convenient when there is a small amount of data to upload and that data is on your desktop computer. Note: Do not close the web browser while the transfer is in progress. If you close the browser, the file transfer will be aborted.
  2. Aspera transfer. Click this link and log in with your GenomeQuest username and password: GenomeQuest Aspera Server. Once you are logged in, the page will prompt you to download and install the Aspera client, a data transfer program used by a number of organizations, including the NCBI, for transferring very large amounts of data. Note: Do not close the Aspera client while the transfer is in progress. If you do so, the file transfer will be aborted.
  3. FTP is also available for those who wish to send large amounts of data straight from their sequencing server to GenomeQuest. To use it:
    1. Connect to myfiles.genomequest.com with your command-line ftp program. You can also use FileZilla or any other FTP client.
    2. Use your GenomeQuest username and password to authenticate.
    3. Start transferring files.
  4. Send us a disk. For really large data sets, the most expeditious way might be to send us a disk by courier. We will then make the data available to you on GenomeQuest. If you would like to do this, please drop us a line before sending the disk.
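
The FTP steps in option 3 can also be scripted, which is handy on a sequencing server without a graphical client. The sketch below uses Python's standard `ftplib`; the host name comes from the text above, while the function names `upload_file` and `upload_to_genomequest` are hypothetical. It assumes plain FTP and that the server accepts the file under its local name.

```python
import os
from ftplib import FTP

GQ_FTP_HOST = "myfiles.genomequest.com"  # host name given in the steps above


def upload_file(ftp, local_path, remote_name=None):
    """Send one file in binary mode over an already logged-in FTP connection."""
    remote_name = remote_name or os.path.basename(local_path)
    with open(local_path, "rb") as handle:
        ftp.storbinary(f"STOR {remote_name}", handle)
    return remote_name


def upload_to_genomequest(local_path, username, password):
    """Connect, authenticate with GenomeQuest credentials, upload, disconnect."""
    with FTP(GQ_FTP_HOST) as ftp:
        ftp.login(user=username, passwd=password)
        return upload_file(ftp, local_path)
```

A call such as `upload_to_genomequest("reads.fastq", "my_user", "my_password")` would perform all three steps; the file name and credentials here are placeholders.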

Transfer times

The actual time it takes to transfer a file from your desktop to the GenomeQuest servers depends on many factors. The most important factors are:

  • Your bandwidth. This is the speed of your connection to the internet. For example, a single T1 line can transfer about 1.5 megabits per second, which is approximately 200 kilobytes per second (since 1 byte = 8 bits).
  • Throttling. Your IT department (and even some ISPs) will often restrict how much of the bandwidth you can use for file transfers, because file transfers that use up the whole bandwidth adversely affect the web browsing experience. Typically between 20% and 30% of the bandwidth is allocated for file transfers.

Assuming that 25% of the line speed is allocated to file transfers, the following table shows estimated transfer times for different file sizes:

File size   Transfer time using a T1 line   Transfer time using a T3 line
10 MB       0.06 hrs (4 mins)               0.002 hrs (about 6 secs)
50 MB       0.3 hrs (18 mins)               0.01 hrs (about 36 secs)
100 MB      0.58 hrs (about 35 mins)        0.02 hrs (about 1 min 12 secs)
500 MB      2.88 hrs                        0.10 hrs (about 6 mins)
1 GB        5.76 hrs                        0.20 hrs (about 12 mins)
5 GB        28 hrs                          1 hr
10 GB       58 hrs (about 2.5 days)         1.99 hrs
50 GB       288 hrs (about 12 days)         10 hrs
100 GB      576 hrs (about 24 days)         19.87 hrs
500 GB      2879 hrs (about 120 days)       100 hrs (about 4 days)
750 GB      4318 hrs (about 180 days)       149 hrs (about 6 days)
1000 GB     5757 hrs (about 240 days)       198 hrs (about 8 days)
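
To estimate a transfer time for a file size or line speed that is not in the table, the same arithmetic can be written out as below. This is a rough sketch under stated assumptions: file sizes are in decimal bytes, the nominal line rates are 1.544 megabits per second for T1 and 44.736 megabits per second for T3, and 25% of the line is available for transfers. The function name `transfer_hours` is hypothetical.

```python
T1_MBPS = 1.544    # assumed nominal T1 line rate, megabits per second
T3_MBPS = 44.736   # assumed nominal T3 line rate, megabits per second


def transfer_hours(size_bytes, line_mbps, usable_fraction=0.25):
    """Estimated hours to send size_bytes over a line of line_mbps,
    when only usable_fraction of the line is available for transfers."""
    bits_to_send = size_bytes * 8                       # 1 byte = 8 bits
    usable_bits_per_sec = line_mbps * 1_000_000 * usable_fraction
    return bits_to_send / usable_bits_per_sec / 3600    # seconds -> hours


one_gigabyte = 1_000_000_000
print(round(transfer_hours(one_gigabyte, T1_MBPS), 2))  # 5.76
print(round(transfer_hours(one_gigabyte, T3_MBPS), 2))  # 0.2
```

Both results agree with the table's one-gigabyte row. Real transfers will vary with network conditions, so treat the numbers as estimates only.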

Tips For Speeding Up File Transfer

There are some very basic things that you can do to optimize the speed of your uploads (including FTP uploads) to GQLive. Below are the things to try first:

  1. Make sure that you compress your files; this can reduce transfer time by approximately 60%, depending on how well your data compresses.
  2. Ensure that you do not have any software or settings that will rob your uploads of bandwidth. If you are running Windows 7 or Vista disable the QoS Packet Scheduler (in the Control Panel click Network and Internet, then Network and Sharing Center, then Manage Network Connections, select your network and click Properties; remove the check mark for QoS Packet Scheduler). Make sure that you do not have peer-to-peer software (e.g. Kazaa, LimeWire) running.
  3. Restart your browser, open only one window, and keep it in the foreground. If you want the absolute fastest upload performance, close down all other applications while running your uploads. While that may not be practical all the time, try to schedule your uploads for times when few other heavy processes are running.
  4. Run the Windows defragmentation program (in the Control Panel, click System and Maintenance, then under the Administrative Tools section, click Defragment your hard drive) to ensure that your disk is clean and your I/O is fast.
  5. For FTP uploads, a specialized upload product such as FileZilla (free download at http://filezilla-project.org/download.php) can sometimes improve performance. If you are uploading more than 10 files, increase the default setting accordingly.
  6. Test your network upload speed to Boston to make sure your connection is working well to other sites; http://www.speedtest.net and http://www.bandwidthplace.com/ are good test sites. Is the result consistent with your upload experience to GQLive? Take note of the results to use as a comparison for further troubleshooting.
  7. Run a test after completing items 1 through 6 above noting your file size and resulting upload time.

If you still perceive the upload to be too slow the next step is to e-mail GenomeQuest Support at support@genomequest.com and / or consider sending us a disk or thumb drive via courier.
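
As an illustration of tip 1, the sketch below compresses a file with gzip before upload and reports how much smaller it became. It uses only Python's standard library. The function names are hypothetical, and it assumes that gzip is a compression format GenomeQuest accepts for your upload type; check with GenomeQuest Support if in doubt.

```python
import gzip
import os
import shutil


def gzip_file(src_path, dest_path=None):
    """Write a gzip-compressed copy of src_path and return its path."""
    dest_path = dest_path or src_path + ".gz"
    with open(src_path, "rb") as src, gzip.open(dest_path, "wb") as dest:
        shutil.copyfileobj(src, dest)  # stream in chunks; file need not fit in memory
    return dest_path


def size_reduction(src_path, gz_path):
    """Fraction by which the compressed copy is smaller than the original."""
    return 1 - os.path.getsize(gz_path) / os.path.getsize(src_path)
```

Because transfer time scales with file size, the value returned by `size_reduction` is also a rough estimate of the time saved; the actual figure depends on how repetitive your data is.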

Security of Transferred Files

Security during data transfer

Data transfer to GenomeQuest using our browser-based applet uses secure protocols (HTTPS or FTPS). You can also use FTPS in FileZilla on port 990.

Security of access

Once on the GenomeQuest server, data is protected through our access protection layers.

  • No one, to whom you have not explicitly granted access, can see your data.
  • You cannot (even accidentally) share data with anyone outside your group.

Your information is secure on GenomeQuest. Read our privacy policy for further details.

Transfer to your in-house server

For the ultimate in security, you can bring GenomeQuest in-house, i.e., host it on one of your own servers. If you already have it in-house, contact your GenomeQuest Administrator for information on back-up and privacy policies. If you don't and are interested in bringing it in-house, see our minimum server configuration to learn more about what you need in terms of hardware.
