Tutorial

In this tutorial, we will build an oligo probe set for the Arabidopsis genome.

Using Docker Terminal Version

Install

Install Docker

Download Chorus:

$ docker pull forrestzhang/docker-chorus

Parameters of Chorus:

-g GENOME, --genome GENOME
                        fasta format genome file
-i INPUT, --input INPUT
                        fasta format input file
-s SAVED, --save SAVED
                        result saved folder
-p PRIMER, --primer PRIMER
                        5' labeled R primer
-t THREADS, --threads THREADS
                        threads number or how may cpu you wanna use
-l LENGTH, --length LENGTH
                        probe length
--homology HOMOLOGY   homology, from 50 to 100
-d DTM, --dtm DTM     dTm, from 0 to 37
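
The ranges listed above can be checked before a long run is started. The following is a minimal sketch, not part of Chorus: build_chorus_args is a hypothetical helper, and its default values for homology (75) and dTm (10) are simply the values reported in the log shown later in this tutorial when neither option is given.

```python
def build_chorus_args(genome, input_fasta, threads=1, homology=75, dtm=10,
                      length=None, save=None):
    """Return a Chorus option list (hypothetical helper, not part of Chorus)."""
    # Ranges taken from the parameter list: homology 50-100, dTm 0-37.
    if not 50 <= homology <= 100:
        raise ValueError("homology must be between 50 and 100")
    if not 0 <= dtm <= 37:
        raise ValueError("dTm must be between 0 and 37")
    if threads < 1:
        raise ValueError("threads must be at least 1")

    args = ["-g", genome, "-i", input_fasta, "-t", str(threads),
            "--homology", str(homology), "-d", str(dtm)]
    # Optional settings are only added when the caller supplies them.
    if length is not None:
        args += ["-l", str(length)]
    if save is not None:
        args += ["-s", save]
    return args


print(" ".join(build_chorus_args("TAIR10_chr_all.fas", "TAIR10_chr_all.fas",
                                 threads=12)))
```

The returned list can be appended to the docker run or python3 command line; an out-of-range value raises an error before any work is done.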

Download the reference genome file:

$ wget https://www.arabidopsis.org/download_files/Genes/TAIR10_genome_release/TAIR10_chromosome_files/TAIR10_chr_all.fas

Run Chorus on the downloaded genome (the current directory is mounted at /home/chorus inside the container):

$ docker run -v $PWD:/home/chorus -e CHORUS_USER=$USER -e CHORUS_UID=$UID \
  forrestzhang/docker-chorus -i TAIR10_chr_all.fas -g TAIR10_chr_all.fas -t 12

Please wait until all processes are done. The log output looks like this:

forrest /home/chorus
use local user:  forrest
Adding group 'forrest' (GID 1000) ...
Done.
Adding user 'forrest' ...
Adding new user 'forrest' (1000) with group 'forrest' ...
Creating home directory '/home/forrest' ...
Copying files from '/etc/skel' ...
/home/chorus exists
2.2.3
########################################
bwa version: /opt/software/bwa/bwa 0.7.12-r1044
jellyfish version: /opt/software/jellyfish/bin/jellyfish 2.2.3
genome file: TAIR10_chr_all.fas
input file: TAIR10_chr_all.fas
5' labeled R primer:
result output folder: /home/chorus/probes
threads number: 12
homology: 75
dtm: 10
########################################
...
...
14300000 / 14326857
14310000 / 14326857
14320000 / 14326857
Job finshed!!
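
If the log is saved to a file, the counter lines can be turned into a percentage. This is a small sketch that assumes lines of the form "number / number" are "done / total" counters; the tutorial does not say what is being counted, and parse_progress is a hypothetical helper name.

```python
import re

# Matches lines such as "14300000 / 14326857".
PROGRESS_RE = re.compile(r"^\s*(\d+)\s*/\s*(\d+)\s*$")


def parse_progress(line):
    """Return (done, total) for a counter line, or None for any other line."""
    match = PROGRESS_RE.match(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


for line in ["14300000 / 14326857", "Job finshed!!"]:
    progress = parse_progress(line)
    if progress is not None:
        done, total = progress
        print(f"{100 * done / total:.1f}% done")
```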

When the process is done, list the output folder:

$ ls -lt probes/
total 1741428
-rw-r--r-- 1 root root  280927981 Aug 24 17:44 TAIR10_chr_all.fas_all.bed
-rw-r--r-- 1 root root   62050561 Aug 24 17:44 TAIR10_chr_all.fas.bed
-rw-r--r-- 1 root root         94 Aug 24 17:30 TAIR10_chr_all.fas.len
-rw-r--r-- 1 root root 1031512169 Aug 24 17:22 TAIR10_chr_all.fas_tmp_probe.fa
-rw-r--r-- 1 root root   59833928 Aug 24 17:19 TAIR10_chr_all.fas.sa
-rw-r--r-- 1 root root       7535 Aug 24 17:18 TAIR10_chr_all.fas.amb
-rw-r--r-- 1 root root        682 Aug 24 17:18 TAIR10_chr_all.fas.ann
-rw-r--r-- 1 root root   29916939 Aug 24 17:18 TAIR10_chr_all.fas.pac
-rw-r--r-- 1 root root  119667836 Aug 24 17:18 TAIR10_chr_all.fas.bwt
-rw-r--r-- 1 root root  121183059 Aug 24 17:17 TAIR10_chr_all.fas
-rw-r--r-- 1 root root   78102510 Aug 24 17:17 TAIR10_chr_all.fas_17mer.jf

TAIR10_chr_all.fas.bed is the probe file.

$ more probes/TAIR10_chr_all.fas.bed
1           52      96      TCCCTAAATCTTTAAATCCTACATCCATGAATCCCTAAATACCTA
1           211     255     TTTGAGGTCAATACAAATCCTATTTCTTGTGGTTTTCTTTCCTTC
1           346     390     CCTTAGGGTTGGTTTATCTCAAGAATCTTATTAATTGTTTGGACT
1           426     470     TTTGTGGAAATGTTTGTTCTATCAATTTATCTTTTGTGGGAAAAT
1           496     540     TCTTCGTTGTTGTTACGCTTGTCATCTCATCTCTCAATGATATGG
1           551     595     TAGCATTTATTCTGAAGTTCTTCTGCTTGATGATTTTATCCTTAG

Each row has four columns: the chromosome name, the oligo start site, the oligo end site, and the oligo probe sequence. You can open this file with Excel or a text editor.
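
The four-column layout is also easy to read from a script. Here is a minimal sketch, assuming the columns are separated by whitespace and that chromosome names contain no spaces; Probe and read_probes are illustrative names, not part of Chorus.

```python
from typing import Iterable, Iterator, NamedTuple


class Probe(NamedTuple):
    chrom: str
    start: int
    end: int
    sequence: str


def read_probes(lines: Iterable[str]) -> Iterator[Probe]:
    """Yield one Probe per four-column row, skipping any other line."""
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue
        chrom, start, end, sequence = fields
        yield Probe(chrom, int(start), int(end), sequence)


# Two rows copied from the sample output above.
sample = [
    "1\t52\t96\tTCCCTAAATCTTTAAATCCTACATCCATGAATCCCTAAATACCTA",
    "1\t211\t255\tTTTGAGGTCAATACAAATCCTATTTCTTGTGGTTTTCTTTCCTTC",
]
for probe in read_probes(sample):
    print(probe.chrom, probe.start, probe.end, len(probe.sequence))
```

To read the real file, pass an open file handle instead of the sample list, for example open("probes/TAIR10_chr_all.fas.bed").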

Using Manually Installed Version

Run In Terminal

Make a project folder

$ cd ~
$ mkdir sampleproject
$ cd sampleproject

Download Arabidopsis reference genome

$ wget https://www.arabidopsis.org/download_files/Genes/TAIR10_genome_release/TAIR10_chromosome_files/TAIR10_chr_all.fas

Test the Chorus software

$ python3 /opt/software/Chorus/Chorus.py -h

    usage: Chorus [-h] [--version] [-j JELLYFISH] [-b BWA] -g GENOME -i INPUT
            [-s SAVED] [-p PRIMER] [-t THREADS] [-l LENGTH]
            [--homology HOMOLOGY] [-d DTM] [--step STEP] [--docker DOCKER]

    Chorus Software for Oligo FISH probe design

    optional arguments:
    -h, --help            show this help message and exit
    --version             show program's version number and exit
    -j JELLYFISH, --jellyfish JELLYFISH
                            jellyfish path
    -b BWA, --bwa BWA     bwa path
    -g GENOME, --genome GENOME
                            fasta format genome file
    -i INPUT, --input INPUT
                            fasta format input file
    -s SAVED, --save SAVED
                            result saved folder
    -p PRIMER, --primer PRIMER
                            5' labeled R primer
    -t THREADS, --threads THREADS
                            threads number or how may cpu you wanna use
    -l LENGTH, --length LENGTH
                            probe length
    --homology HOMOLOGY   homology, from 50 to 100
    -d DTM, --dtm DTM     dTm, from 0 to 37
    --step STEP           step length, min=1
    --docker DOCKER

Run the Chorus software

$ python3 /opt/software/Chorus/Chorus.py -i TAIR10_chr_all.fas \
  -g TAIR10_chr_all.fas -t 12 \
  -j /opt/software/jellyfish/bin/jellyfish -b /opt/software/bwa/bwa -s sample

When the job finishes, the oligo probes are written to the ‘sample’ folder

$ cd sample
$ ls -lt *

    total 1741428
    -rw-r--r-- 1 root root  280927981 Aug 24 17:44 TAIR10_chr_all.fas_all.bed
    -rw-r--r-- 1 root root   62050561 Aug 24 17:44 TAIR10_chr_all.fas.bed
    -rw-r--r-- 1 root root         94 Aug 24 17:30 TAIR10_chr_all.fas.len
    -rw-r--r-- 1 root root 1031512169 Aug 24 17:22 TAIR10_chr_all.fas_tmp_probe.fa
    -rw-r--r-- 1 root root   59833928 Aug 24 17:19 TAIR10_chr_all.fas.sa
    -rw-r--r-- 1 root root       7535 Aug 24 17:18 TAIR10_chr_all.fas.amb
    -rw-r--r-- 1 root root        682 Aug 24 17:18 TAIR10_chr_all.fas.ann
    -rw-r--r-- 1 root root   29916939 Aug 24 17:18 TAIR10_chr_all.fas.pac
    -rw-r--r-- 1 root root  119667836 Aug 24 17:18 TAIR10_chr_all.fas.bwt
    -rw-r--r-- 1 root root  121183059 Aug 24 17:17 TAIR10_chr_all.fas
    -rw-r--r-- 1 root root   78102510 Aug 24 17:17 TAIR10_chr_all.fas_17mer.jf

TAIR10_chr_all.fas.bed is the probe file.

$ more TAIR10_chr_all.fas.bed
1           52      96      TCCCTAAATCTTTAAATCCTACATCCATGAATCCCTAAATACCTA
1           211     255     TTTGAGGTCAATACAAATCCTATTTCTTGTGGTTTTCTTTCCTTC
1           346     390     CCTTAGGGTTGGTTTATCTCAAGAATCTTATTAATTGTTTGGACT
1           426     470     TTTGTGGAAATGTTTGTTCTATCAATTTATCTTTTGTGGGAAAAT
1           496     540     TCTTCGTTGTTGTTACGCTTGTCATCTCATCTCTCAATGATATGG
1           551     595     TAGCATTTATTCTGAAGTTCTTCTGCTTGATGATTTTATCCTTAG

Each row has four columns: the chromosome name, the oligo start site, the oligo end site, and the oligo probe sequence. You can open this file with Excel or a text editor.
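
A quick sanity check on the probe file is to count probes per chromosome and confirm that the coordinates agree with the sequence length. In the sample rows above, end - start + 1 equals the sequence length (96 - 52 + 1 = 45), so this sketch assumes inclusive coordinates; that is inferred from the sample, not stated by Chorus. summarize_probes is a hypothetical helper name.

```python
from collections import Counter


def summarize_probes(lines):
    """Return (probes per chromosome, number of rows whose span != sequence length)."""
    counts = Counter()
    mismatches = 0
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        chrom, start, end, sequence = fields
        counts[chrom] += 1
        # Assumed convention: start and end are both included in the probe.
        if int(end) - int(start) + 1 != len(sequence):
            mismatches += 1
    return counts, mismatches


# Two rows copied from the sample output above.
rows = [
    "1\t52\t96\tTCCCTAAATCTTTAAATCCTACATCCATGAATCCCTAAATACCTA",
    "1\t211\t255\tTTTGAGGTCAATACAAATCCTATTTCTTGTGGTTTTCTTTCCTTC",
]
counts, mismatches = summarize_probes(rows)
print(dict(counts), mismatches)
```

For the full result, pass an open handle on TAIR10_chr_all.fas.bed in the output folder instead of the two sample rows.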