MAKER predictions

See the scripts used for MAKER here .

Input files for Maker

RNA-Seq raw reads fastq files: *_1.fastq.gz, *_2.fastq.gz
reference genome fasta file: TAIR10_chr_all.fas

Merge RNA-Seq raw reads

To simplify handling of files, combine all the forward reads to one file and all the reverse reads to another.

1cat *_1.fastq.gz >> forward_reads.fq.gz
2cat *_2.fastq.gz >> reverse_reads.fq.gz

Run trinity to predict transcripts and their inferred proteins

  1. Run trinity for de novo transcriptome assembly:

    1./01_runTrinity.sh forward_reads.fq.gz reverse_reads.fq.gz
    

    Note: You will get the transcripts fasta file in trinity_run folder.

2. Predict CDSs from transcriptome:
1./02_runTransDecoder.sh trinity.fasta

Note: You will get the protein sequence ( trinity.fasta.transdecoder.pep ) in working directory.

MAKER requires five (non-automated) steps

  1. Generate the CTL files:

    1module load GIF/maker
    2module rm perl/5.22.1
    3maker -CTL
    

    This will generate 3 CTL files (maker_opts.ctl, maker_bopts.ctl and maker_exe.ctl), you will need to edit them to make changes to the MAKER run. For the first round, change these lines in maker_opts.ctl file:

    1genome=TAIR10_chr_all.fas
    2est=trinity.fasta
    3protein=trinity.fasta.transdecoder.pep
    4est2genome=1
    5protein2genome=1
    6TMP=/dev/shm
    
  2. Execute MAKER 03_maker_start.sh in a slurm file. It is essential to request more than 1 node with multiple processors to run this efficiently.

    1# Define a base name for maker output folder as the first argument.
    2./03_maker_start.sh maker_case
    
  3. Upon completion, train SNAP and AUGUSTUS:

    1#Use the same base name as previous step for first argument.
    2./04_maker_process.sh maker_case
    
  4. Train GeneMark with genome sequence:

    1./05_runGeneMark.sh TAIR10_chr_all.fas
    
  5. Once complete, modify the following lines in maker_opts.ctl file:

    1snaphmm=maker.snap.hmm
    2gmhmm=gmhmm.mod
    3# Define a species as you want, but the name should not be existing in the augustus/config/species folder.
    4augustus_species=maker_20171103
    

    Then, 03_maker_start.sh again:

    1# Use the same base name as previous step for first argument.
    2./03_maker_start.sh maker_case
    
  6. Finalize predictions:

    1./06_maker_finalize.sh maker_case
    

    You will get the predicted gene models (maker_case.gff), protein sequences (maker_case.maker.proteins.fasta) and transcript sequence (maker_case.maker.transcripts.fasta) in the working directory.