MAKER predictions
See the scripts used for MAKER here.
Input files for MAKER
RNA-Seq raw reads fastq files: *_1.fastq.gz, *_2.fastq.gz
reference genome fasta file: TAIR10_chr_all.fas
Merge RNA-Seq raw reads
To simplify handling of files, combine all the forward reads to one file and all the reverse reads to another.
cat *_1.fastq.gz >> forward_reads.fq.gz
cat *_2.fastq.gz >> reverse_reads.fq.gz
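Before merging, it may be worth confirming that every forward file has a matching reverse file, since an unpaired file would leave the two merged files out of step. The following is a minimal sketch; `check_read_pairs` is a hypothetical helper that is not part of the original scripts, and it only checks that each `*_1.fastq.gz` has a `*_2.fastq.gz` partner (it does not look for reverse files without a forward mate).

```shell
#!/usr/bin/env bash
# Sketch: confirm every *_1.fastq.gz has a matching *_2.fastq.gz in a directory.
# check_read_pairs is a hypothetical helper, not one of the numbered pipeline scripts.
check_read_pairs() {
    local dir="${1:-.}" fwd rev missing=0 found=0
    for fwd in "$dir"/*_1.fastq.gz; do
        # With no matches the glob stays literal; skip that case.
        [ -e "$fwd" ] || continue
        found=$((found + 1))
        # Derive the expected mate name by swapping the _1 suffix for _2.
        rev="${fwd%_1.fastq.gz}_2.fastq.gz"
        if [ ! -e "$rev" ]; then
            echo "missing mate for: $fwd" >&2
            missing=$((missing + 1))
        fi
    done
    if [ "$found" -eq 0 ]; then
        echo "no *_1.fastq.gz files found in $dir" >&2
        return 1
    fi
    # Succeed only when no forward file lacked a mate.
    [ "$missing" -eq 0 ]
}
```

Running `check_read_pairs . && echo "all forward reads are paired"` in the directory that holds the raw reads gives a quick go/no-go before the `cat` commands.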
Run Trinity to predict transcripts and their inferred proteins
Run Trinity for de novo transcriptome assembly:
./01_runTrinity.sh forward_reads.fq.gz reverse_reads.fq.gz
Note: The transcripts FASTA file is written to the trinity_run folder.
Then run TransDecoder on the assembled transcripts to infer their protein sequences:
./02_runTransDecoder.sh trinity.fasta
Note: The protein sequences (trinity.fasta.transdecoder.pep) are written to the working directory.
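A quick sanity check at this point is to count the records in the transcript and protein FASTA files, since an empty file here would silently weaken the evidence given to MAKER. The sketch below assumes standard FASTA headers starting with `>`; `count_fasta_records` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: count FASTA records by counting header lines that start with ">".
# count_fasta_records is a hypothetical helper, not part of the original scripts.
count_fasta_records() {
    local fasta="$1"
    if [ ! -r "$fasta" ]; then
        echo "cannot read: $fasta" >&2
        return 1
    fi
    # grep -c prints 0 but exits non-zero when nothing matches;
    # "|| true" keeps that case from being treated as an error.
    grep -c '^>' "$fasta" || true
}
```

For example, `count_fasta_records trinity.fasta.transdecoder.pep` prints the number of predicted protein sequences.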
MAKER requires five (non-automated) steps
Generate the CTL files:
module load GIF/maker
module rm perl/5.22.1
maker -CTL
This will generate 3 CTL files (maker_opts.ctl, maker_bopts.ctl and maker_exe.ctl), which you will need to edit to configure the MAKER run. For the first round, change these lines in the maker_opts.ctl file:
genome=TAIR10_chr_all.fas
est=trinity.fasta
protein=trinity.fasta.transdecoder.pep
est2genome=1
protein2genome=1
TMP=/dev/shm
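The edits to maker_opts.ctl can also be scripted instead of made by hand, which helps when the same settings are applied to several runs. The sketch below assumes each option sits on its own line in the form `key=value #comment`; `set_ctl_option` is a hypothetical helper, and values containing `|` or `&` would need escaping before being passed to it.

```shell
#!/usr/bin/env bash
# Sketch: set one "key=value" option in a MAKER control file, keeping the trailing comment.
# set_ctl_option is a hypothetical helper; it assumes lines of the form "key=value #comment".
set_ctl_option() {
    local file="$1" key="$2" value="$3" tmp
    # Refuse to continue if the option is not present, rather than silently doing nothing.
    if ! grep -q "^${key}=" "$file"; then
        echo "option not found: $key" >&2
        return 1
    fi
    tmp="$(mktemp)" || return 1
    # Replace everything between "key=" and the first "#" with the new value.
    sed "s|^${key}=[^#]*|${key}=${value} |" "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Write back through redirection so the original file permissions are kept.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

With this in place, the first-round settings could be applied as, for example, `set_ctl_option maker_opts.ctl est2genome 1` and `set_ctl_option maker_opts.ctl TMP /dev/shm`.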
Execute MAKER by running 03_maker_start.sh from a Slurm job script. It is essential to request more than 1 node with multiple processors to run this efficiently.
# Define a base name for the MAKER output folder as the first argument.
./03_maker_start.sh maker_case
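As an illustration of what such a Slurm file might look like, the sketch below writes a small job script around 03_maker_start.sh. The helper name `write_maker_job`, the output filename, and all resource values (2 nodes, 16 tasks per node, 24 hours) are placeholder assumptions to adjust for your cluster; the original does not specify them.

```shell
#!/usr/bin/env bash
# Sketch: generate a Slurm job script that runs one MAKER round.
# write_maker_job is a hypothetical helper; node, task, and time values are placeholders.
write_maker_job() {
    local base="$1" nodes="${2:-2}" tasks="${3:-16}" out="${4:-maker_round.slurm}"
    # The heredoc is unquoted so ${base}, ${nodes}, and ${tasks} expand now;
    # \$SLURM_SUBMIT_DIR is escaped so it expands only when the job runs.
    cat > "$out" <<EOF
#!/bin/bash
#SBATCH --job-name=${base}
#SBATCH --nodes=${nodes}
#SBATCH --ntasks-per-node=${tasks}
#SBATCH --time=24:00:00
#SBATCH --output=${base}.%j.log

cd "\$SLURM_SUBMIT_DIR"
./03_maker_start.sh ${base}
EOF
    # Print the path of the generated script so callers can pass it to sbatch.
    echo "$out"
}
```

A typical use would be `sbatch "$(write_maker_job maker_case)"`, submitted from the directory that holds the CTL files.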
Upon completion, train SNAP and AUGUSTUS:
# Use the same base name as in the previous step as the first argument.
./04_maker_process.sh maker_case
Train GeneMark with genome sequence:
./05_runGeneMark.sh TAIR10_chr_all.fas
Once complete, modify the following lines in the maker_opts.ctl file:
snaphmm=maker.snap.hmm
gmhmm=gmhmm.mod
# Choose any species name, but it must not already exist in the augustus/config/species folder.
augustus_species=maker_20171103
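Because the species name must not already exist in the AUGUSTUS species folder, a small check before the second round can catch a clash early. This is a sketch only: `augustus_species_available` is a hypothetical helper, and it assumes the AUGUSTUS config directory is either passed as the second argument or available in the `AUGUSTUS_CONFIG_PATH` environment variable.

```shell
#!/usr/bin/env bash
# Sketch: check that a species name is not yet present under <config>/species.
# augustus_species_available is a hypothetical helper, not part of the original scripts.
augustus_species_available() {
    local name="$1" config="${2:-${AUGUSTUS_CONFIG_PATH:-}}"
    if [ -z "$config" ] || [ ! -d "$config/species" ]; then
        echo "cannot find species folder under: ${config:-<unset>}" >&2
        return 2
    fi
    # Succeed only when no entry with this name exists yet.
    [ ! -e "$config/species/$name" ]
}
```

For example, `augustus_species_available maker_20171103 || echo "pick another species name"`.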
Then, run 03_maker_start.sh again:
# Use the same base name as in the previous step as the first argument.
./03_maker_start.sh maker_case
Finalize predictions:
./06_maker_finalize.sh maker_case
You will get the predicted gene models (maker_case.gff), protein sequences (maker_case.maker.proteins.fasta) and transcript sequences (maker_case.maker.transcripts.fasta) in the working directory.
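To get a first impression of the result, the number of predicted features can be counted directly from the GFF file. The sketch below assumes a tab-separated GFF3 file with the feature type in the third column; `count_gff_features` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: count features of a given type (default: gene) in a GFF3 file.
# count_gff_features is a hypothetical helper, not part of the original scripts.
count_gff_features() {
    local gff="$1" type="${2:-gene}"
    # Skip comment and directive lines, then match the type in column 3.
    # "n + 0" forces a numeric 0 when no line matched.
    awk -F'\t' -v t="$type" '!/^#/ && $3 == t { n++ } END { print n + 0 }' "$gff"
}
```

For example, `count_gff_features maker_case.gff` prints the number of gene models, and `count_gff_features maker_case.gff mRNA` prints the number of transcripts.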