V. faba CSFL RefTrans V1
Materials & Methods
CSFL Faba Bean RefTrans combines published RNA-Seq and EST data sets to create a reference transcriptome (RefTrans) for faba bean and provides putative gene function identified by homology to known proteins.
In V.faba_RefTrans_V1, 433 million RNA-Seq reads from publicly available, peer-reviewed faba bean RNA-Seq data sets (Suresh et al. 2013, Ray et al. 2015, Webb et al. 2015, Zhang et al. 2015, Ocana et al. 2015, Arun-Chinnappa and McCurdy 2015) were downloaded from the NCBI Sequence Read Archive (SRP043650, SRP038935, SRP033593, SRP033121, ERX837672, SRX690544, SRX912069), and 20,697 ESTs were downloaded from the NCBI dbEST database. The RNA-Seq data sets comprise 400 million single-end reads and 33 million paired-end reads generated on 454 and Illumina platforms.
The RNA-Seq reads were quality-filtered using the NGS QC Toolkit (v2.3.3, default parameters; Patel and Jain, 2012) and custom Perl scripts. The 183 million reads that remained after filtering were assembled de novo with Trinity (v2.0.6; Grabherr et al., 2011) using default assembly parameters and a minimum coding length of 200 bases.
Quality control of the ESTs included vector sequence screening against UniVec_Core (ftp://ftp.ncbi.nih.gov/pub/UniVec/) using cross_match (Gordon et al., 1998), removal of tRNA/rRNA/snRNA sequences identified using tblastx (Altschul et al., 1990), and poly-A tail trimming. The filtered ESTs were assembled using the CAP3 program (-p 90; Huang and Madan, 1999).
Bowtie (v2-2.2.3; Langmead et al., 2009) was used to multi-map the RNA-Seq reads and ESTs back to the assembled contigs and singlets. The contigs and singlets were then hierarchically clustered into genes using Corset (v1.0.4; Davidson and Oshlack, 2014) with default parameters. The longest isoform was selected to represent each Corset cluster, yielding a faba bean RefTrans V1 of 69,784 sequences.
The RefTrans sequences were functionally characterized by pairwise comparison using the BLASTX algorithm against the Swiss-Prot (UniProtKB/Swiss-Prot Release 2015_10) and TrEMBL (UniProtKB/TrEMBL Release 2015_10) protein databases (Boeckmann et al., 2003).
Information on the top 25 matches with an expect (E) value of ≤ 1E-06 was recorded and stored in the database. The transcriptome and annotation (GO terms, match descriptions, InterPro domains) are available for searching and downloading.
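The two selection rules described above (one representative per Corset cluster, and at most 25 protein matches per sequence at E ≤ 1E-06) can be illustrated with a minimal Python sketch. This is not the pipeline's actual code: the function names, the in-memory data layout, the tie-breaking rule for equal-length isoforms, and the ordering of hits by E value and then bit score are all assumptions made for illustration.

```python
from collections import defaultdict


def longest_per_cluster(cluster_of, seq_len):
    """Pick the longest transcript in each cluster.

    cluster_of: {transcript_id: cluster_id} (hypothetical layout)
    seq_len:    {transcript_id: length in bases}
    Ties go to the lexicographically first transcript ID (an assumption).
    """
    best = {}
    for tid in sorted(cluster_of):
        cid = cluster_of[tid]
        if cid not in best or seq_len[tid] > seq_len[best[cid]]:
            best[cid] = tid
    return best


def top_hits(hits, max_evalue=1e-6, max_hits=25):
    """Keep up to max_hits matches per query with E value <= max_evalue.

    hits: iterable of (query_id, subject_id, evalue, bitscore) tuples.
    Matches are ranked by ascending E value, then descending bit score.
    """
    by_query = defaultdict(list)
    for query, subject, evalue, bitscore in hits:
        if evalue <= max_evalue:
            by_query[query].append((query, subject, evalue, bitscore))
    return {
        query: sorted(rows, key=lambda r: (r[2], -r[3]))[:max_hits]
        for query, rows in by_query.items()
    }


if __name__ == "__main__":
    # Toy data; the IDs are made up for the example.
    clusters = {"t1": "c1", "t2": "c1", "t3": "c2"}
    lengths = {"t1": 300, "t2": 950, "t3": 410}
    print(longest_per_cluster(clusters, lengths))

    hits = [
        ("t2", "P1", 1e-30, 200.0),
        ("t2", "P2", 1e-3, 40.0),   # fails the E-value cutoff
        ("t2", "P3", 1e-10, 90.0),
    ]
    print(top_hits(hits))
```

In practice the cluster assignments and sequence lengths would be read from the clustering output and the assembly FASTA, and the hits from tabular BLASTX output; the sketch keeps everything in memory so that the selection logic is easy to follow.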
Additional information about this analysis: