Comprehensive transcriptome analysis of the highly complex Pisum sativum genome using next generation sequencing

Publication Overview
Title: Comprehensive transcriptome analysis of the highly complex Pisum sativum genome using next generation sequencing
Authors: Franssen SU, Shrestha RP, Bräutigam A, Bornberg-Bauer E, Weber AP
Type: Journal Article
Journal Name: BMC genomics
Volume: 12
Year: 2011
Page(s): 227
Citation: Franssen SU, Shrestha RP, Bräutigam A, Bornberg-Bauer E, Weber AP. Comprehensive transcriptome analysis of the highly complex Pisum sativum genome using next generation sequencing. BMC genomics. 2011; 12:227.

Abstract

BACKGROUND
The garden pea, Pisum sativum, is among the best-investigated legume plants and of significant agro-commercial relevance. Pisum sativum has a large and complex genome and accordingly few comprehensive genomic resources exist.

RESULTS
We analyzed the pea transcriptome at the highest accuracy attainable with current technology. We used next generation sequencing with the Roche/454 platform and evaluated and compared a variety of approaches, including diverse tissue libraries, normalization, alternative sequencing technologies, saturation estimation and diverse assembly strategies. We generated libraries from flowers, leaves, cotyledons, epi- and hypocotyl, and etiolated and light-treated etiolated seedlings, comprising a total of 450 megabases. Libraries were assembled into 324,428 unigenes in a first-pass assembly. A second-pass assembly reduced the number to 81,449 unigenes but produced a significant number of chimeras. Analyses of the assemblies identified the assembly step as a major opportunity for improvement. By recording frequencies of Arabidopsis orthologs hit by randomly drawn reads and fitting parameters of the saturation curve, we concluded that sequencing was exhaustive. For leaf libraries we found that normalization allows partial recovery of expression strength in addition to the desired effect of increased coverage. Based on theoretical and biological considerations we concluded that the sequence reads in the database tagged the vast majority of transcripts in the aerial tissues. A pathway representation analysis showed the merits of sampling multiple aerial tissues to increase the number of tagged genes. All results have been made available as a fully annotated database in FASTA format.

CONCLUSIONS
We conclude that the approach taken resulted in a high-quality dataset which serves well as a first comprehensive reference set for the model legume pea. We suggest that future deep sequencing transcriptome projects of species lacking a genomics backbone will need to concentrate mainly on resolving the issues of redundancy and paralogy during transcriptome assembly.
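
The saturation estimate mentioned in the abstract (counting Arabidopsis orthologs hit by randomly drawn reads, then fitting a saturation curve) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the input layout (one ortholog identifier per read) and the hyperbolic model y = a*x / (b + x) are all assumptions chosen for simplicity, and the abstract does not state which curve was actually fitted.

```python
import random


def rarefaction_curve(read_hits, step, seed=0):
    """Return (reads_drawn, distinct_orthologs_seen) points.

    read_hits: hypothetical input, one ortholog identifier per read.
    Reads are drawn in random order without replacement.
    """
    rng = random.Random(seed)
    shuffled = list(read_hits)
    rng.shuffle(shuffled)
    seen = set()
    points = []
    for i, ortholog in enumerate(shuffled, start=1):
        seen.add(ortholog)
        # Record a point every `step` reads and at the final read.
        if i % step == 0 or i == len(shuffled):
            points.append((i, len(seen)))
    return points


def fit_saturation(points):
    """Fit y = a*x / (b + x) via least squares on the linearized form
    1/y = (b/a) * (1/x) + 1/a.

    Assumed model, not taken from the paper. Returns (a, b), where a is
    the estimated plateau (orthologs reachable at infinite depth).
    """
    xs = [1.0 / x for x, _ in points]
    ys = [1.0 / y for _, y in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    a = 1.0 / intercept
    b = slope * a
    return a, b
```

Under these assumptions, comparing the number of orthologs observed at full depth with the fitted plateau `a` gives a rough indication of how close a library is to saturation. The linearized fit is sensitive to noise at low read counts, so a real analysis would more likely use a nonlinear fit.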

Features
This publication contains information about 84,267 features:
Feature Name    Uniquename    Type
JI981123        JI981123.1    region
JI981122        JI981122.1    region
JI981121        JI981121.1    region
JI981120        JI981120.1    region
JI981119        JI981119.1    region
JI981118        JI981118.1    region
JI981117        JI981117.1    region
JI981116        JI981116.1    region
JI981115        JI981115.1    region
JI981114        JI981114.1    region
JI981113        JI981113.1    region
JI981112        JI981112.1    region
JI981111        JI981111.1    region
JI981110        JI981110.1    region
JI981109        JI981109.1    region
JI981108        JI981108.1    region
JI981107        JI981107.1    region
JI981106        JI981106.1    region
JI981105        JI981105.1    region
JI981104        JI981104.1    region
JI981103        JI981103.1    region
JI981102        JI981102.1    region
JI981101        JI981101.1    region
JI981100        JI981100.1    region
JI981099        JI981099.1    region

Properties
Additional details for this publication include:
Property Name           Value
Journal Country         England
Publication Model       Electronic
ISSN                    1471-2164
eISSN                   1471-2164
Publication Date        2011
Journal Abbreviation    BMC Genomics
DOI                     10.1186/1471-2164-12-227
Elocation               10.1186/1471-2164-12-227
Language                English
Language Abbr           eng
Publication Type        Journal Article
Publication Type        Research Support, Non-U.S. Gov't

Cross References
This publication is also available in the following databases:
Database        Accession
PMID: PubMed    PMID:21569327