PGS Meeting January 18, 2012 Minutes

Minutes – Pea genome sequencing discussion
Plant and Animal Genome conference, San Diego, 18 January 2012
 
Participants: Clare Coyne, Kevin McPhee, Crystal Chan, Jaroslav Dolezel, Tom Warkentin (recorder)
 
Note: A preliminary discussion on this topic was held on 14 January 2012 at the same conference with the following participants: Keithanne Mockaitis, Dorrie Main, Clare Coyne, Rebecca McGee, Michael Mazurik, Laura Marek, Kirstin Bett, Kevin McPhee, Crystal Chan, Andrew Sharpe, and Tom Warkentin. 
During the 18 January discussion, we asked Jaroslav Dolezel to outline his suggestions for a pea genome sequencing strategy. The following is a summary of his suggestions. The approach would proceed in steps, such that each step ends with a useful set of data for the community.

Step 1.1 Obtain a survey sequence for pea genome
Conduct survey (‘shotgun’) sequencing based on the method described by Mayer et al. 2011 (The Plant Cell 23: 1249-1263). Next-generation technologies, likely from Illumina, would be used for sequencing. Reads would be assembled and repetitive DNA would be masked to leave single- and low-copy sequences for further analysis. This will identify most of the genes in pea. Approximate cost: $20,000 US.
Comment Jaroslav Dolezel: This can be done at the whole-genome level. However, the use of flow-sorted chromosomes would allow genes to be assigned to chromosomes. N.B.: Not all chromosomes of pea can be flow sorted individually; some of them can only be sorted in groups. Note that the cost estimate includes only DNA preparation and sequencing (no staff, no overheads, no data processing).
Comment Noel Ellis: I think this step needs some heavy bioinformatics. The assemblies can be related to extant transcriptome sequence, and I think at this stage we can use the information for high-density mapping in a large RIL population.
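As a rough illustration of the masking idea only (this is not the pipeline of Mayer et al.), repetitive sequence can be flagged by k-mer abundance: k-mers seen far more often than expected for single-copy DNA are replaced with N. The function names, the k-mer length and the count threshold below are assumptions chosen for a toy example, and reverse-complement strands are ignored for brevity.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer (forward strand only) across a collection of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def mask_repeats(sequence, counts, k, max_count):
    """Replace with 'N' every base covered by a k-mer seen more than max_count times."""
    masked = list(sequence)
    for i in range(len(sequence) - k + 1):
        if counts[sequence[i:i + k]] > max_count:
            for j in range(i, i + k):
                masked[j] = "N"
    return "".join(masked)


if __name__ == "__main__":
    # Toy reads: the motif ACGT is over-represented and stands in for a repeat.
    reads = ["ACGTACGTACGT", "TTGCA"]
    kmer_counts = count_kmers(reads, k=4)
    print(mask_repeats("GGACGTGG", kmer_counts, k=4, max_count=2))
```

In practice the threshold would be set relative to the sequencing depth, and a curated repeat library could be used instead of, or alongside, k-mer counts.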

Step 1.2 Synteny analysis/Genome zipper
Search for synteny between the low-copy sequences from step 1.1 and those of sequenced legume genomes (Medicago truncatula, Lotus japonicus, Glycine max, and perhaps Cicer arietinum). As these legumes have already been sequenced and annotated, step 1.2 would provide a good estimate of pea gene order and function. This process should identify 85-95% of the genes in pea, perhaps 3000-4000 per chromosome.
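The core of the zipper idea can be sketched as follows: pea genes assigned to a chromosome are put in a tentative order according to the positions of their orthologs in one sequenced reference legume. This is a minimal sketch under stated assumptions. All gene identifiers and coordinates are hypothetical, the ortholog table is taken as given, and a real genome zipper would combine several reference genomes with genetic marker data.

```python
def zipper_order(pea_genes, orthologs, reference_positions):
    """Order pea genes by the position of their orthologs in a reference genome.

    pea_genes: iterable of pea gene ids (hypothetical names)
    orthologs: dict mapping pea gene id -> reference gene id
    reference_positions: dict mapping reference gene id -> (chromosome, start)
    Returns (ordered, unplaced): genes with a placed ortholog, in reference
    order, and genes for which no ortholog position is known.
    """
    placed, unplaced = [], []
    for gene in pea_genes:
        ref = orthologs.get(gene)
        if ref in reference_positions:
            placed.append((reference_positions[ref], gene))
        else:
            unplaced.append(gene)
    placed.sort()  # sorts by (chromosome, start), then by gene id
    return [gene for _, gene in placed], unplaced


if __name__ == "__main__":
    pea_genes = ["Ps1", "Ps2", "Ps3", "Ps4"]
    orthologs = {"Ps1": "Mt_b", "Ps2": "Mt_a", "Ps3": "Mt_c"}
    reference_positions = {
        "Mt_a": ("chr1", 100),
        "Mt_b": ("chr1", 500),
        "Mt_c": ("chr2", 50),
    }
    ordered, unplaced = zipper_order(pea_genes, orthologs, reference_positions)
    print(ordered, unplaced)
```

Genes without a placed ortholog are returned separately rather than dropped, so they can be positioned later by mapping.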
This step could be conducted by ........................ at an approximate cost of ...........................
Comment Crystal Chan: A good bioinformatician. Either Klaus Mayer in Munich or Andy Flavell would be able to provide a recommendation on that.
Comment Noel Ellis: At Aberystwyth there are some people with this type of expertise: Martin Swain (http://www.aber.ac.uk/en/ibers/staff/staff-list/mts11/), who was at EBI and involved in the tsetse fly genome project, and Denis Larkin (http://lewinlab.igb.uiuc.edu/Personnel/DenisLarkin.html; his web site here is not up yet), who is interested in mammalian genome evolution. I could try to talk them into helping.

Step 2 Develop SNP markers
Use exome capture from 20-50 pea genotypes, including the genotype used in step 1.1, to identify SNPs.
This step could be conducted by ........................ at an approximate cost of $50,000 US.
Comment Noel Ellis: Good plan, needs careful choice of genotypes.

Step 2.1 Re-sequencing
Resequence another pea genome against the genome from step 1.1 to identify SNPs. This step could be conducted by ........................ at an approximate cost of ...........................
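Once re-sequencing reads have been aligned to the step 1.1 assembly, SNP identification reduces to comparing the bases observed at each position with the reference base. The sketch below assumes the alignment has already been summarised as per-position base counts. The function name, the data layout and both thresholds are illustrative assumptions, not an established pipeline. The high allele-fraction threshold reflects the expectation that inbred pea lines are largely homozygous.

```python
def call_snps(reference, pileup, min_depth=8, min_alt_fraction=0.8):
    """Call simple SNPs from per-position base counts.

    reference: reference sequence as a string
    pileup: dict mapping 0-based position -> dict of base -> read count
    Returns a list of (position, reference_base, alternative_base).
    """
    snps = []
    for pos in sorted(pileup):
        counts = pileup[pos]
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to trust a call
        ref_base = reference[pos]
        alt_counts = {b: n for b, n in counts.items() if b != ref_base}
        if not alt_counts:
            continue  # all reads agree with the reference
        alt_base = max(alt_counts, key=alt_counts.get)
        if alt_counts[alt_base] / depth >= min_alt_fraction:
            snps.append((pos, ref_base, alt_base))
    return snps


if __name__ == "__main__":
    reference = "ACGTAC"
    pileup = {
        1: {"C": 10},          # matches the reference
        2: {"T": 9, "G": 1},   # well-supported difference
        4: {"G": 3},           # too shallow
        5: {"C": 5, "T": 5},   # mixed signal, below the threshold
    }
    print(call_snps(reference, pileup))
```

With a genome as large as pea, the alignment and pile-up steps that precede this comparison are where most of the computing effort would go.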
Comment Jaroslav Dolezel: Exome sequencing and re-sequencing will, in this context, provide similar data (SNPs). Re-sequencing is straightforward, and the only bottleneck is the bioinformatics, because the genome is huge.
Exome sequencing, on the other hand, will be cheap and the bioinformatics easier. However, it will require development of a chip/array for gene capture, which will add cost. Not sure if this tool is available for other legumes? Not sure how useful it would be for pea.
Comment Noel Ellis: We should sequence the two for which there are BAC libraries.

Step 3. Sequence the genome (‘Gold standard’)

Step 3.1. Physical map
Utilize a BAC library for pea DNA sequencing. Two BAC libraries exist for pea: one at USDA (Clare Coyne) with 3X coverage, based on a wild accession; the second at INRA (Abdel Bendahmane) with 10X coverage, based on cultivar Cameor. A new BAC library with 10X coverage could be developed if desired from a new cultivar at a cost of approximately $100,000 US.
For pea, this would include some 500,000 clones. The procedure includes capillary sequencing and assembly into contigs. Contigs would be selected to produce the minimum tiling path (perhaps 20,000 clones for pea). The minimum tiling path contigs would be sequenced and the sequence would be ordered to produce the pea genome sequence.
Comment Crystal Chan: We should note that whichever genotype we pick for this work will become the reference sequence, so we have to choose carefully. BAC-end sequencing, then fingerprinting?
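The relationship between clone number and genome coverage is simple arithmetic: coverage equals the number of clones times the mean insert size, divided by the genome size. The sketch below assumes a pea genome of about 4.3 Gb and a mean insert of 100 kb. The genome size is an assumption that is not stated in these minutes, and the function names are illustrative.

```python
import math


def library_coverage(n_clones, mean_insert_bp, genome_size_bp):
    """Genome equivalents represented by a BAC library."""
    return n_clones * mean_insert_bp / genome_size_bp


def clones_for_coverage(target_coverage, mean_insert_bp, genome_size_bp):
    """Number of clones needed to reach a target coverage."""
    return math.ceil(target_coverage * genome_size_bp / mean_insert_bp)


if __name__ == "__main__":
    GENOME_SIZE_BP = 4_300_000_000  # assumed pea genome size (about 4.3 Gb)
    MEAN_INSERT_BP = 100_000        # assumed mean insert size (100 kb)

    coverage = library_coverage(500_000, MEAN_INSERT_BP, GENOME_SIZE_BP)
    print(f"500,000 clones give about {coverage:.1f}X coverage")

    needed = clones_for_coverage(10, MEAN_INSERT_BP, GENOME_SIZE_BP)
    print(f"10X coverage needs about {needed:,} clones")
```

Under these assumptions, a 10X library corresponds to roughly 430,000 clones, which is the same order of magnitude as the 500,000 mentioned above.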
Comment Jaroslav Dolezel: An economical approach has been to fingerprint the BAC library, establish the MTP (minimum tiling path) and sequence the ends of BAC clones from the MTP.
However, you may also consider Whole Genome Profiling (Keygene), which avoids fingerprinting and provides sequence tags (these help to assemble the MTP and later the DNA sequence).
N.B.: Keygene works in partnership with AmpliconExpress, who may construct a BAC library for a reasonable cost.
N.B.: Whole Genome Profiling is more expensive than fingerprinting, but you would be free of any experimental hassles.
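Selecting a minimum tiling path can be illustrated with the classic greedy interval-cover approach: moving left to right along a contig, always pick the clone that starts within the region already covered and extends furthest. This is a simplified sketch. It assumes each clone already has start and end coordinates on the contig, whereas fingerprinting and Whole Genome Profiling infer overlaps between clones rather than coordinates. The clone names and positions are hypothetical.

```python
def minimum_tiling_path(clones):
    """Greedy selection of a minimal set of clones covering one contig.

    clones: list of (name, start, end) tuples on a single contig
    Returns the names of the selected clones, left to right.
    Raises ValueError if the clones leave a gap.
    """
    if not clones:
        return []
    ordered = sorted(clones, key=lambda c: (c[1], -c[2]))
    target_end = max(c[2] for c in ordered)
    covered_to = ordered[0][1]  # leftmost coordinate of the contig
    path = []
    i = 0
    while covered_to < target_end:
        best = None
        # Among clones starting at or before the covered position,
        # keep the one that reaches furthest to the right.
        while i < len(ordered) and ordered[i][1] <= covered_to:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered_to:
            raise ValueError(f"gap in clone coverage at position {covered_to}")
        path.append(best[0])
        covered_to = best[2]
    return path


if __name__ == "__main__":
    clones = [
        ("A", 0, 100),
        ("B", 20, 120),
        ("C", 90, 200),
        ("D", 110, 210),
        ("E", 190, 300),
    ]
    print(minimum_tiling_path(clones))
```

A production version would also require a minimum overlap between neighbouring clones, so that the sequences can be joined with confidence.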
 
 
Comment Noel Ellis: Clare, for info:
From: bendahmane [mailto:bendahm@evry.inra.fr]
Sent: 18 January 2012 18:09
To: Crystal Chan
Cc: Noel Ellis [noe2]; coynec@wsu.edu; tom.warkentin@usask.ca; Kevin.McPhee@ndsu.edu; dolezel@ueb.cas.cz
Subject: Re: Pea genome SEQUENCING - questions about BAC libraries
 
Dear colleague,
the pea BAC library consists of more than 400 000 clones with an average insert of at least 100 kb. It was made thanks to the GLIP and Noel. So I'm happy to share it with the scientific community.
my best wishes,
Abdel,
 
This step could be conducted by ........................ at an approximate cost of .....................

Step 3.2. Anchoring the physical map
Establish genetic mapping population(s). The parents of the population(s) should be included in Step 2. The population(s) should be large (several thousand individuals) to enable high-resolution mapping. It may be wise to use cultivars with agronomically important traits as parents.
Comment Noel Ellis: I think this is an earlier task. I think that the analysis of agronomic traits is important, but it is a different issue from the sequence, gene content, and gene order. I think we want to maximize the level of polymorphism in the cross, but agreed one parent should be a cultivar.
 
Create a high-resolution SNP map. A SNP chip/array may be needed; it should be developed as part of Step 2.
Comment Jaroslav Dolezel: May cost! But the chip may be a useful tool for the community. Illumina offers some deals if the chip is ordered in bulk (e.g. for a consortium). You may also consider genotyping by sequencing. However, this is expensive and I am not sure if the approach is mature enough.
Comment Noel Ellis: I’m not convinced about that: it will be costly to use, and will it have enough users to make it cost-effective?

Step 3.3. Sequencing
The strategy for sequencing BAC clones from the MTP (individual BACs vs. BAC pools, sequencing platforms) will be decided according to the technology available at the time the MTP is ready. The sequencing will provide additional information to verify the order of BACs in the MTP.
Comment Noel Ellis:  I think the details of the strategy will depend a lot on the technology used, so I think we need to say what we want rather than how to get it.