Yan Wan1, Shira L. Broschat1,2, and Douglas R. Call2
1School of Electrical Engineering and Computer Science, 2Department of Veterinary Microbiology and Pathology, Washington State University, Pullman, WA 99164
Comparative genomic hybridizations have been used to examine genetic relationships between bacteria. The microarrays used in these experiments may contain open reading frames from one or more reference strains, or they may be composed of random DNA fragments from a large number of strains (mixed-genome microarrays; MGMs). Herein, both experimental and virtual arrays are analyzed to assess the validity of genetic inferences from these experiments, with a focus on MGMs. Empirical data are analyzed from an Enterococcus MGM, while a virtual MGM is constructed in silico using sequenced genomes (Streptococcus). On average, a small MGM correctly derives phylogenetic relationships among seven species of Enterococcus with 100% accuracy (n=100 probes) or 95% accuracy (n=46 probes); more probes are required for intraspecific differentiation. Compared with multilocus sequence methods and whole-genome microarrays, MGMs provide additional discrimination between closely related strains and offer the possibility of identifying unique strain or lineage markers. Representational bias can have mixed effects. Microarrays composed of probes from a single genome can be used to derive phylogenetic relationships, although branch length can be exaggerated for the reference strain. We describe a case in which disproportionate representation of the strains used to construct an MGM results in inaccurate phylogenetic inferences, and we illustrate an algorithm capable of correcting for this type of bias. This bias-correction algorithm automatically provides bootstrap confidence values and can provide multiple bias-corrected trees with high confidence values.