Harvard University Herbaria

Phylogenetic analysis

In connection with our monographic studies, data matrices will be assembled for numerical phylogenetic analyses. In the case of morphological characters, great care will be taken in making initial homology assessments based on positional, structural, and developmental evidence (Patterson, 1982; Donoghue, 1992). It should be possible to take full advantage of the complexity of morphological characters (Donoghue and Sanderson, 1994), as the primary data will be gathered by members of the research group, rather than culled from previously published studies. Homology assessments are, of course, equally important in molecular datasets, and great attention will be devoted to alignment procedures and the analysis of alternative alignments. All phylogenetic data will be entered and stored using the flexible data editor in MacClade (Maddison and Maddison, 1992), with annotations and digital images associated with individual taxa, characters, and entries in the matrix. These matrices, and associated information, will be made available over the internet (see below).

Phylogenetic analyses will be carried out using parsimony (see Farris, 1983; Huelsenbeck and Hillis, 1993), as implemented in programs such as PAUP (Swofford, 1993), and studies of character evolution will make use of MacClade (Maddison and Maddison, 1992) and related tools. Maximum likelihood methods will also be utilized in the case of molecular datasets (e.g., Hibbett and Donoghue, in review), and we will explore the influence of character and character-state weighting schemes (e.g., Albert and Mishler, Goloboff, 1994; Wheeler, 1990), including a priori approaches based, for example, on transition/transversion ratios, and iterative approaches such as successive approximations weighting (Farris, 1969; Carpenter, 1988). The main intention of such weighting studies is to explore the sensitivity of the results to a variety of assumptions, in the hope of identifying especially robust conclusions, which hold over a wide range of circumstances.
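To make the idea of tree length under different character weights concrete, the following is a minimal sketch that counts state changes on a fixed tree using Fitch's procedure for unordered characters. It is an illustration only, not the method implemented in PAUP or MacClade; the function names, the nested-tuple tree representation, and the toy matrix are all assumptions introduced here.

```python
def fitch_length(tree, states):
    """Minimum number of changes for one unordered character on a rooted
    binary tree. `tree` is a nested tuple of taxon names (hypothetical
    representation); `states` maps each taxon name to its state."""
    def visit(node):
        if isinstance(node, str):          # terminal taxon
            return {states[node]}, 0
        left, right = node
        left_set, left_cost = visit(left)
        right_set, right_cost = visit(right)
        shared = left_set & right_set
        if shared:                         # no change needed at this node
            return shared, left_cost + right_cost
        return left_set | right_set, left_cost + right_cost + 1
    return visit(tree)[1]


def tree_length(tree, matrix, weights=None):
    """Weighted sum of per-character lengths. `matrix` maps taxon names to
    equal-length strings of states; `weights` defaults to equal weighting."""
    n_chars = len(next(iter(matrix.values())))
    if weights is None:
        weights = [1] * n_chars
    total = 0
    for i in range(n_chars):
        column = {taxon: row[i] for taxon, row in matrix.items()}
        total += weights[i] * fitch_length(tree, column)
    return total


# Toy data, invented for illustration only.
toy_matrix = {"A": "AAG", "B": "AAG", "C": "CAT", "D": "CGT"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))

print(tree_length(tree_1, toy_matrix))             # 3
print(tree_length(tree_2, toy_matrix))             # 5
print(tree_length(tree_1, toy_matrix, [2, 1, 1]))  # 4
```

Rerunning the comparison under several weight vectors is one simple way to see whether the preference for one tree over another is sensitive to the weighting assumptions.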

Analyses involving a relatively small number of taxa will be conducted using exhaustive or branch-and-bound searches, whereas larger matrices will be analyzed using heuristic search strategies (Swofford and Olsen, 1990; Maddison et al., 1992; Olmstead et al., 1993). In general, such strategies involve multiple searches from different starting topologies, which increases the chance of locating alternative tree islands (Maddison, 1991). The robustness of phylogenetic results will be explored in several ways, including bootstrap analysis (Felsenstein, 1985; Hillis and Bull, 1993; Sanderson, 1989), "decay" or "Bremer support" analysis (Bremer, 1988; Donoghue et al., 1992), and various permutation tests (such as the T-PTP test described by Faith, 1991; Faith and Cranston, 1991; but see Kallersjo et al., 1992). Although none of these measures is perhaps ideal, it is hoped that through a combination of approaches we will be able to provide an indication of the level of support for particular hypothesized relationships. Such testing is highly desirable when phylogenetic results are used as a basis for formal taxonomic changes, or in studies of character evolution or biogeography.
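The resampling step behind bootstrap analysis can be sketched briefly: characters (columns of the matrix) are drawn with replacement until a pseudoreplicate of the original size is obtained, and each pseudoreplicate is then analyzed like the original data. The sketch below covers only that resampling step, under the assumption that the matrix is stored as a mapping from taxon names to equal-length strings; the function name and toy data are hypothetical, and the tree searches themselves would be run in a program such as PAUP.

```python
import random


def bootstrap_replicate(matrix, rng):
    """Return one pseudoreplicate matrix built by sampling character
    columns with replacement. `matrix` maps taxon names to equal-length
    strings; `rng` is a random.Random instance, passed in explicitly so
    that runs can be repeated."""
    n_chars = len(next(iter(matrix.values())))
    # Draw column indices with replacement; the same columns are used for
    # every taxon so that each resampled character stays intact.
    columns = [rng.randrange(n_chars) for _ in range(n_chars)]
    return {
        taxon: "".join(row[i] for i in columns)
        for taxon, row in matrix.items()
    }


# Toy data, invented for illustration only.
toy_matrix = {"A": "ACGT", "B": "ACGA", "C": "TCGA", "D": "TTGA"}
rng = random.Random(1)
replicates = [bootstrap_replicate(toy_matrix, rng) for _ in range(100)]
```

The proportion of pseudoreplicates in which a given clade is recovered is then reported as its bootstrap value.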

Comparing the results of separate analyses of data from different sources may be helpful in identifying areas of conflict (Hillis, 1987). Strongly supported conflicts may indicate non-independence of sets of characters within one or more of the datasets (Shaffer et al., 1991) or different underlying histories for different genes (Doyle, 1992), resulting perhaps from hybridization (Rieseberg and Brunsfeld, 1992; Rieseberg and Soltis, 1991), lineage sorting (Maddison, 1995; Pamilo and Nei, 1988), or lateral transfer (de Queiroz, 1993). In the face of such conflict, or significant heterogeneity in rates of evolution (tested, e.g., using the technique of Rodrigo et al., 1994), combined analyses should be interpreted cautiously, and characters should perhaps be weighted to reflect their likelihood of change (Barrett et al., 1991; Bull et al., 1993; Chippendale and Wiens, 1994; Huelsenbeck et al., 1994) or non-independence (Doyle, 1992). If strong conflicts are identified, involving the position of one or a few taxa, experiments can be conducted in which suspect taxa are removed and the data are then combined for the remaining taxa (de Queiroz, Donoghue, and Kim, in review).
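The taxon-removal experiment described above amounts to a simple data-handling step, which might be sketched as follows. This is an illustration under stated assumptions: each dataset is a mapping from taxon names to equal-length character strings, only taxa present in every dataset are retained (a simplification made here, not a rule from the text), and the function name and toy data are hypothetical.

```python
def combine_matrices(matrices, exclude=()):
    """Concatenate the characters of several datasets for the taxa they
    share, after dropping any taxa named in `exclude`."""
    shared = set(matrices[0])
    for matrix in matrices[1:]:
        shared &= set(matrix)      # keep only taxa scored in every dataset
    shared -= set(exclude)         # remove the suspect taxa
    return {
        taxon: "".join(matrix[taxon] for matrix in matrices)
        for taxon in sorted(shared)
    }


# Toy data, invented for illustration only.
molecular = {"A": "AC", "B": "AG", "C": "TT"}
morphology = {"A": "01", "B": "11", "C": "00", "D": "10"}

print(combine_matrices([molecular, morphology]))
# {'A': 'AC01', 'B': 'AG11', 'C': 'TT00'}
print(combine_matrices([molecular, morphology], exclude={"C"}))
# {'A': 'AC01', 'B': 'AG11'}
```

Comparing the analysis of the full combined matrix with that of the reduced one shows how much the suspect taxon influences the result.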

Combined analyses allow characters from different datasets to interact with one another in estimating phylogenetic relationships, which may reveal complementary signals present in different datasets that may not have been seen in separate analyses (Barrett et al., 1991; Chippendale and Wiens, 1994). This kind of complementarity has been observed in a number of combined analyses; e.g., cpDNA datasets in Solanaceae (Olmstead and Sweere, 1994), and rDNA and morphology in angiosperms (Doyle et al., 1994). When datasets appear to be more or less homogeneous or to show only weak conflicts, the combined analysis should provide the best estimate of the phylogeny (Miyamoto, 1985; Kluge, 1989; Barrett et al., 1991; Larson, 1994), and it is this estimate that is best used in studies of character evolution, etc. (Donoghue, 1989; Maddison, 1990; Brooks and McLennan, 1991; Harvey and Pagel, 1991).