Friday, September 02, 2005

How to choose the measure of dissimilarity between the expression patterns of two genes?

As a measure of dissimilarity between the expression patterns of two genes, no method outperforms Euclidean distance for ratio-based measurements or Pearson distance for non-ratio-based measurements, when each is evaluated at the optimal choice of cluster number. The distance measures compared are Euclidean, 3-norm, Manhattan (city-block), Hausdorff, and Pearson correlation.

See PMID: 12368250
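To make the compared measures concrete, here is a minimal Python sketch of four of them. It assumes each gene's expression pattern is a plain list of numbers of equal length, and it uses the common convention "Pearson distance = 1 - r", which is an assumption, not something the post states. Hausdorff distance is left out because the post does not say how it is adapted to a single expression profile. The function names and the profiles `g1`, `g2` are hypothetical.

```python
import math
from statistics import mean


def minkowski(x, y, p):
    """Minkowski p-norm distance between two equal-length profiles.

    p=1 gives Manhattan (city-block), p=2 Euclidean, p=3 the 3-norm.
    """
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def euclidean(x, y):
    return minkowski(x, y, 2)


def three_norm(x, y):
    return minkowski(x, y, 3)


def manhattan(x, y):
    return minkowski(x, y, 1)


def pearson_distance(x, y):
    """1 - Pearson correlation (assumed convention; ranges from 0 to 2)."""
    mx, my = mean(x), mean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if den == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return 1.0 - num / den


# Hypothetical profiles: g2 is g1 scaled by a factor of 2.
g1 = [1.0, 2.0, 3.0, 4.0]
g2 = [2.0, 4.0, 6.0, 8.0]
print(euclidean(g1, g2), three_norm(g1, g2), manhattan(g1, g2), pearson_distance(g1, g2))
```

Because `g2` differs from `g1` only by a scale factor, their Pearson distance is 0 while the Euclidean, 3-norm and Manhattan distances are all nonzero, which shows how differently these measures can judge the same pair of genes.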

This paper also concluded that single- and average-linkage hierarchical clustering tend to produce gene clusters that perform worse than random.
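For readers unfamiliar with the two linkage criteria, the following is a naive Python sketch of agglomerative clustering that supports both. It assumes a precomputed symmetric distance matrix and stops at a chosen number of clusters `k`; the function name and the toy data are hypothetical, and the code makes no attempt at efficiency.

```python
from itertools import combinations


def agglomerative(dist, k, linkage="average"):
    """Naive agglomerative clustering down to k clusters.

    dist: symmetric matrix (list of lists) of pairwise gene distances.
    linkage: "single" uses the closest pair of members between two clusters;
             "average" uses the mean distance over all member pairs.
    Returns the clusters as sorted lists of gene indices.
    """
    clusters = [[i] for i in range(len(dist))]

    def cluster_distance(a, b):
        pairs = [dist[i][j] for i in a for j in b]
        if linkage == "single":
            return min(pairs)
        if linkage == "average":
            return sum(pairs) / len(pairs)
        raise ValueError(f"unknown linkage: {linkage}")

    while len(clusters) > k:
        # Pick the closest pair of clusters under the chosen linkage.
        ia, ib = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]]),
        )
        merged = sorted(clusters[ia] + clusters[ib])
        # combinations yields ia < ib, so delete ib first to keep ia valid.
        del clusters[ib]
        del clusters[ia]
        clusters.append(merged)
    return sorted(clusters)


# Hypothetical 1-D "expression levels" for five genes.
levels = [0.0, 2.0, 4.0, 6.0, 8.5]
dist = [[abs(a - b) for b in levels] for a in levels]
print(agglomerative(dist, 2, "single"))
print(agglomerative(dist, 2, "average"))
```

On these toy data, single linkage chains the evenly spaced genes 0 to 3 into one cluster and leaves gene 4 on its own, whereas average linkage splits them into {0, 1} and {2, 3, 4}. This only illustrates how the two criteria behave; it does not reproduce the paper's evaluation.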

Neighborhood Analysis

Neighborhood analysis is very simple and pretty powerful.
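The neighborhood index of a query gene is defined as the number of genes from a reference set of interest (for example, genes sharing a known annotation) that lie within the query gene’s R nearest expression neighbors. For a given query gene G, all genes represented in the expression set are first ordered according to the Euclidean distance (or another appropriate distance) between their normalized expression pattern and that of G across all experimental conditions; the first R genes in this ordering are G’s R nearest neighbors.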

See PMID: 12529507
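The procedure above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the cited paper's code: profiles are assumed to be already normalized, the query gene is excluded from its own neighbor list, ties keep dictionary order, and the gene names and `reference` set are hypothetical.

```python
import math


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def neighborhood_index(query, profiles, reference_set, R, distance=euclidean):
    """Count reference-set genes among the query's R nearest expression neighbors.

    profiles: dict mapping gene name -> normalized expression vector
              (one value per experimental condition).
    reference_set: hypothetical set of gene names of interest.
    The query gene itself is excluded from its own neighbor list (assumption).
    """
    q = profiles[query]
    # Order every other gene by the distance of its profile to the query's.
    ranked = sorted(
        (g for g in profiles if g != query),
        key=lambda g: distance(q, profiles[g]),
    )
    nearest = ranked[:R]
    return sum(1 for g in nearest if g in reference_set)


profiles = {
    "G": [0.0, 0.0],
    "A": [0.1, 0.0],
    "B": [0.0, 0.2],
    "C": [1.0, 1.0],
    "D": [0.3, 0.0],
}
reference = {"A", "C"}
for R in (2, 3, 4):
    print(R, neighborhood_index("G", profiles, reference, R))
```

Passing a different function as `distance` (for example a Pearson-based one) swaps in "another appropriate distance" without changing the rest of the procedure.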