Microarray data analysis

Available pathway databases

KEGG: http://www.genome.jp/kegg/pathway.html
REACTOME: http://www.reactome.org/
INOH: http://www.inoh.org/
PANTHER: http://www.pantherdb.org/
MSigDB: http://www.broad.mit.edu/gsea/msigdb/msigdb_index.html

cross chromosomes transcription activation

In the phenomenon of transvection, chromosome pairing allows an active enhancer on one chromosome to activate transcription from an allelic promoter on the other chromosome.

1. PMID: 9679020

2. PMID: 8425891

Important gene expression data sources

GEO
ArrayExpress
GNF (for human and mouse; related papers: PMID 15075390, 11904358)
11 different human adult and fetal tissues [PMID 11015593]
19 normal and diseased human tissues by SAGE [PMID 10581018]
Compendium of human expression profiles (19 tissues) [PMID 11773596]
Compendium of yeast expression profiles [PMID 10929718]
SMD (for Yeast)

How does gene expression clustering work? - Nature Biotechnology

Transcriptional response of Saccharomyces cerevisiae to DNA-damaging agents does not identify the genes that protect against these agents

PMID: 12077312
The data in this paper imply that despite the extensive changes in gene expression in response to the different DNA-damaging agents, few if any of these changes were necessary to protect the viability of the cell against these agents.

How to filter genes before

Original post: https://stat.ethz.ch/pipermail/bioconductor/2005-October/010454.html
---------------------------------------------------------------------------------
Hi,

Teresa Casals wrote:
> Hello
>
> I have been involved recently in analyzing some
> microarray experiments performed with affymetrix
> chips.
>
> This task had been previously done by another analyst
> who left me some scripts, but no explanations.
>
> The procedure she used to follow was first to
> normalize the arrays, say using rma and then, before
> doing any tests she used to apply two filters
>
> - She kept only those genes whose signal was greater
> than a threshold on all arrays (she used "log(100)" as
> this threshold)
> - Assuming for simplicity that there were only two
> groups she applied a second filter keeping only those
> genes where the base-2 logarithm of the difference
> between the mean of the two groups was greater than
> 1.5
>
> I think I understand the rationale under this
> procedure, but also I find it somewhat arbitrary.
>

Pretty much all filtering of genes is arbitrary. I don't think that
there is a way out of that, unless you know a lot about the underlying
biology. Some reduction of the genes that were assayed is necessary so
you must choose some method.

We have found that it is better to filter on variability rather than
level (although at one time I was a fan of filtering on level). Choose
some (arbitrary amount of variability) and filter out those genes which
do not show that amount of variability across samples.

You can see the second paper at
http://www.bepress.com/bioconductor/
for some more detailed discussions of the issues.

Robert

> Could someone please orient me about if this a
> usual/right way to proceed, or address to some
> references or examples which help to diminish the
> feeling of arbitrarity?
>
> Thanks for your help
>
> ========================
> Teresa Casals
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
>

--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org
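
To make the two filtering ideas in this thread concrete, here is a minimal sketch in plain Python. It contrasts a level filter (signal above a threshold on every array, as in the inherited scripts) with a variability filter (as Robert suggests). The function names, the toy data, the use of the standard deviation as the variability measure, and both cutoffs are my own assumptions for illustration; the post does not specify any of them.

```python
import statistics


def filter_by_level(expr, threshold):
    """Keep genes whose signal is above `threshold` on every array."""
    return {gene: values for gene, values in expr.items()
            if all(v > threshold for v in values)}


def filter_by_variability(expr, min_sd):
    """Keep genes whose sample standard deviation across arrays is at
    least `min_sd`. Using the SD is an assumption; the IQR or another
    spread measure would work the same way."""
    return {gene: values for gene, values in expr.items()
            if statistics.stdev(values) >= min_sd}


# Hypothetical log2 expression values: gene -> one value per array.
expr = {
    "geneA": [7.0, 7.1, 6.9, 7.0],  # high level, almost flat
    "geneB": [5.0, 8.0, 5.2, 8.1],  # clearly varies between arrays
    "geneC": [3.0, 3.1, 6.0, 6.2],  # low on some arrays, but variable
}

expressed = filter_by_level(expr, threshold=4.5)
variable = filter_by_variability(expr, min_sd=1.0)

print("level filter keeps:      ", sorted(expressed))
print("variability filter keeps:", sorted(variable))
```

In this toy example the level filter keeps the flat geneA and drops geneC, while the variability filter does the opposite. That is the trade-off discussed in the reply. Either cutoff is still arbitrary.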

How to choose the measure of dissimilarity between the expression patterns of two genes?

As a measure of dissimilarity between the expression patterns of two genes, no method outperforms Euclidean distance for ratio-based measurements, or Pearson distance for non-ratio-based measurements, at the optimal choice of cluster number. The distance measures compared are: Euclidean, 3-norm, Manhattan (city-block), Hausdorff, and Pearson correlation.

See PMID: 12368250

This paper also concluded that clusters of genes derived from single- and average-linkage hierarchical clustering tend to produce worse-than-random results.
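
For reference, here is a small sketch of most of the distance measures named above, written in plain Python. The helper names and the toy profiles are hypothetical. I take "Pearson distance" to mean 1 minus the Pearson correlation, which is a common convention but an assumption here, since the post does not give the paper's exact definition. Hausdorff distance is left out.

```python
import math


def minkowski(x, y, p):
    """Minkowski (p-norm) distance: p=1 is Manhattan (city-block),
    p=2 is Euclidean, p=3 is the 3-norm."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def pearson_distance(x, y):
    """1 - Pearson correlation (assumed definition). Undefined for a
    constant profile, where the denominator is zero."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return 1.0 - cov / (norm_x * norm_y)


# Two hypothetical expression profiles with the same shape but
# different magnitude.
gene1 = [1.0, 2.0, 3.0, 4.0]
gene2 = [2.0, 4.0, 6.0, 8.0]

print("Manhattan:", minkowski(gene1, gene2, 1))
print("Euclidean:", minkowski(gene1, gene2, 2))
print("3-norm:   ", minkowski(gene1, gene2, 3))
print("Pearson:  ", pearson_distance(gene1, gene2))
```

The two toy profiles are far apart under the norm-based distances but essentially identical under the Pearson distance, because correlation ignores scale. That difference is worth keeping in mind when choosing between them for ratio-based and non-ratio-based data.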

Neighborhood Analysis

Neighborhood analysis is very simple and pretty powerful.
The neighborhood index is defined as the number of genes that lie within the query gene’s R nearest expression neighbors. For a given query gene, G, all genes represented in the expression set are first ordered by the Euclidean distance (or another appropriate distance) between their normalized expression pattern and that of G across all experimental conditions.

See PMID: 12529507
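
The procedure can be sketched in a few lines of plain Python. The description above does not say which genes are counted among the R neighbors, so this sketch assumes a reference set of genes of interest (for example, genes already known to share some property). That reference set, the function names, and the toy data are all my assumptions, not something stated in the post.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def neighborhood_index(query, expr, reference_set, r):
    """Count how many of the query gene's `r` nearest expression
    neighbors belong to `reference_set` (an assumed set of genes of
    interest). The query gene itself is not counted as a neighbor."""
    others = [gene for gene in expr if gene != query]
    ranked = sorted(others,
                    key=lambda gene: euclidean(expr[query], expr[gene]))
    return sum(1 for gene in ranked[:r] if gene in reference_set)


# Hypothetical normalized expression profiles: gene -> values across
# experimental conditions.
expr = {
    "Q": [1.0, 2.0, 3.0],
    "a": [1.1, 2.0, 3.0],
    "b": [1.0, 2.2, 3.1],
    "c": [5.0, 5.0, 5.0],
    "d": [0.9, 2.1, 2.8],
    "e": [9.0, 1.0, 4.0],
}
reference_set = {"a", "d", "e"}

index = neighborhood_index("Q", expr, reference_set, r=3)
print("neighborhood index of Q:", index)
```

Here the three nearest neighbors of Q are a, b and d, and two of them are in the reference set, so the index is 2. Swapping `euclidean` for another distance function only changes the ranking step.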

Microarray data analysis

Tuesday, March 20, 2007

available pathway database

Sunday, December 03, 2006

cross chromosomes transcription activation

Friday, February 17, 2006

Important gene expression data source

Sunday, December 11, 2005

How does gene expression clustering work? - Nature Biotechnology

Wednesday, October 12, 2005

Transcriptional response of Saccharomyces cerevisiae to DNA-damaging agents does not identify the genes that protect against these agents

Monday, October 10, 2005

How to filter genes before

Friday, September 02, 2005

How to choose the measure of dissimilarity between the expression patterns of two genes?

Neighborhood Analysis
