Differential Analysis/Marker Selection

November 12, 2012

Gene expression analysis falls into four broad categories: differential analysis/marker selection, class prediction (supervised learning), class discovery (unsupervised learning) and pathway analysis.

Differential analysis, also known as marker selection, searches for genes that are differentially expressed between distinct phenotypes and ranks the genes by the value of the statistic used to assess differential expression. This is sometimes called differential gene expression analysis. Permutation testing can then be used to calculate a nominal p-value for the score each gene has been given.
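The ranking and permutation steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the statistic is a plain difference in class means (a t-statistic or signal-to-noise ratio could be substituted), and the gene names, expression values and function names are hypothetical.

```python
import random
from statistics import mean

def mean_difference(values, labels):
    """Score one gene: mean expression in class 0 minus mean expression in class 1."""
    class_a = [v for v, lab in zip(values, labels) if lab == 0]
    class_b = [v for v, lab in zip(values, labels) if lab == 1]
    return mean(class_a) - mean(class_b)

def permutation_p_value(values, labels, n_permutations=1000, seed=0):
    """Nominal p-value: how often shuffled labels score at least as extreme as the real ones."""
    rng = random.Random(seed)
    observed = abs(mean_difference(values, labels))
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(mean_difference(values, shuffled)) >= observed:
            hits += 1
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return (hits + 1) / (n_permutations + 1)

# Hypothetical expression values for eight samples, four per phenotype.
expression = {
    "gene_a": [9.1, 8.7, 9.4, 9.0, 2.1, 2.5, 1.9, 2.3],
    "gene_b": [5.0, 4.8, 5.3, 5.1, 5.2, 4.9, 5.0, 5.3],
}
labels = [0, 0, 0, 0, 1, 1, 1, 1]

# Rank genes by the absolute value of the statistic, strongest marker first.
ranked = sorted(expression, key=lambda g: abs(mean_difference(expression[g], labels)), reverse=True)
p_values = {g: permutation_p_value(expression[g], labels) for g in expression}
```

Here gene_a separates the two phenotypes cleanly and should rank first with a small p-value, while gene_b barely differs between classes and should not.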

Because so many genes are tested at once, many of them are likely to have small, apparently significant p-values by chance alone, even when the null hypothesis is true.

The analysis can be corrected for multiple hypothesis testing using statistical methods. These include methods that control the FDR (false discovery rate) and the FWER (family-wise error rate).
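As a rough illustration of the two kinds of correction, the sketch below applies the Bonferroni adjustment (one common way to control the FWER) and the Benjamini-Hochberg adjustment (one common way to control the FDR). The article does not say which procedures are used, so these choices, and the example p-values, are assumptions.

```python
def bonferroni(p_values):
    """FWER control: multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def benjamini_hochberg(p_values):
    """FDR control: scale each p-value by m / rank, then enforce monotonicity."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so adjusted values never
    # increase as the rank decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical nominal p-values for five genes.
nominal = [0.001, 0.008, 0.039, 0.041, 0.60]
fwer_adjusted = bonferroni(nominal)
fdr_adjusted = benjamini_hochberg(nominal)
```

The FDR-adjusted values are never larger than the Bonferroni values, which reflects the usual trade-off: FWER control is stricter, while FDR control keeps more genes at the cost of tolerating a fraction of false positives.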

Class prediction, or supervised learning, involves searching for a gene expression signature that can predict the class (phenotype) a sample belongs to. The method begins with two sets of data: a training set and a test set. The training set is used to build a class predictor with the chosen classification method. The test set is then used to evaluate that predictor.

Class prediction can be based on various classification methods. These include:

  • CART (classification and regression trees)

  • PNN (probabilistic neural network)

  • KNN (K-nearest neighbours)

  • SVM (support vector machines)

  • Weighted voting

These classification methods have been used in research carried out by scientists at the Broad Institute.
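To make the training/test workflow concrete, here is a minimal sketch of one method from the list, K-nearest neighbours. It assumes Euclidean distance and a majority vote with k = 3; the two-gene samples, the class names and the function name are hypothetical.

```python
from collections import Counter
from math import dist

def knn_predict(train_samples, train_labels, sample, k=3):
    """Predict a class by majority vote among the k closest training samples."""
    nearest = sorted(
        range(len(train_samples)),
        key=lambda i: dist(train_samples[i], sample),
    )[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical data: each sample is the expression of two marker genes.
train_samples = [(9.0, 1.0), (8.5, 1.4), (9.2, 0.8), (1.1, 7.9), (0.9, 8.3), (1.5, 8.1)]
train_labels = ["tumor", "tumor", "tumor", "normal", "normal", "normal"]

# The test set is kept apart from training and used only to check the predictor.
test_samples = [(8.8, 1.2), (1.2, 8.0)]
test_labels = ["tumor", "normal"]

predictions = [knn_predict(train_samples, train_labels, s) for s in test_samples]
accuracy = sum(p == t for p, t in zip(predictions, test_labels)) / len(test_labels)
```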

For smaller data sets, the classification method can be assessed by cross-validation, such as leave-one-out cross-validation. Cross-validation divides the data set into n folds. The analysis trains on n-1 folds and tests on the remaining fold, and this is repeated until every fold has been used for testing. Once all training and testing has been done, the combined results determine how well the classifier performs. In leave-one-out cross-validation each fold is a single sample. In every round the held-out data is kept separate, so there is no overlap between the training and test data.
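The fold-by-fold procedure might look like the sketch below. To keep it short it uses a nearest-centroid classifier as a stand-in, which is an assumption and not one of the methods listed above; the data and function names are likewise hypothetical.

```python
from math import dist
from statistics import mean

def nearest_centroid_fit(samples, labels):
    """Return the mean expression profile (centroid) of each class."""
    centroids = {}
    for label in set(labels):
        members = [s for s, lab in zip(samples, labels) if lab == label]
        centroids[label] = tuple(mean(column) for column in zip(*members))
    return centroids

def nearest_centroid_predict(centroids, sample):
    """Assign the sample to the class whose centroid is closest."""
    return min(centroids, key=lambda label: dist(centroids[label], sample))

def cross_validate(samples, labels, n_folds):
    """Train on n_folds - 1 folds, test on the held-out fold, repeat for every fold."""
    correct = 0
    for fold in range(n_folds):
        test_idx = [i for i in range(len(samples)) if i % n_folds == fold]
        train_idx = [i for i in range(len(samples)) if i % n_folds != fold]
        model = nearest_centroid_fit(
            [samples[i] for i in train_idx],
            [labels[i] for i in train_idx],
        )
        for i in test_idx:
            correct += nearest_centroid_predict(model, samples[i]) == labels[i]
    # Fraction of samples predicted correctly while they were held out.
    return correct / len(samples)

# Hypothetical two-gene expression profiles.
samples = [(9.0, 1.0), (1.1, 7.9), (8.5, 1.4), (0.9, 8.3), (9.2, 0.8), (1.5, 8.1)]
labels = ["tumor", "normal", "tumor", "normal", "tumor", "normal"]

three_fold_accuracy = cross_validate(samples, labels, n_folds=3)
# Leave-one-out is the special case where every fold holds a single sample.
loo_accuracy = cross_validate(samples, labels, n_folds=len(samples))
```

Assigning samples to folds by index modulo n_folds is just a simple deterministic choice; in practice the folds would usually be shuffled or stratified by class.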

Class discovery, or unsupervised learning, involves searching for a biologically relevant, previously unknown classification that can be recognized by a gene expression signature, or for a biologically relevant set of genes that are co-expressed.

The method used for class discovery is clustering. To find a gene expression signature, the data is clustered using the chosen clustering method. The clusters are then validated, either by enrichment analysis, which checks whether the clusters are enriched for categories, processes or pathways that play an important role, or by checking that the results are replicated in other data sets.
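As one possible illustration, the sketch below clusters samples with agglomerative hierarchical clustering using average linkage and Euclidean distance. The article does not name a specific clustering method, so this choice is an assumption, as are the sample values and function names.

```python
from math import dist

def average_linkage(cluster_a, cluster_b, samples):
    """Mean pairwise distance between the members of two clusters."""
    pairs = [(i, j) for i in cluster_a for j in cluster_b]
    return sum(dist(samples[i], samples[j]) for i, j in pairs) / len(pairs)

def hierarchical_clusters(samples, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(samples))]  # start with one cluster per sample
    while len(clusters) > n_clusters:
        a, b = min(
            ((x, y) for x in range(len(clusters)) for y in range(x + 1, len(clusters))),
            key=lambda xy: average_linkage(clusters[xy[0]], clusters[xy[1]], samples),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so the merged cluster keeps its position
    return [sorted(c) for c in clusters]

# Hypothetical unlabeled samples described by two genes.
samples = [(9.0, 1.0), (1.1, 7.9), (8.5, 1.4), (0.9, 8.3), (9.2, 0.8), (1.5, 8.1)]
clusters = hierarchical_clusters(samples, n_clusters=2)
```

No class labels are supplied here; the two groups of sample indices emerge from the expression values alone, which is what distinguishes class discovery from class prediction.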

Pathway analysis involves searching for sets of genes that are differentially expressed between distinct phenotypes. The Kolmogorov-Smirnov (KS) score is a non-parametric rank statistic determined by the positions of a gene set's members in an ordered list of genes. It can be used to examine whether the gene set is enriched. The KS score will be high for gene sets whose members appear near the top of the ordered list.
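A KS-style enrichment score can be sketched as a running sum over the ordered gene list. This is a simplified, unweighted version written for illustration; the step sizes, gene names and function name are assumptions, and published enrichment methods may weight the steps differently.

```python
def ks_enrichment_score(ranked_genes, gene_set):
    """Walk down the ranked list, stepping up at set members and down otherwise.

    Returns the running-sum value that deviates furthest from zero.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_misses = len(ranked_genes) - n_hits
    running = 0.0
    best = 0.0
    for is_hit in hits:
        # Steps are scaled so the running sum returns to zero at the end of the list.
        running += 1.0 / n_hits if is_hit else -1.0 / n_misses
        if abs(running) > abs(best):
            best = running
    return best

# Hypothetical gene list, already ordered by differential expression.
ranked_genes = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]

top_score = ks_enrichment_score(ranked_genes, {"g1", "g2", "g3"})
scattered_score = ks_enrichment_score(ranked_genes, {"g2", "g5", "g7"})
```

A gene set concentrated at the top of the list drives the running sum up quickly and yields a high score, whereas a set scattered through the list keeps the sum close to zero.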
