Microarray classification

Few samples relative to the number of genes

Many filter approaches
Problem: univariate
feature selection

Overfitting to training data
Data distribution inferred
from the training set

Reliability
Score useful?

Gives more information
about the data distribution
relative to the classifier boundary
than the error rate alone

More useful for
small sample sizes
Beyond that, is it fully
related to the error rate?

NEW
ELEMENT!

Classifier
Transparent

LDA

Simple

Robust

Interpretable

Any other classifier works: only the distance from the boundary is needed
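
Sketch of the distance from a linear boundary (as in LDA). This is a minimal illustration, not the score used in the notes: the boundary `w`, `b` and the samples are hypothetical values, and how the distance would be turned into a reliability score is left open.

```python
import math

def signed_distance(w, b, x):
    """Signed distance of sample x from the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    if norm == 0:
        raise ValueError("w must be non-zero")
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

# Hypothetical boundary and samples (not taken from any dataset).
w, b = [3.0, 4.0], -5.0
samples = [[3.0, 4.0], [0.0, 0.0], [1.0, 0.5]]

# The sign gives the predicted side, the magnitude how far the sample
# lies from the boundary; a distance of 0 means it sits on the boundary.
distances = [signed_distance(w, b, x) for x in samples]
print(distances)
```

Two samples with the same predicted class can lie at very different distances, which is the extra information the error rate alone does not show.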

Score parameter

How to select it?

Include it in article?

Are metagenes
useful?

NEW
ELEMENT?

PROS

- Common behaviour

- Interpretable combinations

- Expanded feature space

- Summary of several genes

Is the improvement
enough to compensate
for tree construction?

Treelet /
Euclidean?

Is our algorithm
better?
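
Sketch of one metagene-building step, under assumptions not stated in the notes: the two genes closest in Euclidean distance are merged by averaging, and the metagene is added to the original genes (expanded feature space). The function name, the gene names and the values are hypothetical, and a Treelet construction would combine the pair differently.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def add_closest_pair_metagene(profiles):
    """profiles: dict mapping gene name -> expression values (one per sample).

    Returns the expanded feature dict and the pair of genes that was merged.
    """
    names = sorted(profiles)
    best = None
    for i, g in enumerate(names):
        for h in names[i + 1:]:
            d = euclidean(profiles[g], profiles[h])
            if best is None or d < best[0]:
                best = (d, g, h)
    if best is None:
        raise ValueError("need at least two genes")
    _, g, h = best
    # Assumed combination rule: plain average of the two profiles.
    metagene = [(x + y) / 2 for x, y in zip(profiles[g], profiles[h])]
    expanded = dict(profiles)
    expanded[f"meta({g},{h})"] = metagene
    return expanded, (g, h)

# Made-up expression values for three genes over three samples.
profiles = {
    "g1": [1.0, 2.0, 3.0],
    "g2": [1.0, 2.0, 5.0],
    "g3": [9.0, 9.0, 9.0],
}
expanded, pair = add_closest_pair_metagene(profiles)
print(pair, expanded[f"meta({pair[0]},{pair[1]})"])
```

Repeating the step on the expanded set builds the tree. The all-pairs search is quadratic in the number of genes, which is the construction cost in question.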

MCC value
Mean value across
endpoints
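
Sketch of the comparison metric: the Matthews correlation coefficient (MCC) per endpoint, then the mean across endpoints. The confusion counts are invented for illustration and are not results, and returning 0 when the denominator vanishes is an assumed convention.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumed convention for degenerate confusion matrices
    return (tp * tn - fp * fn) / denom

def mean_mcc(confusions):
    """Mean MCC over a list of (tp, tn, fp, fn) tuples, one per endpoint."""
    values = [mcc(*c) for c in confusions]
    return sum(values) / len(values)

# Invented counts for two hypothetical endpoints.
endpoints = [(5, 5, 0, 0), (5, 5, 5, 5)]
print(mean_mcc(endpoints))
```

MCC ranges from -1 to 1 and stays informative with unbalanced classes, which makes the mean a single figure comparable across endpoints.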

YES for
MCC mean
value

MAQC II set

Contemporary

Many samples

MCC value comparable

Common
Ground

Computation
Time

Time scalability with features

SFFS: linear

IFFS: nonlinear and infeasible

IFFS SW: linear x Wsize

Feature selection

99% of computation time

Tree construction time?

Need to include that?