Microarray classification
Few samples relative to number of genes
Many filter approaches
Problem of univariate
feature selection:
overfitting to training data
Infer data distribution
from training set
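A minimal sketch of a univariate filter, assuming a Welch t-statistic as the per-gene ranking criterion (the notes do not name a specific filter; `t_scores` and `top_k_genes` are hypothetical helpers). Each gene is judged in isolation. With 20 samples and 1000 genes, only gene 0 carries signal, so any other gene that reaches the top of the ranking got there by chance. This is the overfitting risk noted above.

```python
import numpy as np


def t_scores(X, y):
    """Absolute Welch t-statistic of each gene (column of X); y holds 0/1 labels."""
    a, b = X[y == 0], X[y == 1]
    diff = a.mean(axis=0) - b.mean(axis=0)
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return np.abs(diff) / se


def top_k_genes(X, y, k):
    """Univariate filter: rank each gene on its own and keep the k best."""
    return np.argsort(t_scores(X, y))[::-1][:k]


rng = np.random.default_rng(0)
n, p = 20, 1000                 # few samples, many genes
X = rng.normal(size=(n, p))
y = np.array([0] * 10 + [1] * 10)
X[y == 1, 0] += 6.0             # only gene 0 carries a real class difference
print(top_k_genes(X, y, 5))     # any gene other than 0 listed here is pure noise
```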
Reliability
Is the score useful?
Gives more information
about the data distribution
relative to the classifier boundary
than the error rate alone
More useful for
small sample sizes
Beyond that, is it fully
correlated with the error rate?
NEW
ELEMENT!
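The notes do not define the score, so the sketch below uses a hypothetical stand-in: the mean signed distance `y * d` of the samples from the classifier boundary. It shows why such a score can carry more information than the error rate. Both classifiers misclassify the same two samples, but one places the correct samples just beside the boundary and the other places them far from it.

```python
import numpy as np


def error_rate(d, y):
    """Fraction of misclassified samples; d = signed distance to boundary, y in {-1, +1}."""
    return np.mean(np.sign(d) != y)


def margin_score(d, y):
    """Hypothetical reliability score: mean of y * d.
    Correct samples far from the boundary raise it; errors pull it down."""
    return np.mean(y * d)


y = np.array([1, 1, 1, -1, -1, -1])
d_a = np.array([0.1, 0.2, -0.1, -0.1, -0.2, 0.1])   # correct samples hug the boundary
d_b = np.array([2.0, 1.5, -0.1, -1.8, -2.2, 0.1])   # same errors, clear separation otherwise
print(error_rate(d_a, y), error_rate(d_b, y))        # identical error rates
print(margin_score(d_a, y), margin_score(d_b, y))    # very different scores
```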
Classifier
Transparent
LDA
Simple
Robust
Interpretable
Any other classifier works; only the distance from the boundary is needed
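A minimal two-class Fisher LDA sketch that exposes the one quantity the score needs: each sample's signed distance from the separating hyperplane. The class name `TwoClassLDA`, the equal-prior threshold, and the ridge term `reg` are assumptions. The ridge term is included because, with more genes than samples, the pooled covariance is singular. Any other classifier that yields a signed distance could be swapped in.

```python
import numpy as np


class TwoClassLDA:
    """Fisher LDA for labels in {-1, +1}, assuming equal class priors.

    reg adds a ridge term to the pooled covariance (an assumption: with more
    genes than samples the plain covariance matrix is singular).
    """

    def __init__(self, reg=1e-3):
        self.reg = reg

    def fit(self, X, y):
        pos, neg = X[y == 1], X[y == -1]
        mu_pos, mu_neg = pos.mean(axis=0), neg.mean(axis=0)
        # Pooled within-class covariance, regularised on the diagonal.
        S = ((len(pos) - 1) * np.cov(pos, rowvar=False)
             + (len(neg) - 1) * np.cov(neg, rowvar=False)) / (len(X) - 2)
        S = np.atleast_2d(S) + self.reg * np.eye(X.shape[1])
        # Discriminant direction, and a bias placing the boundary midway between the class means.
        self.w = np.linalg.solve(S, mu_pos - mu_neg)
        self.b = -self.w @ (mu_pos + mu_neg) / 2
        return self

    def signed_distance(self, X):
        """Euclidean distance to the separating hyperplane; positive on the +1 side."""
        return (X @ self.w + self.b) / np.linalg.norm(self.w)

    def predict(self, X):
        return np.where(self.signed_distance(X) >= 0, 1, -1)


rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1.0, 1.0, size=(15, 2)), rng.normal(1.0, 1.0, size=(15, 2))])
y = np.array([-1] * 15 + [1] * 15)
lda = TwoClassLDA().fit(X, y)
d = lda.signed_distance(X)
print("training accuracy:", np.mean(lda.predict(X) == y))
```

LDA keeps the model transparent: `w` gives a per-gene weight, and `d` can be passed directly to a distance-based score.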
Score parameter
selection: how?
Include it in the article?
Are metagenes
useful?
NEW
ELEMENT?
PROS
- Common behaviour
- Interpretable combinations
- Expanded feature space
- Summary of several genes
Is the improvement
enough to compensate
for tree construction?
Treelet /
Euclidean?
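A simplified, treelet-style sketch of one merge step in metagene construction, assuming (1) similarity is either absolute Pearson correlation (treelet-like) or Euclidean distance between gene profiles, and (2) the chosen pair is combined by a Jacobi rotation into two uncorrelated variables. The higher-variance "sum" variable acts as the metagene. The function names are hypothetical, and this sketch is not the exact treelet algorithm or the authors' tree construction.

```python
import numpy as np


def most_similar_pair(X, active, metric="corr"):
    """Return the pair of active genes (columns of X) that are most similar.
    metric="corr": largest |Pearson correlation| (treelet-like);
    metric="euclidean": smallest Euclidean distance between gene profiles."""
    best, pair = -np.inf, None
    for pos, i in enumerate(active):
        for j in active[pos + 1:]:
            if metric == "corr":
                s = abs(np.corrcoef(X[:, i], X[:, j])[0, 1])
            else:
                s = -np.linalg.norm(X[:, i] - X[:, j])
            if s > best:
                best, pair = s, (i, j)
    return pair


def merge_pair(X, i, j):
    """Jacobi rotation of genes i and j into uncorrelated variables.
    Column i of the returned copy holds the higher-variance 'sum' variable
    (the metagene); column j holds the 'difference' variable."""
    C = np.cov(X[:, i], X[:, j])
    # tan(2*theta) = 2*C_ij / (C_ii - C_jj) zeroes the rotated covariance;
    # arctan2 picks the root that puts the larger variance in column i.
    theta = 0.5 * np.arctan2(2 * C[0, 1], C[0, 0] - C[1, 1])
    c, s = np.cos(theta), np.sin(theta)
    Y = X.copy()
    Y[:, i] = c * X[:, i] + s * X[:, j]
    Y[:, j] = -s * X[:, i] + c * X[:, j]
    return Y, theta


rng = np.random.default_rng(2)
base = rng.normal(size=50)
X = np.column_stack([base + 0.1 * rng.normal(size=50),
                     base + 0.1 * rng.normal(size=50),
                     rng.normal(size=50)])
active = [0, 1, 2]
i, j = most_similar_pair(X, active)
Y, theta = merge_pair(X, i, j)
active.remove(j)  # the difference variable leaves the active set
```

Repeating the pair-and-merge step on the active set builds the tree level by level. Because each level needs a pass over all pairs of active variables, tree construction cost is a fair question to raise.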
Is our algorithm
better?
MCC value
Mean value across
endpoints
YES for
mean MCC
value
MAQC-II set
Contemporary
Many samples
MCC value comparable
Common
Ground
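The comparison above rests on the Matthews correlation coefficient averaged across endpoints. A minimal sketch follows, assuming each endpoint is summarised by its binary confusion matrix and that MCC is set to 0 when its denominator vanishes (a common convention). The endpoint names in the example are hypothetical.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from a 2x2 confusion matrix (0 if undefined)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def mean_mcc(confusions):
    """Mean MCC across endpoints; confusions maps endpoint name -> (tp, tn, fp, fn)."""
    values = [mcc(*cm) for cm in confusions.values()]
    return sum(values) / len(values)


results = {"endpoint_1": (10, 10, 0, 0),   # perfect classifier: MCC = 1
           "endpoint_2": (5, 5, 5, 5)}     # chance-level classifier: MCC = 0
print(mean_mcc(results))                   # 0.5
```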
Computation
Time
Time scalability with number of features
SFFS: linear
IFFS: nonlinear and infeasible
IFFS SW: linear x Wsize
Feature selection:
99% of the time
Tree construction time?
Need to include it?
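A minimal sketch of Sequential Floating Forward Selection (SFFS), showing why each forward step is linear in the number of features: it makes one criterion call per remaining feature. The `criterion` callable and the toy score table are assumptions for illustration. The IFFS replacement step and its sliding-window (SW) variant are not reproduced here.

```python
def sffs(n_features, criterion, k_max):
    """Sequential Floating Forward Selection (Pudil-style sketch).
    criterion(subset) -> float, higher is better.
    Returns {subset size: (best score, subset)}."""
    k_max = min(k_max, n_features)
    selected, best = [], {}
    while len(selected) < k_max:
        # Forward step: one criterion call per remaining feature (linear in n_features).
        remaining = [f for f in range(n_features) if f not in selected]
        f_add = max(remaining, key=lambda f: criterion(selected + [f]))
        selected = selected + [f_add]
        score = criterion(selected)
        if len(selected) not in best or score > best[len(selected)][0]:
            best[len(selected)] = (score, list(selected))
        # Floating backward steps: drop a feature while the reduced subset
        # beats the best subset previously recorded for that smaller size.
        while len(selected) > 2:
            f_rm = max(selected, key=lambda f: criterion([g for g in selected if g != f]))
            reduced = [g for g in selected if g != f_rm]
            r_score = criterion(reduced)
            if r_score > best[len(reduced)][0]:
                selected = reduced
                best[len(reduced)] = (r_score, list(reduced))
            else:
                break
    return best


# Hypothetical scores: feature 0 is best alone, but {1, 2} is the best pair.
J = {frozenset(s): v for s, v in [((0,), 1.0), ((1,), 0.9), ((2,), 0.8),
                                  ((0, 1), 1.2), ((0, 2), 1.1), ((1, 2), 1.5),
                                  ((0, 1, 2), 1.6)]}
result = sffs(3, lambda s: J[frozenset(s)], k_max=3)
print(result[2])  # the floating step replaces {0, 1} with {1, 2}
```

Plain forward selection would keep {0, 1} at size 2. The backward step recovers {1, 2} at the cost of extra criterion calls.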