GENNA
Uniqueness
Ability to use Censored Observations
Most data mining techniques tend to ignore the concept
of censored observations, assuming that the observed time
is the time at which the event occurred. (For example, a
customer who is still active when the data is extracted has
an observed tenure that is only a lower bound on the true one.)
While this approach may be convenient, it introduces strong
biases into the model, as the true distribution of the
predicted field can be very different from the observed
distribution. GENNA uniquely provides distance metrics and
prediction mechanisms that explicitly handle censored
observations by combining elements of evidence theory with
well-established statistical techniques such as the
Kaplan-Meier estimator and Wilcoxon’s test.
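The prediction mechanism is not spelled out here, so the following is only a minimal sketch under stated assumptions: it assumes the nearest neighbours of a query have already been retrieved, each with an observed time and a flag saying whether the event was actually seen (True) or the record was censored (False). The Kaplan-Meier estimator turns those neighbours into a survival curve, and its median serves as a point prediction. The function names `kaplan_meier` and `median_survival` are hypothetical, and this is not necessarily how GENNA combines Kaplan-Meier with evidence theory.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return the Kaplan-Meier survival curve as [(time, survival)] steps.

    durations: observed times of the neighbours.
    events: True if the event occurred at that time, False if censored.
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    leaving = Counter(durations)  # events and censorings leaving the risk set
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Censored records at time t still count as at risk at t
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


def median_survival(curve):
    """First time the survival estimate falls to 0.5 or below (None if never)."""
    for t, s in curve:
        if s <= 0.5 + 1e-12:  # small tolerance for floating-point rounding
            return t
    return None


if __name__ == "__main__":
    times = [2, 3, 3, 5, 8, 9]
    observed = [True, False, True, True, False, True]
    print(median_survival(kaplan_meier(times, observed)))    # 5
    # Treating censored times as event times biases the estimate downwards
    print(median_survival(kaplan_meier(times, [True] * 6)))  # 3
```

On these six hypothetical neighbours, ignoring censoring pulls the predicted median from 5 down to 3, which is the kind of bias described above.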
Ability to use Categorical and Numeric Attributes through the use of innovative distance metrics
Generally, nearest neighbour algorithms use similarity
metrics that are suited either to categorical attributes
or to numeric attributes, but not both. Using both types
of attributes together therefore introduces biases. GENNA
uses innovative similarity metrics that are suitable for
numeric as well as categorical attributes, and provides the
ability to (semi-)automatically optimise the similarity
metric used for comparable retrieval.
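GENNA's own metrics are not described here. As an illustration of the general idea only, the sketch below uses a weighted Gower-style distance, a well-known way of mixing attribute types: numeric differences are scaled by the attribute's range so they fall in [0, 1], and categorical attributes contribute 0 on a match and 1 on a mismatch. The per-attribute `weights` are the kind of parameters a (semi-)automatic optimisation could tune, for example by cross-validated search; the function name, the `kinds`/`ranges` arguments, and the example records are all hypothetical.

```python
def gower_distance(a, b, kinds, ranges, weights=None):
    """Weighted Gower-style distance between two records.

    kinds[i] is "num" or "cat"; ranges[i] is max - min of numeric attribute i
    over the training data (ignored for categorical attributes).
    """
    if weights is None:
        weights = [1.0] * len(kinds)
    total = 0.0
    for x, y, kind, r, w in zip(a, b, kinds, ranges, weights):
        if kind == "num":
            # Range-normalised absolute difference lies in [0, 1]
            d = abs(x - y) / r if r > 0 else 0.0
        else:
            # Simple mismatch: 0 if the categories agree, 1 otherwise
            d = 0.0 if x == y else 1.0
        total += w * d
    return total / sum(weights)


if __name__ == "__main__":
    kinds = ["num", "cat", "num"]  # age, housing status, income
    ranges = [50.0, None, 100000.0]
    a, b = (35, "renter", 42000), (45, "owner", 52000)
    print(round(gower_distance(a, b, kinds, ranges), 4))             # 0.4333
    print(round(gower_distance(a, b, kinds, ranges, [1, 0, 1]), 4))  # 0.15
```

Setting a weight to 0, as in the second call, removes an attribute from the comparison entirely; intermediate weights let the optimiser decide how much each attribute should matter.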
Automatic Indexing of data for Scalability and Speed
One of the shortfalls of the nearest neighbour family
of algorithms is that, because they do not build “compact”
models from the data, each prediction has to search the
stored observations, so the prediction process slows down
as the data volume increases. To alleviate this problem,
GENNA automatically indexes the data using clustering
techniques, speeding up the prediction process.
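The clustering technique GENNA uses is not named. The sketch below assumes plain k-means purely for illustration: the data is partitioned once, each record is stored in the bucket of its nearest centroid, and a query scans only the `probe` closest buckets instead of the whole data set. `ClusterIndex` and its parameters are hypothetical names.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns a list of k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            groups[nearest].append(p)
        for i, members in enumerate(groups):
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


class ClusterIndex:
    """Bucket records by nearest centroid; search only the closest buckets."""

    def __init__(self, points, k, probe=1, seed=0):
        self.centroids = kmeans(points, k, seed=seed)
        self.buckets = [[] for _ in self.centroids]
        for p in points:
            self.buckets[self._closest_clusters(p)[0]].append(p)
        self.probe = probe  # number of nearest buckets scanned per query

    def _closest_clusters(self, q):
        return sorted(range(len(self.centroids)),
                      key=lambda i: math.dist(q, self.centroids[i]))

    def nearest(self, query, n=1):
        candidates = [p for i in self._closest_clusters(query)[: self.probe]
                      for p in self.buckets[i]]
        return sorted(candidates, key=lambda p: math.dist(query, p))[:n]


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    index = ClusterIndex(data, k=2)
    print(index.nearest((9, 9)))  # [(10, 10)]
```

With `probe=1` the search is approximate: a query near a cluster boundary can miss its true nearest neighbour. Raising `probe` trades speed back for accuracy.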
Incremental Learning and Introspection
Once a model is built using data mining, an important
part of its deployment is monitoring the accuracy of the
predictions it makes. Over time, the context in which the
model is applied changes, a phenomenon referred to as
Concept Drift in the Machine Learning literature. As the
context shifts, the model becomes less accurate. With most
data mining algorithms, the algorithm has to be reapplied to
the new data, building a new model that is then deployed
within the new context.
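How accuracy is monitored is not specified. One simple heuristic, sketched below under that assumption, is to track accuracy over a sliding window of recent predictions and flag possible concept drift once it falls below a threshold. The `AccuracyMonitor` class and its `window` and `threshold` values are hypothetical.

```python
from collections import deque


class AccuracyMonitor:
    """Track accuracy over the most recent `window` predictions."""

    def __init__(self, window=100, threshold=0.8):
        self.outcomes = deque(maxlen=window)  # True for each correct prediction
        self.threshold = threshold

    def record(self, predicted, actual):
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else None

    def drifting(self):
        # Only judge once the window is full, to avoid reacting to a few cases
        full = len(self.outcomes) == self.outcomes.maxlen
        return full and self.accuracy() < self.threshold


if __name__ == "__main__":
    monitor = AccuracyMonitor(window=4, threshold=0.75)
    for predicted, actual in [(1, 1), (1, 1), (1, 1), (0, 1)]:
        monitor.record(predicted, actual)
    print(monitor.accuracy(), monitor.drifting())  # 0.75 False
    monitor.record(0, 1)  # the oldest correct result drops out of the window
    print(monitor.accuracy(), monitor.drifting())  # 0.5 True
```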
GENNA approaches this problem differently: as new data is
collected, whether it represents new observations or
feedback from the application of the model, it is
incorporated into the current model. When the data consists
of new observations, this continuous learning is referred to
as Incremental Learning. The incorporation of data on the
accuracy of the model’s application, on the other hand, is
referred to as Introspection.
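Neither mechanism is described in detail, so the following is only a minimal sketch of the two ideas for a nearest neighbour classifier. Incremental learning is simply adding a new observation to the stored cases, with no rebuild. For introspection, the sketch assumes a hypothetical per-case reliability weight that is reduced whenever a case votes for the wrong answer, so unreliable cases gradually lose influence. `IncrementalKNN`, `reliability`, and `rate` are illustrative names, not GENNA's.

```python
import math
from collections import Counter


class IncrementalKNN:
    """k-NN classifier with incremental learning and feedback-driven weights."""

    def __init__(self, k=3):
        self.k = k
        self.cases = []  # each case: [features, label, reliability]

    def learn(self, features, label):
        # Incremental learning: a new observation is added; nothing is rebuilt
        self.cases.append([tuple(features), label, 1.0])

    def _neighbours(self, query):
        return sorted(self.cases, key=lambda c: math.dist(query, c[0]))[: self.k]

    def predict(self, query):
        if not self.cases:
            raise ValueError("no cases learned yet")
        votes = Counter()
        for _, label, reliability in self._neighbours(query):
            votes[label] += reliability  # reliability-weighted vote
        return votes.most_common(1)[0][0]

    def feedback(self, query, actual, rate=0.5):
        # Introspection: down-weight neighbours that voted for the wrong label
        for case in self._neighbours(query):
            if case[1] != actual:
                case[2] *= rate


if __name__ == "__main__":
    model = IncrementalKNN(k=3)
    for x, y in [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "b")]:
        model.learn(x, y)
    print(model.predict((0.1, 0.1)))  # a
    model.feedback((0.1, 0.1), "b")   # the true answer turned out to be b
    model.feedback((0.1, 0.1), "b")
    print(model.predict((0.1, 0.1)))  # b
```

Multiplicative down-weighting keeps every case in memory, so a case whose label becomes correct again after further drift is never lost outright, though it recovers nothing in this simple version.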