On utilizing weak estimators to achieve the online classification of data streams
Journal article, Peer reviewed
Accepted version
Permanent lenke
https://hdl.handle.net/10642/8359Utgivelsesdato
2019-09-02Metadata
Vis full innførselSamlinger
Originalversjon
Tavasoli H, Oommen J, Yazidi A. On utilizing weak estimators to achieve the online classification of data streams. Engineering Applications of Artificial Intelligence. 2019;86:11-31 https://dx.doi.org/10.1016/j.engappai.2019.08.015Sammendrag
Classification, typically, deals with unique and distinct training and testing phases. This paper pioneers the
concept when these phases are not so clearly well-defined. More specifically, we consider the case where the testing
patterns can subsequently be considered as training patterns. The paradigm is further complicated because we
assume that the class-conditional distributions of the features/classes are non-stationary, as in the case of most
real-world applications. Specifically, we consider the model where the training phase is non-stationary and that
it is, further, interleaved with the testing, and where it can be done online and in a real-time manner.
We propose a novel online classifier for complex data streams which are generated from non-stationary
stochastic properties. Instead of using a single training model with “counters” that maintain important data
statistics, our online classifier scheme provides a real-time self-adjusting learning model. The learning model
utilizes the multiplication-based update algorithm of the Stochastic Learning Weak Estimator (SLWE) at each
time instant as a new labeled instance arrives. In this way, the data statistics are updated every time a new
element is seen, without requiring that we have to rebuild the model when changes occur in the data distributions.
Finally, and most importantly, the model operates with the understanding that the correct classes of previously-
classified patterns become available at a later juncture subsequent to some time instances. This forces us to
update the training set, the training model and the class conditional distributions as the testing proceeds.
The results from rigorous empirical analysis on two-dimensional/multi-dimensional and binomial/multinomial
distributions are remarkable. We also report some results on two real-life datasets adapted to this model
of computation, demonstrating the advantages of the novel scheme for both binomial and multinomial non-
stationary distributions.