On utilizing weak estimators to achieve the online classification of data streams

Tavasoli, Hanane; Oommen, John; Yazidi, Anis

Journal article, Peer reviewed

Accepted version

Open

OnlineSLWEJnlFin04.pdf (1.764Mb)

Permanent link

https://hdl.handle.net/10642/8359

Publication date

2019-09-02

Collections

TKD - Institutt for informasjonsteknologi [945]

Original version

Tavasoli H, Oommen J, Yazidi A. On utilizing weak estimators to achieve the online classification of data streams. Engineering Applications of Artificial Intelligence. 2019;86:11-31 https://dx.doi.org/10.1016/j.engappai.2019.08.015

Abstract

Classification typically deals with unique and distinct training and testing phases. This paper pioneers the setting in which these phases are not so clearly defined. More specifically, we consider the case where the testing patterns can subsequently be considered as training patterns. The paradigm is further complicated because we assume that the class-conditional distributions of the features/classes are non-stationary, as in the case of most real-world applications. Specifically, we consider the model where the training phase is non-stationary and is, further, interleaved with the testing, and where it can be done online and in real time.

We propose a novel online classifier for complex data streams that are generated from non-stationary stochastic properties. Instead of using a single training model with “counters” that maintain important data statistics, our online classifier scheme provides a real-time self-adjusting learning model. The learning model utilizes the multiplication-based update algorithm of the Stochastic Learning Weak Estimator (SLWE) at each time instant as a new labeled instance arrives. In this way, the data statistics are updated every time a new element is seen, without requiring us to rebuild the model when changes occur in the data distributions. Finally, and most importantly, the model operates with the understanding that the correct classes of previously classified patterns become available at a later juncture, some time instances after they were classified. This forces us to update the training set, the training model, and the class-conditional distributions as the testing proceeds.
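The multiplication-based SLWE update mentioned in the abstract can be illustrated with a minimal sketch. It assumes the commonly stated multinomial form of the rule, in which every probability is scaled by a learning parameter λ and the freed mass (1 − λ) is given to the symbol just observed. The names `slwe_update` and `track`, the value λ = 0.9, and the synthetic stream are hypothetical choices for illustration only; they are not taken from the paper, and the paper's full classifier (including the delayed arrival of true labels) is not reproduced here.

```python
from typing import Iterable, List


def slwe_update(probs: List[float], observed: int, lam: float = 0.9) -> List[float]:
    """One multiplicative SLWE-style step (assumed multinomial form).

    Every estimate is multiplied by lam; the observed symbol then receives
    the freed mass (1 - lam), so the vector still sums to 1.
    """
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must lie strictly between 0 and 1")
    if not 0 <= observed < len(probs):
        raise ValueError("observed symbol index is out of range")
    updated = [lam * p for p in probs]
    updated[observed] += 1.0 - lam
    return updated


def track(stream: Iterable[int], num_symbols: int, lam: float = 0.9) -> List[float]:
    """Run the update over a stream, starting from a uniform estimate."""
    probs = [1.0 / num_symbols] * num_symbols
    for symbol in stream:
        probs = slwe_update(probs, symbol, lam)
    return probs


if __name__ == "__main__":
    # Synthetic non-stationary stream: symbol 0 dominates, then symbol 1.
    stream = [0] * 200 + [1] * 200
    print("after first regime: ", track(stream[:200], 2))
    print("after second regime:", track(stream, 2))
```

Because old observations are discounted geometrically, the estimate follows the switch between regimes without any explicit change detection or model rebuild. A smaller λ would adapt faster at the cost of noisier estimates.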

The results from rigorous empirical analysis on two-dimensional/multi-dimensional and binomial/multinomial distributions are remarkable. We also report some results on two real-life datasets adapted to this model of computation, demonstrating the advantages of the novel scheme for both binomial and multinomial non-stationary distributions.

Publisher

Elsevier

Series

Engineering Applications of Artificial Intelligence;Volume 86, November 2019

Journal

Engineering Applications of Artificial Intelligence