On utilizing weak estimators to achieve the online classification of data streams
Journal article, Peer reviewed
MetadataShow full item record
Original versionTavasoli H, Oommen J, Yazidi A. On utilizing weak estimators to achieve the online classification of data streams. Engineering Applications of Artificial Intelligence. 2019;86:11-31 https://dx.doi.org/10.1016/j.engappai.2019.08.015
Classification, typically, deals with unique and distinct training and testing phases. This paper pioneers the concept when these phases are not so clearly well-defined. More specifically, we consider the case where the testing patterns can subsequently be considered as training patterns. The paradigm is further complicated because we assume that the class-conditional distributions of the features/classes are non-stationary, as in the case of most real-world applications. Specifically, we consider the model where the training phase is non-stationary and that it is, further, interleaved with the testing, and where it can be done online and in a real-time manner. We propose a novel online classifier for complex data streams which are generated from non-stationary stochastic properties. Instead of using a single training model with “counters” that maintain important data statistics, our online classifier scheme provides a real-time self-adjusting learning model. The learning model utilizes the multiplication-based update algorithm of the Stochastic Learning Weak Estimator (SLWE) at each time instant as a new labeled instance arrives. In this way, the data statistics are updated every time a new element is seen, without requiring that we have to rebuild the model when changes occur in the data distributions. Finally, and most importantly, the model operates with the understanding that the correct classes of previously- classified patterns become available at a later juncture subsequent to some time instances. This forces us to update the training set, the training model and the class conditional distributions as the testing proceeds. The results from rigorous empirical analysis on two-dimensional/multi-dimensional and binomial/multinomial distributions are remarkable. We also report some results on two real-life datasets adapted to this model of computation, demonstrating the advantages of the novel scheme for both binomial and multinomial non- stationary distributions.