
Computer Science Colloquia

Monday, December 21, 2015
Liliya Besaleva
Advisor: Alf Weaver
Attending Faculty: Worthy Martin (Chair); Jack Stankovic, Hongning Wang and Larry Richards (MAE).

1:00 PM, Rice Hall, Rm. 242

PhD Proposal Presentation
Smart e-Commerce Personalization Using Customized Algorithms


Machine learning algorithms are applied in many areas of modern life. From predictive medical diagnosis to smarter online shopping, large volumes of fast-moving data are constantly streamed in and put to use. Unfortunately, rare instances in the data, which make a data set imbalanced, are still largely ignored, because most analytical methods are designed for homogeneous data sets and tend to “smooth out” outliers. As a result, rare but significant cases remain neglected, which can lead to costly losses or even tragedies. Over the past decade, numerous approaches to this problem have appeared with varying success, ranging from modifications of the data to alterations of existing algorithms. Yet most of them have major drawbacks when applied across different application domains, because the data each domain generates differ in nature.

Within the broad domain of e-commerce, we propose a new approach to handling imbalanced data: a hybrid classifier that combines an adaptable data format with algorithmic modifications. Our solution will be divided into two main phases serving different purposes. In phase one, we will classify outliers quickly but with lower accuracy, for urgent situations that require immediate predictions and can tolerate some classification errors. In phase two, we will analyze the results in more depth, aiming to precisely identify high-cost imbalanced data with larger impact. The goal of this work is to improve data usability and classification accuracy, and to reduce the cost of analyzing massive data sets in e-commerce.