People/Web Search Calendar Emergency Info A-Z Index UVA Email University of Virginia

Computer Science Colloquia

Friday, May 11, 2012
Robbie Hott
Advisor: abhi shelat
Attending Faculty: Worthy Martin, chair; Gabe Robins, and Jason Lawrence

10:00 AM, Rice Hall, Rm. 504

PhD Quals Presentation
KD-Tree Algorithm for Propensity Score Matching


Objective. Propensity scores have been widely used in epidemiology to control for confounding bias in post-marketing studies, but the standard technique of matching patients by score becomes computationally impractical with studies of three or more treatment groups. We present a multi-category propensity score matching algorithm that is expected-case quadratic on the number of participants per group. Materials and Methods. Utilizing kd-tree data structures to provide efficient queries for nearby points and a search radius related to a best-guess match between participants in each treatment group, we reduce the number of participants that must be considered for each matching. We then match patients by propensity score, balancing the distribution of confounders in each treatment group and thereby removing bias. Discussion. We prove the correctness of this approach, showing that each search radius considered contains the same match brute force choses for each participant.

Results. Our algorithm outperforms the brute force approach in the expected case, requiring only O(n) space and O(kdn2) time compared with brute forces O(nk+1) time, for k treatment groups. This difference is clearly seen in our empirical study of 1000 participants in 3 groups: our algorithm matches in 3.5 seconds compared to brute forces 19.53 hours. Conclusion. In the vast majority of cases, we can accomplish matching on propensity score with three or more treatment groups without the constraint of exponential growth of the search space. Considering four groups of 5,000 patients, that is a reduction from 625 trillion matches to 100 million and orders of magnitude shorter computation time.