Deepayan Chakrabarti

Associate Professor,
Information, Risk, and Operations Management (IROM)
McCombs School of Business,
UT Austin
Other affiliation: Amazon Scholar
Office: CBA 6.462, 2110 Speedway Stop B6500,
Austin, TX 78712
Email: deepay <at> utexas <dot> edu

I work on a broad range of problems in Machine Learning and Data Mining, with a recent focus on:

My work spans model-building, statistical inference, designing algorithms, and providing theoretical proofs of consistency.

More details may be found in my resume and my Google Scholar and DBLP profiles.

Estimating Mixed Memberships with Sharp Eigenvector Deviations (JASA, 2021)
People belong to multiple overlapping communities. Can we detect these communities from the structure of a social network? We show how one can consistently infer these communities, and the affinity that each person feels towards each community. These results follow from a new sharp bound on eigenvector deviations, which may be of independent interest.
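The spectral intuition behind this line of work can be sketched with a toy example: the top eigenvectors of a network's adjacency matrix carry community information. This is a generic illustration, not the paper's mixed-membership estimator; the graph size and edge probabilities below are invented.

```python
# Toy spectral sketch (NOT the paper's algorithm; sizes and edge
# probabilities are made up): top eigenvectors of the adjacency
# matrix reveal planted communities.
import numpy as np

rng = np.random.default_rng(0)
n = 60
labels = np.repeat([0, 1], n // 2)  # two planted communities
# Denser edges within communities (0.5) than across (0.05).
P = np.where(labels[:, None] == labels[None, :], 0.5, 0.05)
A = (rng.random((n, n)) < P).astype(float)
A = np.triu(A, 1)
A = A + A.T  # symmetric adjacency, no self-loops

# The leading eigenvector is roughly constant; the second one
# splits the two communities by sign.
vals, vecs = np.linalg.eigh(A)  # eigenvalues in ascending order
pred = (vecs[:, -2] > 0).astype(int)
acc = max((pred == labels).mean(), (pred != labels).mean())
print(f"community recovery accuracy: {acc:.2f}")
```

With this much separation between within- and across-community edge probabilities, the sign split recovers the planted communities almost perfectly; the hard regime the theory addresses is when that gap is small.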

Portfolio Construction by Mitigating Error Amplification: The Bounded-Noise Portfolio (Operations Research, 2019)
Portfolios are designed to balance risk and reward optimally. But this optimization can also amplify errors in estimated stock returns, hurting real-world performance. Instead of trying to fix the estimation or optimization steps independently, we disentangle the well-estimated aspects of the data from the poorly estimated ones and treat them differently. The resulting portfolios are robust to noise and perform well empirically.
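The error-amplification problem has a simple illustration in the classical mean-variance setting, where unconstrained optimal weights are proportional to Σ⁻¹μ. The sketch below is generic Markowitz arithmetic, not the bounded-noise construction, and all numbers are invented for the demo.

```python
# Toy illustration (NOT the paper's method): mean-variance weights
# w ∝ Σ⁻¹ μ amplify estimation noise in μ along the small-eigenvalue
# directions of Σ. All numbers here are invented.
import numpy as np

# One well-conditioned direction (variance 1.0) and one
# poorly conditioned direction (variance 0.01).
Sigma = np.diag([1.0, 0.01])
mu_true = np.array([0.05, 0.05])
w_true = np.linalg.solve(Sigma, mu_true)

# The same small error (0.01) in each estimated expected return...
noise = np.array([0.01, 0.01])
w_noisy = np.linalg.solve(Sigma, mu_true + noise)

# ...moves the weight on asset 2 a hundred times more than on
# asset 1, because Σ⁻¹ scales it by 1/0.01 = 100.
print(w_noisy - w_true)  # [0.01, 1.0]
```

The same estimation error is magnified a hundredfold along the ill-conditioned direction, which is the amplification the paper's approach is designed to mitigate.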

Joint Label Inference in Networks (JMLR, 2017)
How can we infer user profiles given only the partially completed profiles of a few users in a large social network? Instead of a blanket assumption of homophily, we show that a more nuanced notion of partial homophily achieves much greater accuracy, and that inference under this model scales to the billion-node Facebook network.
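For contrast, the blanket-homophily baseline can be sketched as harmonic label propagation: each unlabeled node repeatedly takes the average label of its neighbors. This is the generic baseline, not the paper's partial-homophily model; the tiny graph and labels below are invented.

```python
# Generic label-propagation sketch (NOT the paper's partial-homophily
# model): unlabeled nodes iteratively average their neighbors' labels.
import numpy as np

# Hypothetical 5-node path graph; nodes 0 and 4 have known labels.
A = np.zeros((5, 5))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    A[i, j] = A[j, i] = 1.0

known = {0: 0.0, 4: 1.0}  # node -> known label score
scores = np.full(5, 0.5)  # unknown nodes start at 0.5
for node, lab in known.items():
    scores[node] = lab

for _ in range(200):  # iterate to (near) convergence
    new = A @ scores / A.sum(axis=1)  # neighbor averages
    for node, lab in known.items():   # clamp the known labels
        new[node] = lab
    scores = new

print(np.round(scores, 2))  # scores rise monotonically along the path
```

On a path between a 0-labeled and a 1-labeled node, the scores converge to the harmonic interpolation (0, 0.25, 0.5, 0.75, 1), i.e. pure homophily smooths labels along edges.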

Theoretical Justification of Popular Link Prediction Heuristics (COLT 2010 best student paper)
Why should a simple heuristic such as counting the number of common neighbors work so well in link prediction tasks? Why is the Adamic-Adar measure better? We formulate this as a question of measuring distances in a latent space, and prove that, under fairly general conditions, such common heuristics are correct and optimal (and, interestingly, why they sometimes deviate from optimality and yet do not suffer in practice).
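The two heuristics in question are easy to state concretely: common neighbors counts the shared neighbors of a candidate pair, while Adamic-Adar down-weights each shared neighbor z by 1/log deg(z), so rarer shared friends count more. A minimal sketch on an invented toy graph:

```python
# Two standard link-prediction heuristics on a tiny invented graph.
import math

# Hypothetical toy graph as an adjacency dict.
graph = {
    "a": {"b", "c", "h"},
    "b": {"a", "c"},
    "c": {"a", "b", "d", "h"},  # "c" is the best-connected node
    "d": {"c"},
    "h": {"a", "c"},
}

def common_neighbors(g, u, v):
    # Score a candidate edge (u, v) by the number of shared neighbors.
    return len(g[u] & g[v])

def adamic_adar(g, u, v):
    # Each shared neighbor z contributes 1/log(deg(z)):
    # high-degree (hub) neighbors are down-weighted.
    return sum(1.0 / math.log(len(g[z])) for z in g[u] & g[v])

print(common_neighbors(graph, "a", "b"))  # 1 (only "c" is shared)
print(adamic_adar(graph, "a", "b"))       # 1/log(4) ≈ 0.72
```

Both scores rank candidate non-edges; Adamic-Adar differs from common neighbors only in discounting shared neighbors that are hubs.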