I work on a broad range of problems in Machine Learning and Data Mining,
with a recent focus on:
- analyzing large graphs and social networks,
- robust optimization, and
- machine learning.
My work spans model-building, statistical inference, designing algorithms, and providing theoretical proofs of consistency.
More details may be found in my
Google Scholar profile.
I am looking for new Ph.D. students. Email me your CV if you are interested.
Estimating Mixed Memberships with Sharp Eigenvector Deviations (JASA, 2021)
People belong to multiple overlapping communities.
Can we detect these communities from the structure of a social network?
We show how one can consistently infer these communities, and the affinities that each person feels towards each community.
These follow from a new sharp bound on eigenvectors, which may be of general interest.
Portfolio Construction by Mitigating Error Amplification: The Bounded-Noise Portfolio (Operations Research, 2019)
Portfolios are designed to optimally balance risk and reward.
But this optimization can also amplify errors made in estimating stock market returns, hurting real-world performance.
Instead of trying to independently fix the estimation or optimization steps, we disentangle the well-estimated aspects from the poorly estimated aspects of the data and treat them differently.
The resulting portfolios are robust to noise, and perform well empirically.
Joint Label Inference in Networks (JMLR, 2017)
How can we infer user profiles given only partially completed
profiles of a few users in a large social network? Instead of a
blanket assumption of homophily, we show that a more nuanced notion
of partial homophily achieves much greater accuracy, and
inference under this model can scale to the billion-node Facebook
Theoretical Justification of Popular Link Prediction Heuristics (COLT 2010 best student paper)
Why should a simple heuristic such as counting the number of common
neighbors work so well in link prediction tasks? Why is the
Adamic-Adar measure better? We formulate this as a question of
measuring distances in a latent space, and prove how under fairly
general conditions, such common heuristics are correct and optimal
(and interestingly, why they sometimes deviate from optimality and
yet do not suffer in practice!)