I work on a broad range of problems in Machine Learning and Data Mining,
with a recent focus on:
- mining large graphs and social networks,
- robust optimization, and
- computational advertising.
My work spans model building, statistical inference, algorithm design, and theoretical proofs of consistency.
More details may be found in my resume and my Google Scholar profile.
- Joint Label Inference in Networks (JMLR 2017)
How can we infer user profiles given only partially completed
profiles of a few users in a large social network? Instead of a
blanket assumption of homophily, we show that a more nuanced notion
of partial homophily achieves much greater accuracy, and
inference under this model can scale to the billion-node Facebook
network.
- Speeding up Large-Scale Learning with a Social Prior (KDD 2013)
How can social networks be used to learn users' responses to
recommended items? While "friends are similar" is a worthy guiding
principle, we show that online learning with the Gaussian Random
Field model typically used in such settings runs into severe
problems in real-world networks where degrees can be high. We
investigate the reasons behind this phenomenon, analyzing the model
in a Bayesian setting, and propose new models that are
provably better, while also admitting scalable inference.
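The Gaussian Random Field model mentioned above is the harmonic-function formulation of label propagation: each unlabeled node iteratively takes the average score of its neighbors, while labeled nodes stay clamped. A minimal sketch on a hypothetical toy graph (the graph, labels, and function names are illustrative, not from the paper):

```python
# Gaussian Random Field (harmonic) label propagation sketch:
# unlabeled nodes repeatedly average their neighbors' scores,
# labeled seed nodes stay clamped at their given values.

def grf_propagate(adj, labels, iters=100):
    """adj: {node: [neighbors]}; labels: {node: +1.0 or -1.0} seed labels."""
    scores = {v: labels.get(v, 0.0) for v in adj}
    for _ in range(iters):
        for v in adj:
            if v in labels:          # clamp labeled seed nodes
                continue
            nbrs = adj[v]
            if nbrs:
                scores[v] = sum(scores[u] for u in nbrs) / len(nbrs)
    return scores

# A 4-node path 0 - 1 - 2 - 3 with the two endpoints labeled.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
scores = grf_propagate(adj, {0: 1.0, 3: -1.0})
```

On this path the harmonic solution interpolates linearly between the seeds (nodes 1 and 2 converge to +1/3 and -1/3). The KDD 2013 paper's point is that on real networks with high-degree nodes, this averaging behaves badly, motivating the proposed alternatives.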
- Theoretical Justification of Popular Link Prediction Heuristics (COLT 2010 Best Student Paper)
Why should a simple heuristic such as counting the number of common
neighbors work so well in link prediction tasks? Why is the
Adamic-Adar measure better? We formulate this as a question of
measuring distances in a latent space, and prove that, under fairly
general conditions, such common heuristics are correct and optimal,
while also explaining why they sometimes deviate from optimality and
yet do not suffer in practice.
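For concreteness, the two heuristics discussed above can be sketched as follows, on a hypothetical toy graph (graph and function names are illustrative only):

```python
# Two standard link prediction heuristics for a candidate pair (x, y):
# the common-neighbor count, and the Adamic-Adar measure, which
# down-weights each shared neighbor by the log of its degree.
import math

def common_neighbors(adj, x, y):
    return len(adj[x] & adj[y])

def adamic_adar(adj, x, y):
    # sum over shared neighbors z of 1 / log(deg(z))
    return sum(1.0 / math.log(len(adj[z])) for z in adj[x] & adj[y])

# Toy graph as adjacency sets; a and b share neighbors c and d.
adj = {
    "a": {"c", "d"},
    "b": {"c", "d"},
    "c": {"a", "b", "d"},
    "d": {"a", "b", "c"},
}
cn = common_neighbors(adj, "a", "b")   # shared neighbors: c and d
aa = adamic_adar(adj, "a", "b")
```

Adamic-Adar refines the raw count by treating a low-degree shared neighbor as stronger evidence of a link than a high-degree hub; the COLT 2010 result explains when and why rankings produced by such scores match the latent-distance ordering.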