I work on a broad range of problems in Machine Learning and Data Mining,
particularly focusing on:
 mining large graphs and social networks,
 computational advertising,
 recommendation systems, and
 web search and information retrieval.
I build statistical models that fully leverage the problem structure
by combining all sources of information, and design algorithms for
inference and optimization that can be easily adapted to distributed
computation frameworks, such as Hadoop and Giraph.
More details may be found in my resume and my Google Scholar profile.

Joint Inference of Multiple Label Types in Large Networks (ICML 2014)

How can we infer user profiles given only partially completed
profiles of a few users in a large social network? Instead of a
blanket assumption of homophily, we show that a more nuanced notion
of partial homophily achieves much greater accuracy, and
inference under this model can scale to the billion-node Facebook
network.

Speeding up Large-Scale Learning with a Social Prior (KDD 2013)

How can social networks be used to learn users' responses to
recommended items? While "friends are similar" is a worthy guiding
principle, we show that online learning with the Gaussian Random
Field model typically used in such settings runs into severe
problems in real-world networks where degrees can be high. We
investigate the reasons behind this phenomenon, analyzing the model
in a Bayesian setting, and propose new models that are
provably better, while also admitting scalable inference.

Theoretical Justification of Popular Link Prediction Heuristics (COLT 2010 best student paper)

Why should a simple heuristic such as counting the number of common
neighbors work so well in link prediction tasks? Why is the
Adamic-Adar measure better? We formulate this as a question of
measuring distances in a latent space, and prove that, under fairly
general conditions, such common heuristics are correct and optimal
(and, interestingly, why they sometimes deviate from optimality and
yet do not suffer in practice).
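
For readers unfamiliar with the two heuristics named above, here is a minimal sketch of how they are usually computed. It is an illustration only, not code from the paper: the toy graph and the helper names (build_adjacency, common_neighbors, adamic_adar, rank_candidates) are made up for this example. Common neighbors counts the nodes adjacent to both endpoints, while Adamic-Adar weights each common neighbor by 1 / log(degree), so that low-degree common neighbors count for more.

```python
import math
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(adj, u, v):
    """Number of nodes adjacent to both u and v."""
    return len(adj[u] & adj[v])


def adamic_adar(adj, u, v):
    """Sum of 1 / log(degree) over the common neighbors of u and v."""
    # A common neighbor of two distinct nodes has degree >= 2, so the
    # logarithm is positive; the guard is kept for safety.
    return sum(
        1.0 / math.log(len(adj[z]))
        for z in adj[u] & adj[v]
        if len(adj[z]) > 1
    )


def rank_candidates(adj, score):
    """Rank all currently unlinked node pairs by a score, best first."""
    pairs = [
        (u, v)
        for u, v in combinations(sorted(adj), 2)
        if v not in adj[u]
    ]
    return sorted(pairs, key=lambda p: score(adj, *p), reverse=True)


# Hypothetical toy graph, used only to exercise the functions above.
toy_edges = [("a", "b"), ("a", "c"), ("b", "c"),
             ("b", "d"), ("c", "d"), ("d", "e")]
adj = build_adjacency(toy_edges)

for name, score in [("common neighbors", common_neighbors),
                    ("Adamic-Adar", adamic_adar)]:
    print(name)
    for u, v in rank_candidates(adj, score):
        print(f"  {u}-{v}: {score(adj, u, v):.3f}")
```

On this toy graph both heuristics rank the unlinked pair a-d first, since it is the only pair with two common neighbors. The two scores can order pairs differently on larger graphs, where common neighbors vary widely in degree.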