Submodular coresets: Machine learning with less data
Feb 28, 2025
Training machine learning models is expensive for a variety of reasons. Data movement costs immense time and power and backpropagation is...
A theoretical computer science problem that the SOTA LLMs get wrong (as of now)
Feb 21, 2025
A P-splinter is a language whose elements can be enumerated by a PTIME function. Although this is an old result, state-of-the-art LLMs get this problem wrong. (For now, at least: by writing this, I am inadvertently helping the next generation of LLMs improve via memorization.)
CoolerSpace: A Language for Physically Correct and Computationally Efficient Color Programming
Aug 1, 2024
A type system for color programming that prevents physically meaningless computations and optimizes performance via equality saturation.
PLUTUS: Understanding Data Distribution Tailoring for Machine Learning
Jun 9, 2024
A human-in-the-loop pipeline that identifies problematic model slices and acquires targeted data from external sources to improve fairness.

Data Distribution Tailoring
Jan 1, 2024
Cost-efficient algorithms for collecting data from multiple sources while ensuring adequate representation across demographic groups.
The Two Coupons, Generic Quota Coupon Collector's Problem
Jul 7, 2023
Suppose there are two distinct coupons. In each iteration, the probability of sampling a type-1 coupon is $p$ and that of sampling a type-2 coupon is $q = 1 - p$. Our goal is to collect at least $k$ type-1 coupons and $r$ type-2 coupons. How many iterations does it take, in expectation, to complete the collection?

Fair $k$-Cover Coresets
Nov 30, 2022
Efficiently obtain coresets that cover every point in the dataset while adequately representing groups of interest.

Single Exposure Fusion
Dec 15, 2021
Extend the perceptual dynamic range of a single photograph using classic denoising and exposure fusion, with no machine learning required.