Research

Submodular coresets: Machine learning with less data

Feb 28, 2025

Training machine learning models is expensive for a variety of reasons. Data movement costs immense time and power, and backpropagation is...

A theoretical computer science problem that the SOTA LLMs get wrong (as of now)

Feb 21, 2025

A P-splinter is a language whose elements can be enumerated by a PTIME function. Although this is an old result, state-of-the-art LLMs get this problem wrong. (For now: by writing this, I am inadvertently helping the next generation of LLMs improve via memorization.)

CoolerSpace: A Language for Physically Correct and Computationally Efficient Color Programming

Aug 1, 2024

A type system for color programming that prevents physically meaningless computations and optimizes performance via equality saturation.

PLUTUS: Understanding Data Distribution Tailoring for Machine Learning

Jun 9, 2024

A human-in-the-loop pipeline that identifies problematic model slices and acquires targeted data from external sources to improve fairness.

Figure: the DT pipeline, in which data sources are combined using RatioColl or EpsilonGreedy to form a balanced, unified dataset.

Data Distribution Tailoring

Jan 1, 2024

Cost-efficient algorithms for collecting data from multiple sources while ensuring adequate representation across demographic groups.
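The setting can be illustrated with a generic epsilon-greedy bandit heuristic. This is a minimal sketch under assumed interfaces, not the post's algorithm: each source is modeled as a hypothetical sampler that returns one group label per query, each query has a known cost, and the goal is to meet a per-group quota. With probability `epsilon` the sketch queries a random source; otherwise it queries the source with the highest estimated rate of still-needed groups per unit cost. The function name `epsilon_greedy_collect` and all parameters are hypothetical.

```python
import random
from collections import Counter


def epsilon_greedy_collect(sources, costs, quotas, epsilon=0.1, seed=0, max_queries=100_000):
    """Query sources until every group meets its quota (or max_queries is hit).

    sources: list of hypothetical samplers; sources[i](rng) returns one group label.
    costs:   per-query cost of each source.
    quotas:  dict mapping group label -> required number of samples.
    """
    rng = random.Random(seed)
    n = len(sources)
    pulls = [0] * n                        # number of queries made to each source
    seen = [Counter() for _ in range(n)]   # group labels observed from each source
    collected = Counter()                  # group counts in the unified dataset
    total_cost = 0.0

    def score(j, needed):
        # Empirical probability that source j yields a still-needed group, per unit cost.
        hits = sum(seen[j][g] for g in needed)
        return hits / pulls[j] / costs[j]

    for _ in range(max_queries):
        needed = {g for g, q in quotas.items() if collected[g] < q}
        if not needed:
            break                          # every quota is satisfied
        untried = [j for j in range(n) if pulls[j] == 0]
        if untried:
            i = untried[0]                 # query every source once before exploiting
        elif rng.random() < epsilon:
            i = rng.randrange(n)           # explore: pick a random source
        else:
            i = max(range(n), key=lambda j: score(j, needed))  # exploit
        g = sources[i](rng)
        pulls[i] += 1
        seen[i][g] += 1
        collected[g] += 1
        total_cost += costs[i]
    return collected, total_cost


# Usage with two toy sources: source 0 is mostly group "a", source 1 mostly group "b".
sources = [
    lambda rng: "a" if rng.random() < 0.8 else "b",
    lambda rng: "b" if rng.random() < 0.9 else "a",
]
counts, cost = epsilon_greedy_collect(sources, costs=[1.0, 2.0], quotas={"a": 50, "b": 50})
print(counts, cost)
```

Scoring by estimated hit rate divided by cost is one simple design choice; it ignores how far each group is from its quota, which a more careful policy might take into account.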

The Two Coupons, Generic Quota Coupon Collector's Problem

Jul 7, 2023

Suppose there are two distinct coupons. In each iteration, the probability of sampling a type-1 coupon is $p$, and the probability of sampling a type-2 coupon is $q = 1 - p$. Our goal is to collect at least $k$ type-1 coupons and at least $r$ type-2 coupons. How many iterations does it take, in expectation, to complete a full collection?
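One way to compute this expectation numerically, sketched here under the assumption $0 < p < 1$, is a recursion on the number of coupons still missing. Let $E(a, b)$ be the expected number of further draws when $a$ type-1 and $b$ type-2 coupons are still needed. Then $E(a, b) = 1 + p\,E(a-1, b) + q\,E(a, b-1)$, with boundary cases $E(0, b) = b/q$ and $E(a, 0) = a/p$: once one quota is met, the wait for the other is negative binomial. This is a generic dynamic-programming sketch, not necessarily the approach taken in the post; the function name `expected_draws` is hypothetical.

```python
from fractions import Fraction
from functools import lru_cache


def expected_draws(p, k, r):
    """Expected draws to collect >= k type-1 and >= r type-2 coupons,
    where each draw is type-1 with probability p and type-2 with q = 1 - p.
    Assumes 0 < p < 1."""
    p = Fraction(p)
    q = 1 - p

    @lru_cache(maxsize=None)
    def E(a, b):
        # a, b = number of type-1 and type-2 coupons still needed
        if a == 0:
            return b / q   # negative binomial: b more type-2 draws at rate q
        if b == 0:
            return a / p   # negative binomial: a more type-1 draws at rate p
        # One draw, then either a type-1 (prob p) or a type-2 (prob q) arrives.
        return 1 + p * E(a - 1, b) + q * E(a, b - 1)

    return E(k, r)


print(expected_draws(Fraction(1, 2), 1, 1))  # 3
```

For $p = 1/2$ and $k = r = 1$ this recovers the classic two-coupon answer of 3 draws. Exact `Fraction` arithmetic avoids rounding error; the recursion depth grows with $k + r$, so very large quotas would call for an iterative table instead.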

Figure: red and blue points in the plane, with some points of each color circled.

Fair $k$-Cover Coresets

Nov 30, 2022

Efficiently obtain coresets that cover every point in the dataset while adequately representing groups of interest.

Figure: the pipeline. An SDR image is brightened and darkened, then denoising is applied. The three resulting images are combined with Mertens' exposure fusion.

Single Exposure Fusion

Dec 15, 2021

Extend the perceptual dynamic range of a single photograph using classic denoising and exposure fusion, no machine learning required.