Astronomy has entered the petabyte era, rendering traditional methods of discovery, in which experts manually examine images, spectra, or light curves, infeasible. Machine learning is now routinely used for classification and characterization of celestial objects, but anomaly detection remains a challenge: the absence of ground truth makes it difficult to distinguish between physically interesting outliers and spurious data artifacts. While image-based anomaly detection is relatively well developed, identifying novel variability in light curves remains a less-charted frontier.
Space-based missions like TESS and Kepler revealed that nearly 60% of stars exhibit variability when observed at millimagnitude precision. With the Legacy Survey of Space and Time (LSST) about to start monitoring billions of stars at similar or slightly worse precision, albeit with a sparser cadence, and the Roman Space Telescope soon to follow, we need to learn how to detect meaningful novelties within vast, noisy, and heterogeneous datasets. Failing to do so may mean overlooking key insights into stellar evolution, or even entirely new physical phenomena.
We introduce MapLC, a project aimed at developing a semi-automated, model-agnostic pipeline for anomaly discovery in variable sources. Our approach combines three complementary strategies: a blind search for unanticipated behaviors, a reconstruction of known rare classes through unsupervised methods, and a targeted probe into regions of parameter space deemed promising by theoretical and statistical reasoning.