A “deep” look at Mira light-curves: Mira classification with CNN

Abstract

Mira variables are pulsating stars displaying periodic photometric and radial-velocity variations on timescales of the order of months. They represent the final evolutionary stage of low- to intermediate-mass stars, and understanding their variability is critical to several applications, from constraining stellar models to characterizing the stellar-wind-mediated injection of nucleosynthesis products into the interstellar medium. Moreover, Mira variables follow a period-luminosity relation that makes them a promising distance indicator, possibly key to alleviating the Hubble tension.
Mira variables occur in two types, oxygen- and carbon-rich, depending on the relative abundance of these two elements at their surface. They have drastically different photometric and spectroscopic features, and any astrophysical application of these stars requires being able to accurately classify their chemical type. Customarily, this can either be done by spectroscopy, or by combining optical and near-infrared photometry. Both approaches have severe limitations, as the former can only be applied to limited samples of stars, and the latter relies on multi-epoch observations with wide spectral coverage, typically requiring multiple observational facilities.
Currently, a wealth of multi-epoch photometric observations of Miras exists at optical wavelengths, with data availability expected to increase exponentially due to ongoing and upcoming time-domain observational surveys. We explored the potential of using machine learning and deep learning algorithms to classify O- and C-rich Miras based on single-band photometric time series. Specifically, we considered several algorithms: Random Forest, XGBoost, and Support Vector Machine, applied to features extracted from Mira light curves using Fourier decomposition, as well as a Convolutional Neural Network (CNN) trained directly on 2D “images” generated from Mira light curves in the phase-magnitude plane. To the best of our knowledge, this is the first application of this method (typically used to classify photographs based on their content) to the characterization of highly-evolved pulsating stars.
Interestingly, the CNN method outperformed all the others. In particular, it demonstrated the ability to correctly classify cases that fall within regions of overlap when analyzed using traditional diagnostic plots based on Fourier decomposition features. We also conducted a preliminary interpretability analysis of the CNN results by identifying the parts of the light curves that most influence the classification process. These initial findings are especially promising, as they could help us uncover new features in light curves associated with the chemical composition of the stars. This is a promising example of how deep learning methods can not only enhance classification tasks on astrophysics datasets but also contribute to a deeper understanding of the underlying physics.
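The 2D "images" in the phase-magnitude plane mentioned above can be illustrated with a minimal sketch. The abstract does not specify how the images are built, so everything here is an assumption made for illustration: the function name `fold_to_image`, the 32×32 grid, the use of per-pixel observation counts, and the scaling of magnitudes to each star's own range. It uses only the Python standard library.

```python
import math

def fold_to_image(times, mags, period, n_phase=32, n_mag=32):
    """Fold a light curve on `period` and rasterise it on a phase-magnitude grid.

    Returns a list of n_mag rows, each with n_phase integer pixel counts.
    Grid size and count-based pixels are illustrative assumptions.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    if len(times) != len(mags) or not times:
        raise ValueError("times and mags must be non-empty and equally long")

    # Phase in [0, 1), measured from the first observation.
    t0 = min(times)
    phases = [((t - t0) / period) % 1.0 for t in times]

    # Scale magnitudes to the star's own range; guard against a flat curve.
    m_min, m_max = min(mags), max(mags)
    span = (m_max - m_min) or 1.0

    image = [[0] * n_phase for _ in range(n_mag)]
    for phi, m in zip(phases, mags):
        col = min(int(phi * n_phase), n_phase - 1)
        # Row 0 holds the smallest magnitude, i.e. the brightest point.
        row = min(int((m - m_min) / span * n_mag), n_mag - 1)
        image[row][col] += 1
    return image

# Synthetic sinusoidal light curve (invented values, not survey data).
times = [i * 7.3 for i in range(200)]
mags = [14.0 + 0.8 * math.sin(2 * math.pi * t / 300.0) for t in times]
image = fold_to_image(times, mags, period=300.0)
```

Each observation falls in exactly one pixel, so the pixel counts sum to the number of epochs. An image of this kind could then be passed to any standard image classifier. The network actually used in this work may rely on a different rasterisation.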

Results & Conclusions

• The CNN classification outperforms all other tested algorithms both in terms of accuracy and recall (>90% for the LMC test set, >90% for the SMC), especially in generalization to unseen data (e.g. trained on LMC and tested on SMC), where other methods perform no better than random classifiers.

• This better generalization suggests that the CNN has learned actual patterns in the light curves rather than overfitting the training data. For example, it is the only method able to correctly classify C-rich Miras with long periods and small amplitudes (Fig. 3).

• The presence of additional information in the light curves was not expected. Through interpretability analysis (Fig. 4), we aim to identify which features of the light curve encode this information and link them to physical properties.

This analysis shows how deep neural networks not only improve classification in astrophysical datasets but, despite being often considered black boxes, can also guide us toward additional physical insights and potentially lead to new discoveries.
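For reference, the accuracy and per-class recall quoted in the results above can be computed as in the following minimal sketch. The helper names and the toy labels are hypothetical and chosen only for illustration; they are not the poster's data or results.

```python
def accuracy(y_true, y_pred):
    """Fraction of stars whose predicted chemical type matches the true one."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and equally long")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive):
    """Fraction of true `positive`-class stars that were recovered."""
    relevant = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not relevant:
        raise ValueError(f"no true examples of class {positive!r}")
    return sum(p == positive for p in relevant) / len(relevant)

# Toy labels (invented): "O" = oxygen-rich, "C" = carbon-rich.
y_true = ["O", "O", "O", "C", "C"]
y_pred = ["O", "O", "C", "C", "C"]

acc = accuracy(y_true, y_pred)
recall_o = recall(y_true, y_pred, positive="O")
recall_c = recall(y_true, y_pred, positive="C")
```

Reporting recall separately for each class matters when the two chemical types are unequally represented, since a classifier that always predicts the majority class can still reach a high accuracy. In a cross-galaxy test such as training on the LMC and testing on the SMC, `y_true` and `y_pred` would refer to the SMC stars only.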

**Fig. 3** - Period-amplitude plane for the Miras in the Large (top) and Small (bottom) Magellanic Clouds. The blue and red dots show the stars correctly classified as O-rich or C-rich by the CNN. The green and yellow circles highlight the misclassified O-rich and C-rich stars, respectively.

**Fig. 4** - Example of a folded Mira light curve color-coded by the neural activations in the third convolutional layer of the CNN. Each panel shows the output of a different filter.

