Research projects within the AstroAI-Lab
							The AstroAI-Lab is a recently funded interdisciplinary research group that currently consists of three postdocs, two PhD students and several Bachelor and Master students, all working at the intersection of physics, astronomy, data science / informatics and machine learning. Below we list currently ongoing projects.
Bachelor and master student projects
Are you a student looking for an interesting thesis project in astrophysics and/or machine learning? Our group has plenty of open projects in the broad fields of galaxy formation, physics-informed ML and simulation-based inference. Do not hesitate to contact us if you would like to know more about the projects or the group.
People and their projects
Nicklas: Magneto-hydrodynamic simulations of binary stars
									In this project, magnetohydrodynamic (MHD) simulations with the differentiable solver jf1uids are used to study how binary star systems and small star clusters, embedded in the interstellar medium, interact with their surroundings under the influence of stellar winds, cosmic rays and external magnetic fields. The adjacent figure shows the density slice in the xy plane of a colliding wind binary, with an asymmetric bow shock and a dense wind interaction region. An additional goal is to use gradient-based optimization or similar methods to recover physical parameters from unknown simulations or real observed data.
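To illustrate the idea of recovering a physical parameter by gradient-based optimization, here is a minimal sketch. It does not use jf1uids: the exponential-decay `forward` model is a hypothetical stand-in for a simulation, the gradient is written out by hand (a differentiable solver would supply it through automatic differentiation), and all names and numbers are assumptions chosen for illustration.

```python
import math

def forward(k, times):
    """Toy 'simulation': exponential decay with rate k (stand-in for a real solver)."""
    return [math.exp(-k * t) for t in times]

def loss_and_grad(k, times, observed):
    """Mean squared error between model and data, and its derivative with respect to k."""
    n = len(times)
    loss = 0.0
    grad = 0.0
    for t, y_obs in zip(times, observed):
        y = math.exp(-k * t)
        residual = y - y_obs
        loss += residual * residual / n
        # d/dk exp(-k t) = -t exp(-k t), chain rule through the squared residual.
        grad += 2.0 * residual * (-t * y) / n
    return loss, grad

def recover_rate(times, observed, k0=0.1, learning_rate=0.2, steps=2000):
    """Plain gradient descent on the rate parameter, starting from the guess k0."""
    k = k0
    for _ in range(steps):
        _, grad = loss_and_grad(k, times, observed)
        k -= learning_rate * grad
    return k

# Synthetic "observation" generated with a known rate, then recovered from the data.
times = [0.1 * i for i in range(50)]
true_k = 0.7
observed = forward(true_k, times)
k_fit = recover_rate(times, observed)
```

With noise-free synthetic data the fitted rate should approach the true one; with real observations the loss would typically include measurement uncertainties.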
								
Luca: Lagrangian neural ODEs
									Neural ODEs are a powerful and widely used machine learning technique, in particular for physics. However, not every solution they learn is physical in the sense of being an Euler-Lagrange equation. We present Helmholtz metrics to quantify this resemblance for a given ODE and demonstrate their capabilities on several fundamental systems with noise. We combine them with a second-order neural ODE to form a Lagrangian neural ODE, which allows learning Euler-Lagrange equations directly and at zero additional inference cost. We demonstrate that, using only positional data, they can distinguish Lagrangian from non-Lagrangian systems and improve the neural ODE solutions.
								
Anna: Identifying Cluster Types found with AstroLink in MW-like Galaxies
									A galaxy's structure holds information on its formation and evolution including past merger events. To find such structures we employ AstroLink,
									a hierarchical clustering algorithm designed for astronomical data, that identifies regions of overdensity in the galaxy. For the resulting clusters, 
									we work on a pipeline to automatically label different types with an emphasis on differentiating between satellite dwarf galaxies and strongly disrupted 
									satellites resulting in star streams in order to determine AstroLink's ability to find remnants of past mergers and to reconstruct the galaxy's merger history. 
									Additionally, we investigate the young and compact clusters found in the disk to determine their origin and characteristics.
								
Robin: Neural Operator learning for astrophysical problems
									Robin is generally interested in efficient and scalable solutions to computational problems, mainly utilising machine learning. 
									Such solutions can often be found by using model architectures that take the symmetries of the underlying problem or data into account.
									Currently, his project applies neural operators like DeepONet to predict the time evolution of functions like cosmic ray spectra or chemical abundances. 
									In addition to inference time, accuracy and memory footprint, it is also of interest to quantify the predictive uncertainty in these applications to decide how much 
									to trust the predictions of the surrogate model.
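For readers unfamiliar with DeepONet, the following is a minimal, untrained forward pass, given as a sketch rather than the project's actual model. A branch network encodes the input function sampled at fixed sensor points, a trunk network encodes the query time, and the prediction is their dot product. Layer sizes, the `tanh` activation, the random weights and the toy input spectrum are all assumptions; the usual output bias is omitted for brevity.

```python
import math
import random

def dense(x, weights, bias, activate=True):
    """One fully connected layer: tanh(W x + b), or W x + b if activate is False."""
    out = []
    for row, b in zip(weights, bias):
        z = sum(w * xi for w, xi in zip(row, x)) + b
        out.append(math.tanh(z) if activate else z)
    return out

def init_mlp(sizes, rng):
    """Random weights and zero biases for an MLP with the given layer sizes."""
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers

def mlp(x, layers):
    """Apply all layers; the last layer is linear."""
    for i, (weights, bias) in enumerate(layers):
        x = dense(x, weights, bias, activate=(i < len(layers) - 1))
    return x

def deeponet(u_sensors, t, branch, trunk):
    """DeepONet prediction G(u)(t): dot product of branch and trunk embeddings."""
    b = mlp(u_sensors, branch)   # encodes the input function u
    tr = mlp([t], trunk)         # encodes the query coordinate t
    return sum(bi * ti for bi, ti in zip(b, tr))

rng = random.Random(0)
branch = init_mlp([16, 32, 8], rng)   # 16 sensor values -> 8 latent coefficients
trunk = init_mlp([1, 32, 8], rng)     # scalar time -> 8 basis values
u0 = [math.exp(-0.5 * i) for i in range(16)]  # toy initial spectrum at 16 sensors
prediction = [deeponet(u0, t, branch, trunk) for t in (0.0, 0.5, 1.0)]
```

Because the trunk takes the query time as a continuous input, a trained model of this form can be evaluated at arbitrary times without stepping through intermediate states.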
								
Leonard: Blown in the Wind: Simulation and Surrogate Modeling of Stellar Wind Feedback
									Momentum, mass and energy injection from stellar winds and supernova explosions of massive stars affect the interstellar medium (ISM). 
									Such stellar feedback impacts galaxy formation and evolution (via its contribution to galactic winds), is believed to drive turbulence in the ISM 
									and thought to be a reason for low galactic star formation efficiency. The interplay of cosmic rays with interstellar 
									bubbles blown by supernovae / stellar winds is also of interest, e.g. in terms of removal of thermal energy from those bubbles 
									and re-acceleration of galactic cosmic rays. However, it is computationally infeasible to make galaxy simulations fine enough (in terms of the timestep / 
									spatial resolution) to accurately resolve wind-blown bubbles and their impact, motivating a surrogate model based on machine learning models.
								
PhD projects
People and their projects
Anna Lena: A numerical investigation on galactic outflows using NIHAO and GECKOS
									The formation and evolution of galaxies is a complex and multiscale problem, where gas flows play an important role. Cosmological simulations have illustrated that 
									feedback mechanisms from stellar winds, radiation fields, supernovae and active galactic nuclei (AGNs) can effectively drive galactic winds. 
									However, the details of the driving mechanisms behind gas in- and outflows are still unclear. Also, many cosmological simulations overlook Cosmic Rays 
									(CRs, feedback from relativistic particles), which are another source of non-thermal feedback. We use the state-of-the-art cosmological hydrodynamical simulations of 
									Milky Way-like galaxies NIHAO (Numerical Investigation of a Hundred Astrophysical Objects) to study the effect of different feedback processes on gas in- and outflows at 
									different gas temperatures and to investigate the underlying mechanisms driving gas flows. To validate and extend our findings, we study the edge-on spiral galaxies 
									from the GECKOS (Generalising Edge-on galaxies and their Chemical bimodalities, Kinematics, and Outflows out to Solar environments) survey, a large MUSE program recently 
									granted. We plan to use simulation-based inference to test the different feedback processes in our simulations on the observational data.
								
Giuseppe: Simulation-based inference for galactic archeology
									Galaxies evolve by merging with and accreting dwarf galaxies, thereby destroying lower-mass systems over their lifetimes. 
									The contribution that those lower-mass systems bring to the present-day stellar body has been frozen in stellar halos by the long orbital 
									timescales, so the relics of these systems retain part of their progenitor's initial orbit. But dynamical information is not enough to 
									disentangle these components, and the complementary chemical information helps to characterize these building blocks. 
									In fact, chemical abundances of stars are conserved quantities, making the chemical abundance plane ([Fe/H] vs [O/Fe]) a distinct imprint 
									that retains information on the conditions of formation of their stars, like the total mass and the age of the system until the merging 
									event. This theoretical background allows us to attempt to decompose the stellar halo into its components, unraveling the merger history of Milky Way-like galaxies. 
									In the modern era of large N-body galaxy simulations, we recast this question as a simulation-based inference problem to recover the properties of these building blocks 
									(e.g. total stellar mass, infall time, ...) using the chemical abundance plane as observables. 
									We therefore present CASBI (Chemical Abundance Simulation Based Inference), a Python package to recover the posterior probability of the properties of the building blocks of 
									Milky Way-like galaxy halos. Moreover, CASBI incorporates conditional neural network architectures as simulators to obtain observables from parameters, 
									smoothly interpolating on unseen regions of the parameter space.
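As a rough illustration of what simulation-based inference means, the sketch below uses rejection ABC, a much simpler technique than the neural approach described above and not CASBI's method. The `simulate` function, the uniform prior range, the summary statistic and the tolerance are all hypothetical choices: parameters are drawn from the prior, passed through the simulator, and kept only if the simulated summary lies close to the observed one.

```python
import random
import statistics

def simulate(theta, rng, n_stars=100):
    """Hypothetical simulator: n_stars 'abundances' drawn from a Gaussian centred on theta."""
    return [rng.gauss(theta, 0.3) for _ in range(n_stars)]

def summary(sample):
    """Summary statistic used to compare simulated and observed data (here: the mean)."""
    return statistics.fmean(sample)

def rejection_abc(observed, prior_low, prior_high, n_draws, tolerance, rng):
    """Keep prior draws whose simulated summary lies within `tolerance` of the observed one."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(prior_low, prior_high)   # draw from a flat prior
        if abs(summary(simulate(theta, rng)) - s_obs) < tolerance:
            accepted.append(theta)
    return accepted

rng = random.Random(42)
true_theta = -1.2                       # parameter used to create the mock observation
observed = simulate(true_theta, rng)
posterior = rejection_abc(observed, -3.0, 0.0, 5000, 0.05, rng)
posterior_mean = statistics.fmean(posterior)
```

The accepted values approximate the posterior of the parameter given the observation. Neural methods replace the wasteful accept/reject step with a learned conditional density, which scales to higher-dimensional observables such as the full abundance plane.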
								
Postdoc projects
People and their projects
William: Structural decomposition of synthetic and observational data sets with machine learning
									Among various topics, I am mainly interested in understanding structure formation and evolution particularly at the galactic and sub-galactic scales. 
									As such, my research is currently focused on using novel machine learning techniques to build algorithms that are capable of classifying structure within both 
									synthetic and observational data sets in a physically meaningful way. Recently I published AstroLink (with the paper here), an unsupervised, hierarchical, and density-based clustering 
									algorithm (fully documented on readthedocs) that extracts arbitrarily-shaped statistically-robust overdensities from n-dimensional point-based data sets. Now I am developing FuzzyCat, a brute-force, 
									model-agnostic, and empirical method of propagating the uncertainties of observational data sets into soft-labels for improved statistical analyses of clusters. 
									In addition, I plan to create a locally-adaptive-metric scheme that learns a map between the cumulative distribution function of a data set and its image. 
									With these tools I will reveal extensive structural catalogues of state-of-the-art N-body simulations as well as of our own Milky Way, paving the way for a more 
									complete understanding of structure formation and evolution within the context of Lambda-CDM.
								
Aura: Quasar properties from large scale CGM emission
									When quasars (supermassive black holes accreting at rates comparable to their Eddington limit) have large amounts of cool (T~10^4K) gas 
									in their environment (the circumgalactic medium - CGM - of their host galaxies), we can sometimes observe large scale (up to hundreds of 
									kpc) glows or nebulae in the UV emission lines surrounding the host galaxies (e.g. Cantalupo et al. 2014, Hennawi et al. 2015, Arrigoni Battaia et al. 2019). 
									These nebulae encode information about both the amount and state of the cool CGM gas in massive dark matter halos at high redshifts, as well as the central 
									engine driving the emission. Thus, emission glows can be used to understand the baryon cycle around massive galaxies and the impact of high-energy phenomena 
									such as black hole feedback on the large-scale gas fuel for star formation - the galaxies' CGM.
									We are building a large database of photoionization models, covering the full property space of quasars (e.g. black hole mass, accretion 
									rate) and the full property space of the CGM gas (density, temperature, composition), which will be used to generate synthetic 
									observations for CGM (see also our recent paper), as shown in the figure on the left. To efficiently generate such observations, we are training neural 
									networks/operators on the CGM properties of interest (e.g. line emissivities, ionization fraction of different atomic species, 
									heating and cooling rates), with the goal of developing a simulation-based inference pipeline that can be applied to real observations.
								
Aura: Galactic Structure Finder (GSF): a tool to reveal the building blocks of galaxies
									GSF is a Python analysis code that searches for galaxy substructure in n-dimensional (nD) stellar feature spaces using Gaussian Mixture 
									Models (GMMs), and constructs summary statistics of these components. We are developing it with the goal of understanding how the building 
									blocks of galaxies form and evolve, and how they relate to the dark universe.
									
									The code is equipped with two model selection criteria that can rank the models (described by the number of multivariate Gaussians n) 
									from best to worst. In contrast to the common practice in astrophysics of using poorly physically motivated functions to model partially 
									overlapping galaxy substructures, such as disks, bulges, bars, and stellar halos, GSF performs a purely data-driven unsupervised 
									partitioning of galaxies, for any user-defined input feature space (spatial, dynamical, and population properties can be mixed in any 
									way the user imagines).
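The idea of ranking mixture models by the number of Gaussians can be sketched as follows. This is not GSF's code: it fits one-dimensional mixtures with a hand-written EM loop, and it uses the Bayesian Information Criterion (BIC) as the selection criterion, which is an assumption since the criteria GSF uses are not named here. The synthetic two-component data and all function names are likewise illustrative.

```python
import math
import random

def normal_pdf(x, mean, var):
    """Density of a 1D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def fit_gmm_1d(data, n_components, n_iter=100):
    """Fit a 1D Gaussian mixture with EM; returns (weights, means, variances, log-likelihood)."""
    n = len(data)
    ordered = sorted(data)
    # Deterministic initialisation: means at evenly spaced quantiles, shared global variance.
    means = [ordered[int((k + 0.5) * n / n_components)] for k in range(n_components)]
    centre = sum(data) / n
    spread = sum((x - centre) ** 2 for x in data) / n
    variances = [spread] * n_components
    weights = [1.0 / n_components] * n_components
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: update weights, means and variances from the responsibilities.
        for k in range(n_components):
            nk = max(sum(r[k] for r in resp), 1e-12)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var_k, 1e-6)   # floor to avoid collapsing components
    log_like = sum(
        math.log(sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)))
        for x in data
    )
    return weights, means, variances, log_like

def bic(log_like, n_components, n_points):
    """BIC for a 1D mixture: (3K - 1) free parameters; lower is better."""
    n_params = 3 * n_components - 1
    return n_params * math.log(n_points) - 2.0 * log_like

# Synthetic feature (e.g. one kinematic quantity) drawn from two well-separated populations.
rng = random.Random(1)
data = [rng.gauss(-3.0, 1.0) for _ in range(300)] + [rng.gauss(3.0, 1.0) for _ in range(300)]
fits = {k: fit_gmm_1d(data, k) for k in (1, 2, 3)}
scores = {k: bic(fits[k][3], k, len(data)) for k in fits}
best_k = min(scores, key=scores.get)   # number of Gaussians preferred by the BIC
```

In several dimensions the same ranking applies, with full covariance matrices replacing the scalar variances and the parameter count adjusted accordingly.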
									The figures on the left refer to a simulated TNG50 galaxy that was analyzed with GSF in an input 6D feature 
									space to reveal a disk, bar, and spheroid (Obreja et al. in prep). This type of partitioning allows the detailed study of the formation 
									history of individual galaxy components.
									As an interesting application, galaxy components separated by GSF on the basis of their dynamics have also been shown to be good 
									proxies for properties of dark matter halos that are beyond the reach of observations, such as halo spins as shown in our 2022 paper.
								
Lorenzo:
									to be filled
								
Past projects
Christian: Deep multi-modal representation learning for stellar spectra
									Given the rising number of surveys that measure spectra of Milky Way stars, the question of how to gain the most knowledge through cross-survey analyses becomes more 
									important than ever. In this work, state of the art deep multi-modal representation learning algorithms are employed, including contrastive learning and multi-modal 
									Variational Auto Encoders, resulting in representations that capture the informative aspects of the data. The representations produced by these algorithms 
									are then used as a foundation for multi-modal tasks, such as cross-modal generation and fusion.
								
Lingyi: Model comparison and misspecification detection for cosmological simulations
									Cosmological simulation is a powerful tool in advancing our understanding of galaxy evolution and formation. There are many simulations capable of describing physical 
									properties such as gas cooling, star formation, scale relationships, chemical bimodality, and more. A question naturally arises with these simulations: How close are 
									they to reality? Many previous studies evaluate galaxy simulations using scaling relations, which could be biased because they rely on only a few physical properties. In this work, 
									we apply a novel model misspecification detection technique based on Bayesian Inference to compare simulated galaxy images from NIHAO and IllustrisTNG simulations against 
									SDSS observations. After encoding images to summary space preserving original information as much as possible, a distribution discrepancy between real observed data and 
									mock data in summary space indicates model misspecification. For this purpose, we perform an outlier detection on latent vectors. We further compare the inliers in summary 
									space by casting Bayesian model comparison as a classification task, to see which model is relatively the best. Additionally, we use explainable AI methods like SHAP values 
									to partially explain our results.
								
Justin: Machine learning models for emission line prediction
									Line emissivities of active galactic nuclei (AGN) can be accurately computed using photoionization software like CLOUDY, 
									but the data generation is quite slow. In order to enable the fast generation of large amounts of data, for example for further computations inside cosmological simulation codes, 
									machine learning methods like Deep Fully Connected Neural Networks or Neural Operators can be applied. The data required for training is generated once with CLOUDY as a 6-dimensional 
									grid in the dimensions Eddington luminosity, black hole mass, gas temperature, metallicity, hydrogen density and distance. The goal is to interpolate this data as fast and accurately 
									as possible. 
								
Ufuk: Galaxy morphology models with AI
									We investigate the use of machine learning to create galaxy morphology models and encode the information 
									contained in modern state-of-the-art galaxy simulations. Simulation data from the IllustrisTNG project is 
									used to calculate the eigengalaxies as the basis vectors of the transformed image space using Principal 
									Component Analysis in two and three dimensions, and it is demonstrated how eigengalaxies encode specific 
									morphological information. In the final step the NVIDIA StyleGAN2-ada is trained on the 2D galaxy images. 
									We find that a simple PCA model with 49 eigengalaxies can recover the general shape of 9763 two-dimensional 
									galaxy images. Although 99% of all images have a reconstruction error of less than 9%, a more advanced model 
									like a GAN is needed to describe highly nonlinear structures (e.g. spiral arms) and small-scale structures. 
									All results are published here with all source code of the GAMMA project to be found 
									here. Additional animations 
									and interactive dashboards can be accessed through the GitHub-Repository of Ufuk's thesis. 
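The eigengalaxy idea, reconstructing images from a few principal components and measuring the residual, can be sketched on toy data. This is not the GAMMA code: the 16-pixel "images", the two hidden patterns and the power-iteration PCA are all assumptions made to keep the example dependency-free; a real analysis would more likely use a library SVD.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pca_power_iteration(rows, n_components, n_iter=200, seed=0):
    """Return (mean, components): leading principal directions via power iteration with deflation."""
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - mean[j] for j in range(d)] for r in rows]
    components = []
    for _ in range(n_components):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(n_iter):
            # Multiply by the (unnormalised) covariance without forming it: X^T (X v).
            proj = [dot(r, v) for r in centred]
            v = [sum(p * r[j] for p, r in zip(proj, centred)) for j in range(d)]
            # Remove directions already found, then normalise.
            for c in components:
                overlap = dot(v, c)
                v = [vj - overlap * cj for vj, cj in zip(v, c)]
            norm = math.sqrt(dot(v, v))
            v = [vj / norm for vj in v]
        components.append(v)
    return mean, components

def relative_error(rows, mean, components):
    """Residual norm after projecting onto the components, relative to the centred data norm."""
    residual_sq = 0.0
    signal_sq = 0.0
    for r in rows:
        centred = [x - m for x, m in zip(r, mean)]
        coeffs = [dot(centred, c) for c in components]
        recon = [sum(a * c[j] for a, c in zip(coeffs, components)) for j in range(len(r))]
        residual_sq += sum((x - y) ** 2 for x, y in zip(centred, recon))
        signal_sq += dot(centred, centred)
    return math.sqrt(residual_sq / signal_sq)

# Toy "images": mixtures of two hidden patterns plus a little pixel noise.
rng = random.Random(3)
pattern_a = [math.sin(0.4 * j) for j in range(16)]
pattern_b = [math.cos(0.9 * j) for j in range(16)]
images = []
for _ in range(100):
    a, b = rng.gauss(0.0, 2.0), rng.gauss(0.0, 1.0)
    images.append([a * pa + b * pb + rng.gauss(0.0, 0.01) for pa, pb in zip(pattern_a, pattern_b)])

mean, components = pca_power_iteration(images, 2)
err_one = relative_error(images, mean, components[:1])
err_two = relative_error(images, mean, components)
```

Since the toy images are built from two patterns, two components should capture almost all of the variance. Real galaxy images need many more components, and strongly nonlinear features remain poorly captured by any linear basis.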
								
Luca: GalacticFlow: Learning a Generalized Representation of Galaxies with Normalizing Flows
									State-of-the-art galaxy formation simulations generate data within weeks or months. Their results consist of a random sub-sample of possible galaxies with a fixed number 
									of stars. We propose an ML-based method, GalacticFlow, that generalizes such results. We use normalizing flows to learn the extended distribution function of galaxies 
									conditioned on global galactic parameters. GalacticFlow then provides a continuized and condensed representation of the ensemble of galaxies in the data, 
									essentially compressing large amounts of explicit simulation data into a small implicit generative model. Our model is able to evaluate any galaxy eDF given by a 
									set of global parameters and allows generating arbitrarily many stars from it. We show that we can learn such a representation, embodying the entire mass range from 
									dwarf to Milky Way mass, from only 90 galaxies in ∼18 hours on a single RTX2080Ti and generate a new galaxy of one million stars within a few seconds. All code to reproduce our results can be found here.
								
Immanuel: Speeding up astrochemical reaction networks with autoencoders and neural ODEs
									In astrophysics, solving complex chemical reaction networks is essential but computationally demanding due to the high dimensionality and stiffness of the ODE systems. 
									Traditional approaches for reducing computational load are often specialized to specific chemical networks and require expert knowledge. Immanuel's project introduces a machine 
									learning-based solution employing autoencoders for dimensionality reduction and a latent space neural ODE solver to accelerate astrochemical reaction network computations. 
									Additionally, we propose a cost-effective latent space linear function solver as an alternative to neural ODEs. These methods are assessed on a dataset comprising 29 chemical 
									species and 224 reactions. Our findings, also published here, demonstrate that the neural ODE achieves a 55x speedup over the baseline model while achieving significantly higher accuracy, with up 
									to two orders of magnitude lower relative error. Furthermore, the linear latent model enhances accuracy and achieves a speedup of up to 4000x compared to standard 
									methods. All code to reproduce our results can be found here.
								
Eva: Star formation models in cosmological simulations
									State-of-the-art cosmological hydrodynamical simulations of galaxy formation have reached the point at 
									which they produce increasingly realistic galaxies. Still, the employed sub-grid models 
									are relatively simple empirical models that include several free parameters. One of the challenges is
									the robust identification and characterization of star forming regions and various methods have been proposed. 
									However, the results of these methods are often inconsistent and the underlying assumptions are not always
									physically well motivated. In this project, we investigate how we can robustly identify star forming regions
									in cosmological simulations and explore novel prescriptions for calculating the star formation efficiency of these regions.
								
Kalina: Recreating the Hubble Tuning-Fork with generated galaxies using Normalising Flows
									Through a detailed study of 95 galaxies from the NIHAO simulations, we estimate critical parameters like the bulge-to-total (B/T) ratio and bar strength, employing methodologies 
									such as kinematic decomposition and Fourier analysis. This analysis allows us to explore the intricate diversity of galaxy morphologies and their physical properties.
									Leveraging the power of normalizing flows, specifically the GalacticFlow model, we model the multidimensional parameter space derived from the newly estimated galactic properties. 
									This approach enables us to generate realistic galaxy simulations efficiently, capturing the vast variability in galactic structures. The thesis highlights the model's 
									capability to replicate the classical Hubble tuning-fork diagram computationally, emphasizing the impact of parameters like the B/T ratio on galactic morphology.
								
Rebekka: Galactic chemical enrichment in MW dwarf galaxies
									With the advent of large spectroscopic surveys, the amount of high-quality chemodynamical data in the Milky Way and its satellite galaxies has increased tremendously. 
									Accurately and correctly capturing and explaining the detailed features in the high-quality observational data is notoriously difficult for state-of-the-art numerical models. 
									Here we investigate a set of new cosmological galaxy formation models that include improved prescriptions for chemical enrichment and compare our findings to the 
									observational data.
								
