A Natural Boundary of Dark Matter Haloes

Using large-scale cosmological N-body simulations to discover a "bias trough" that marks the boundary where haloes deplete their environment—establishing new physical scales that improve upon traditional models.

Publication: Monthly Notices of the Royal Astronomical Society (MNRAS), 2020
Role: First Author
Data Scale: 8+ billion particles from N-body cosmological simulations

Summary

This paper used large-scale cosmological N-body simulations to discover a "bias trough" that marks the boundary where dark matter haloes deplete their surrounding environment. This discovery established two new physical scales, the inner depletion radius and the characteristic depletion radius, that improve upon the traditional "virial radius" used for decades in astrophysics.

Data Visualization

The figures in this paper demonstrate the ability to communicate multi-dimensional statistical relationships through thoughtful visual design. Each figure was created using Python (Matplotlib, Seaborn) with custom visualization pipelines.

Figure 1: Multi-Parameter Bias Profiles

Figure 1: The halo bias profile as a function of radius, binned by various halo parameters. Each panel shows the bias binned by a different parameter (labeled top right). The ubiquitous trough visible across all panels defines a characteristic "depletion boundary"—a key finding of the research.

Data science approach: A six-panel grid enables systematic comparison of how different features (mass, velocity ratio, spin, shape, formation time, environment density) affect the target variable (the bias profile). Color gradients encode the parameter bins. Log-scale axes handle the large dynamic range. A horizontal line at unity bias provides a consistent reference across all panels.
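The layout described above can be sketched in Matplotlib as follows. This is a minimal illustration, not the paper's plotting code: the profile function `toy_bias_profile`, the bin count, and the axis units are all made up for the example, and only the panel labels come from the text.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Panel labels follow the six parameters named in the text.
PARAMS = ["mass", "velocity ratio", "spin", "shape", "formation time", "environment"]

def toy_bias_profile(r, depth):
    """Synthetic stand-in (not simulation data): a trough below unity near r = 2."""
    trough = depth * np.exp(-0.5 * (np.log(r / 2.0) / 0.4) ** 2)
    return 1.0 + 0.5 / r**2 - trough

def plot_bias_grid(n_bins=5):
    """Draw a 2x3 grid with one color-coded curve per parameter bin in each panel."""
    r = np.logspace(-1, 1.5, 200)
    fig, axes = plt.subplots(2, 3, figsize=(12, 7), sharex=True, sharey=True)
    colors = plt.cm.viridis(np.linspace(0.0, 1.0, n_bins))  # gradient encodes the bin
    for ax, name in zip(axes.flat, PARAMS):
        for i, color in enumerate(colors):
            ax.plot(r, toy_bias_profile(r, depth=0.3 + 0.1 * i), color=color)
        ax.axhline(1.0, color="gray", linestyle="--", linewidth=1.0)  # unity reference
        ax.set_xscale("log")
        ax.set_yscale("log")
        ax.text(0.95, 0.95, name, transform=ax.transAxes, ha="right", va="top")
    for ax in axes[-1, :]:
        ax.set_xlabel("radius (arbitrary units)")
    for ax in axes[:, 0]:
        ax.set_ylabel("bias")
    fig.tight_layout()
    return fig, axes
```

Sharing both axes across panels keeps the unity line and the trough at the same position in every panel, which is what makes the comparison across parameters direct.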

Figure 4: Two-Parameter Space Mapping

Figure 4: The characteristic depletion radius (r_cd) for haloes binned by pairs of properties. Gray filled contours show the number density distribution. Thin solid lines are measured values; thick transparent lines are Gaussian Process Regression model predictions.

Data science approach: This 6×6 matrix visualization is a systematic feature interaction analysis. By overlaying measured values against Gaussian Process model predictions, the figure directly communicates where the model captures the data well and where residual complexity exists—a key diagnostic for model validation.

Figure 8: Theory Illustration

Figure 8: Illustration of halo accretion around the maximum inflow location. Horizontal arrows represent mass flow magnitudes. The evolution of the halo is shown by the transition from the solid black line to the dashed red line.

Data science approach: This conceptual diagram bridges quantitative results and physical interpretation. It translates velocity field measurements into an intuitive visual model of mass transport, making the mechanism behind the depletion boundary accessible to readers from different specializations.

Figure 9: Phase Space Dynamics

Figure 9: The scaled radial velocity distribution for three mass bins, showing 100 randomly selected haloes each. Color maps show particle density; white curves mark isodensity contours. Vertical lines indicate key physical radii.

Data science approach: Phase-space density estimation using kernel-based methods on millions of particle positions and velocities. The layered visualization (density colormap + isodensity contours + reference lines) encodes multiple data dimensions simultaneously. Mass-binned panels enable direct comparison of how the underlying distribution shifts with the primary feature variable.
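As a rough sketch of the quantities behind such a panel, the snippet below computes each particle's radial velocity relative to a halo and bins the scaled (radius, radial velocity) pairs. It swaps in a plain 2D histogram for the kernel-based density estimate mentioned above. The function names, the scaling arguments `r_scale` and `v_scale`, and the bin edges are assumptions made for illustration.

```python
import numpy as np

def radial_velocity(pos, vel, center, v_center):
    """Return distance r and radial velocity v_r of each particle relative to a halo.

    pos, vel: (N, 3) arrays; center, v_center: (3,) arrays.
    Assumes no particle sits exactly at the halo center (r > 0).
    """
    dx = pos - center
    dv = vel - v_center
    r = np.linalg.norm(dx, axis=1)
    # Project the relative velocity onto the unit radial vector.
    v_r = np.einsum("ij,ij->i", dx, dv) / r
    return r, v_r

def phase_space_histogram(r, v_r, r_scale, v_scale, r_edges, v_edges):
    """Count particles on a grid of scaled radius versus scaled radial velocity."""
    counts, _, _ = np.histogram2d(r / r_scale, v_r / v_scale, bins=[r_edges, v_edges])
    return counts
```

Negative `v_r` marks infalling particles and positive `v_r` outgoing ones, so a color map of `counts` separates the two populations along the velocity axis.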

Data Pipeline

  • Simulation data: N-body cosmological simulations containing 8+ billion dark matter particles, tracking gravitational evolution of the universe
  • Halo identification: Automated detection of gravitationally-bound structures (haloes) within the simulation volume
  • Property extraction: Computing mass, velocity, spin, shape, formation time, and environmental density for each identified halo
  • Bias profile computation: Calculating the ratio of local-to-average matter density as a function of distance from each halo center
  • Phase-space analysis: Extracting radial velocity distributions for particle populations around haloes
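The profile step can be illustrated with a minimal sketch that measures the ratio of local to mean matter density in spherical shells around one halo center. It assumes equal-mass particles in a periodic cubic box. The function name and arguments are hypothetical, and the estimator used in the paper may differ, for example in how the profile is normalized.

```python
import numpy as np

def density_ratio_profile(pos, center, r_edges, box_size):
    """Local-to-mean density ratio in spherical shells around `center`.

    pos: (N, 3) particle positions; r_edges: increasing shell edges, all below box_size / 2.
    """
    dx = pos - center
    # Minimum-image convention for a periodic box.
    dx -= box_size * np.round(dx / box_size)
    r = np.linalg.norm(dx, axis=1)
    counts, _ = np.histogram(r, bins=r_edges)
    shell_volume = 4.0 / 3.0 * np.pi * (r_edges[1:] ** 3 - r_edges[:-1] ** 3)
    mean_number_density = len(pos) / box_size**3
    # Equal particle masses cancel in the ratio, so number densities suffice.
    return counts / shell_volume / mean_number_density
```

For a uniform particle distribution the ratio scatters around 1 at every radius; around a halo it is large near the center and falls toward 1 at large distances.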

Statistical & ML Methodology

  • High-dimensional phase-space analysis: Examining particle distributions in position-velocity space to identify physical boundaries
  • Gaussian Process Regression: Non-parametric modeling of the characteristic depletion radius as a function of multiple halo properties simultaneously
  • Multi-parameter binning: Systematic analysis of bias profiles across 6 different halo parameters, individually and in pairs
  • Velocity profile modeling: Decomposing radial velocity distributions into infall and splashback components to measure mass accretion rates
  • Feature engineering: Identifying which combinations of halo properties most strongly predict the depletion boundary location
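The pairwise binning in the list above can be sketched as follows: haloes are split into quantile bins in two properties, and a target quantity is averaged in each cell. The helper names and the use of equal-count (quantile) bins are assumptions for illustration. The source does not say how the bins were chosen.

```python
import numpy as np

def quantile_bin_index(values, n_bins):
    """Assign each value to one of n_bins equal-count bins (indices 0 .. n_bins - 1)."""
    edges = np.quantile(values, np.linspace(0.0, 1.0, n_bins + 1))
    # Use interior edges only, so the extremes fall in the first and last bin.
    return np.digitize(values, edges[1:-1])

def binned_mean_2d(x, y, target, n_bins=4):
    """Mean of `target` in each (x-bin, y-bin) cell; empty cells are NaN."""
    ix = quantile_bin_index(x, n_bins)
    iy = quantile_bin_index(y, n_bins)
    sums = np.zeros((n_bins, n_bins))
    counts = np.zeros((n_bins, n_bins))
    np.add.at(sums, (ix, iy), target)   # unbuffered add handles repeated indices
    np.add.at(counts, (ix, iy), 1)
    with np.errstate(invalid="ignore", divide="ignore"):
        means = np.where(counts > 0, sums / counts, np.nan)
    return means, counts
```

Returning the per-cell counts alongside the means makes it easy to mask sparsely populated cells before plotting or fitting.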

Data Science Significance

This work is fundamentally a feature engineering and predictive modeling problem: given a set of halo properties, can we predict the location of a physical boundary? The Gaussian Process Regression approach allowed non-parametric modeling of complex, non-linear relationships in high-dimensional parameter space.
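To make the regression idea concrete, here is a minimal NumPy sketch of Gaussian Process prediction with a squared-exponential (RBF) kernel and fixed hyperparameters. The source does not state which kernel, library, or hyperparameter fitting was used, so the kernel choice, the function names, and the default values below are all assumptions.

```python
import numpy as np

def rbf_kernel(a, b, length_scale, amplitude):
    """Squared-exponential covariance between the rows of a (n, d) and b (m, d)."""
    sq_dist = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return amplitude**2 * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)

def gp_predict(x_train, y_train, x_test, length_scale=1.0, amplitude=1.0, noise=1e-2):
    """Posterior mean and standard deviation of the latent function at x_test."""
    y_mean = y_train.mean()  # constant mean function, estimated from the data
    k_tt = rbf_kernel(x_train, x_train, length_scale, amplitude)
    k_tt += noise**2 * np.eye(len(x_train))
    chol = np.linalg.cholesky(k_tt)
    # Solve (K + noise^2 I) alpha = y - mean using the Cholesky factor.
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train - y_mean))
    k_st = rbf_kernel(x_test, x_train, length_scale, amplitude)
    mean = y_mean + k_st @ alpha
    v = np.linalg.solve(chol, k_st.T)
    var = amplitude**2 - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))
```

In this picture each row of `x_train` would hold the properties of one halo bin and `y_train` the measured depletion radius. The predicted standard deviation grows away from the training points, which flags regions of parameter space where the model is extrapolating.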

Skills Demonstrated

N-body Simulations, Bias Profiles, Gaussian Process Regression, Phase-space Analysis, Feature Engineering, Matplotlib, Seaborn, Python, LaTeX, Data Visualization, Velocity Profile Modeling, Mass Accretion Rate, Scientific Writing