MDZip: Neural Compression of Molecular Dynamics Trajectories for Scalable Storage and Ensemble Reconstruction
Abstract
The size of molecular dynamics (MD) trajectories remains a major obstacle for data sharing, long-term storage, and ensemble analysis at scale. Existing solutions often rely on frame subsampling or reduced atom representations, which limit the utility of shared datasets. Here, we present MDZip, a neural compression framework based on convolutional autoencoders trained per system to reconstruct atomic trajectories with high geometric fidelity from compact latent representations. MDZip achieves over 95% reduction in storage size across a diverse benchmark of proteins, protein-peptide complexes, and nucleic acids. Despite operating in a physics-agnostic manner, the reconstructed trajectories accurately preserve ensemble-level features, including RMSD fluctuations, pairwise distance distributions, radius of gyration, and projections onto principal and time-lagged independent components. A residual (skip-connected) autoencoder variant consistently improves reconstruction accuracy and reduces outliers. While local structural deviations can impair energetic fidelity, short energy minimization partially recovers physically reasonable conformations. This framework enables customizable compression-accuracy trade-offs and supports a modular workflow for sharing latent representations, decoder models, and reconstruction protocols. MDZip offers a scalable solution to current storage limitations, facilitating broader dissemination of MD data without sacrificing essential dynamical information.
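The storage and fidelity quantities named above can be illustrated with a minimal NumPy sketch. This is not the MDZip implementation: the function names, the `latent_dim` parameter, the 4-byte value size, and the equal-mass radius of gyration are all assumptions made for illustration. The storage estimate ignores the decoder weights and any file-format overhead, and the RMSD is computed without prior superposition.

```python
import numpy as np


def storage_reduction(n_frames, n_atoms, latent_dim, bytes_per_value=4):
    """Fractional size reduction when each frame of n_atoms x 3 coordinates
    is replaced by a latent vector of length latent_dim (hypothetical).
    Decoder weights and file-format overhead are ignored."""
    original = n_frames * n_atoms * 3 * bytes_per_value
    compressed = n_frames * latent_dim * bytes_per_value
    return 1.0 - compressed / original


def per_frame_rmsd(original, reconstructed):
    """Per-frame RMSD between two (n_frames, n_atoms, 3) arrays.
    No structural alignment is performed here."""
    diff = original - reconstructed
    # Squared displacement per atom, averaged over atoms in each frame.
    return np.sqrt((diff ** 2).sum(axis=2).mean(axis=1))


def radius_of_gyration(traj):
    """Per-frame radius of gyration for a (n_frames, n_atoms, 3) array,
    assuming equal atomic masses."""
    centered = traj - traj.mean(axis=1, keepdims=True)
    return np.sqrt((centered ** 2).sum(axis=2).mean(axis=1))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    original = rng.normal(size=(100, 500, 3))
    # Stand-in for a decoded trajectory: the original plus small noise.
    reconstructed = original + rng.normal(scale=0.05, size=original.shape)

    print("storage reduction:", storage_reduction(100, 500, latent_dim=20))
    print("mean RMSD:", per_frame_rmsd(original, reconstructed).mean())
    print("mean Rg difference:",
          np.abs(radius_of_gyration(original)
                 - radius_of_gyration(reconstructed)).mean())
```

Comparing distributions of such per-frame quantities between the original and reconstructed trajectories is one simple way to check ensemble-level agreement of the kind the abstract describes.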