Smart Microfluidics: a curated dataset of microfluidic liposome formulations with cross-laboratory validation for machine-learning applications
Published in Data in Brief, 2026
Recommended citation: L. Lavagna, G. Buttitta, S. Bonacorsi, C. Barbarito, M. Moliterno, G. Saito, I. Oddone, G. Verdone, S. Raimondi, and M. Panella, "Smart Microfluidics: a curated dataset of microfluidic liposome formulations with cross-laboratory validation for machine-learning applications," Data in Brief, 2026, 112667. [https://www.sciencedirect.com/science/article/pii/S2352340926002209](https://www.sciencedirect.com/science/article/pii/S2352340926002209)
This dataset documents microfluidic production runs of liposome formulations generated in two independent laboratories using standardized lipid compositions and controlled flow conditions. It was published as a companion to our previous paper, “Machine Learning-Guided microfluidic optimization of clinically inspired liposomes for nanomedicine applications”.
The data include formulation parameters, microfluidic operating settings, and dynamic light scattering measurements of vesicle size and polydispersity, providing structured input–output relationships suitable for data-driven analysis. Raw spreadsheets from each laboratory were harmonized using a reproducible preprocessing workflow implemented in Python, which performs column standardization, fuzzy-matching corrections, physical-range validation, chip-type filtering, and dataset consolidation. The cleaned dataset comprises 304 micromixer-produced liposome formulations, while a separate file contains 12 independent wet-lab validation formulations. Two additional dataset extensions were generated via Gaussian-noise perturbation and SMOTENC-based oversampling to support machine-learning benchmarking and algorithm comparison. Cross-laboratory records enable evaluation of operator variability, equipment reproducibility, and robustness of predictive modeling workflows. Metadata files documenting feature descriptions, naming conventions, and physical bounds facilitate reuse in automated pipelines and FAIR-compliant repositories. All datasets, raw files, preprocessing scripts, and cleaning logs are publicly available on Zenodo, enabling regression benchmarking, inverse-design studies, and comparative evaluation of modeling approaches for formulation–property relationships in microfluidic liposome synthesis.
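To make the harmonization steps concrete, here is a minimal, standard-library-only sketch of that kind of pipeline: column standardization, fuzzy header matching (via `difflib`), physical-range validation, chip-type filtering, cross-lab consolidation, and a Gaussian-noise augmentation. It is not the authors' published script, which is available on Zenodo. The canonical column names, the physical bounds, the "micromixer" chip label, and the 2% relative noise level are all hypothetical placeholders chosen for illustration. SMOTENC oversampling is not reproduced here.

```python
import difflib
import random

# HYPOTHETICAL canonical feature names; the dataset's metadata files define the real ones.
CANONICAL_COLUMNS = ["lipid_conc", "frr", "tfr", "chip_type", "size_nm", "pdi"]

# HYPOTHETICAL physical bounds (lower, upper), for illustration only.
PHYSICAL_BOUNDS = {
    "size_nm": (10.0, 1000.0),
    "pdi": (0.0, 1.0),
    "frr": (1.0, 10.0),
    "tfr": (0.1, 20.0),
}


def standardize_column(name):
    """Lower-case a raw header, trim it, and turn spaces/hyphens into underscores."""
    return name.strip().lower().replace(" ", "_").replace("-", "_")


def match_column(name, cutoff=0.8):
    """Map a raw header to the closest canonical name by fuzzy matching, or None."""
    key = standardize_column(name)
    hits = difflib.get_close_matches(key, CANONICAL_COLUMNS, n=1, cutoff=cutoff)
    return hits[0] if hits else None


def harmonize_row(raw_row):
    """Rename recognized columns to canonical names; drop unrecognized ones."""
    row = {}
    for raw_key, value in raw_row.items():
        canonical = match_column(raw_key)
        if canonical is not None:
            row[canonical] = value
    return row


def within_bounds(row):
    """True only if every bounded feature is present and inside its range."""
    for col, (lo, hi) in PHYSICAL_BOUNDS.items():
        value = row.get(col)
        if value is None or not (lo <= float(value) <= hi):
            return False
    return True


def consolidate(lab_tables, chip_type="micromixer"):
    """Harmonize, filter by chip type and bounds, and merge all labs' rows."""
    merged = []
    for lab, rows in lab_tables.items():
        for raw in rows:
            row = harmonize_row(raw)
            if str(row.get("chip_type", "")).strip().lower() != chip_type.lower():
                continue  # keep only the requested chip type
            if not within_bounds(row):
                continue  # discard physically implausible records
            row["lab"] = lab  # retain provenance for cross-lab analysis
            merged.append(row)
    return merged


def gaussian_augment(rows, columns, rel_sigma=0.02, copies=1, seed=0):
    """Create noisy copies by multiplying each chosen column by (1 + N(0, rel_sigma)).

    Perturbed values are not re-validated against PHYSICAL_BOUNDS here.
    """
    rng = random.Random(seed)
    augmented = []
    for row in rows:
        for _ in range(copies):
            new = dict(row)
            for col in columns:
                new[col] = float(row[col]) * (1.0 + rng.gauss(0.0, rel_sigma))
            augmented.append(new)
    return augmented
```

Keeping a `lab` field on every consolidated record is what allows operator and equipment variability to be compared later. Fuzzy matching absorbs header variants such as "Size (nm)" versus "size_nm". The `cutoff` value controls how aggressive the renaming is and should be checked against the cleaning logs.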
The preprocessing code is available on Zenodo together with the datasets.
