MyDigiTwin is a scientific initiative to develop a platform for the early detection and prevention of cardiovascular diseases. This platform, supported by prediction models trained in a federated fashion to preserve data privacy, is expected to be hosted by the Dutch Personal Health Environments (PGOs). Consequently, one of the challenges for this federated learning architecture is ensuring consistency between the PGOs' data and the reference datasets that will be part of it. This paper introduces a novel data harmonization framework that streamlines the generation of FHIR-based representations of data from multiple cohort studies. Furthermore, it discusses the framework's applicability to integrating the Lifelines cohort study data into the MyDigiTwin federated research infrastructure.