Digital twins in biomedicine need to evolve continuously to represent the current state of knowledge and data. A large-scale implementation of the digital twin paradigm for human health requires the construction and execution of highly complex models, composed of several component models that span multiple spatial and temporal scales. This calls for a flexible software development platform that enables multidisciplinary and distributed teams to work together, supports reproducibility, and facilitates the integration of data and component models. Design patterns common to traditional model implementations impair the development of integrative digital twins. These patterns include:

  1. Lack of transparency in the implementation of computational models;
  2. Intertwined component models and simulation processes dependent on each other;
  3. Use of incompatible data structures and computer languages;
  4. Brittle architectures that do not easily accommodate extensions of a model;
  5. Software environments that do not easily support distributed collaboration.

To address these problems, we have developed a novel approach based on an open-source, highly modularized computational representation of a digital twin architecture. While the modular design of models and software is a well-established concept, the way modules are assembled generally suffers from the shortcomings listed above. The central principle of our architecture is the separation of the computational algorithms for the different dynamic processes from each other, which eliminates the dependencies that make modifications and extensions cumbersome or impossible in complex models. The architecture also separates computational algorithms from data: all data describing the global model state, including model parameters, are kept apart from the individual computational modules in a transparent ‘hub-and-spoke’ architecture designed to facilitate extension and modification. This approach differs fundamentally from the conventional approach to building and simulating such models in biomedicine.

The core principle underlying the highly modularized architecture we propose here is to treat each dynamic biological process in the model, or each collection of related processes, as a separate module of the digital twin. In a biological context, a molecular module might contain the algorithms for diffusion and transport of that molecule, while a cellular module could contain sub-models related to that cell’s function. The individual modules are connected only indirectly: they communicate through a central data structure, the global model state, rather than passing data to each other directly.
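The indirect coupling described above can be illustrated with a minimal sketch. It is not taken from the simulator's source code: the class names, fields, parameter values, and the simple decay and activation dynamics are all hypothetical and chosen only to show two modules that never reference each other and exchange information solely through a shared state object.

```python
from dataclasses import dataclass, field


@dataclass
class GlobalState:
    """Hypothetical hub: all model data, including parameters, lives here."""
    time: float = 0.0
    molecule_conc: float = 1.0    # illustrative molecule concentration
    cell_activation: float = 0.0  # illustrative cell activation level
    params: dict = field(
        default_factory=lambda: {"decay_rate": 0.1, "sensitivity": 0.5}
    )


class MoleculeModule:
    """Spoke 1: first-order decay of the molecule (illustrative dynamics)."""

    def advance(self, state: GlobalState, dt: float) -> None:
        rate = state.params["decay_rate"]
        state.molecule_conc -= rate * state.molecule_conc * dt


class CellModule:
    """Spoke 2: cells respond to the molecule level read from the state."""

    def advance(self, state: GlobalState, dt: float) -> None:
        gain = state.params["sensitivity"]
        state.cell_activation += gain * state.molecule_conc * dt


def run(state, modules, dt, n_steps):
    """Advance every module in turn; modules only ever see the state."""
    for _ in range(n_steps):
        for module in modules:
            module.advance(state, dt)
        state.time += dt
    return state


final = run(GlobalState(), [MoleculeModule(), CellModule()], dt=0.1, n_steps=10)
```

In this sketch, dropping `CellModule` from the list, or adding a third module, requires no change to `MoleculeModule`, because neither module holds a reference to the other.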

This indirect coupling prevents any direct dependencies between the computational portions of the modules, a key feature that enables the model to be readily extended or modified. The global model state is the repository for all data describing the state of the simulated model at a given point in time, including any information about the underlying physical structure, if included, and the variable states of all computational models in the modules. All computational algorithms, on the other hand, are contained in the modules, providing a clear separation between the model and the data on which it operates during simulation. The resulting computational structure naturally separates the model components, so that each can be validated by the modality natural to its dynamic process, facilitating continued model refinement and personalization. Our implementation contains four components:

  1. a runtime configuration file,
  2. a global model state,
  3. modules, and
  4. a simulation framework that controls simulation runtime and provides data structures and algorithms useful for the development of new modules.

Figure 1 shows these four components, the relationships among them, and their dependencies.
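As an illustration of the first component, a runtime configuration could be read as in the following sketch. The section names, keys, and values are hypothetical and are not taken from the project's actual config.ini; only Python's standard `configparser` module is used.

```python
import configparser

# Hypothetical configuration text; the real config.ini may be organized differently.
CONFIG_TEXT = """
[simulation]
time_span = 10.0
modules = molecule, cell

[molecule]
time_step = 0.1
decay_rate = 0.05

[cell]
time_step = 1.0
sensitivity = 0.5
"""


def load_config(text):
    """Return the simulated time span and one parameter dict per listed module."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    names = [n.strip() for n in parser.get("simulation", "modules").split(",")]
    settings = {
        name: {key: float(value) for key, value in parser[name].items()}
        for name in names
    }
    return parser.getfloat("simulation", "time_span"), settings


time_span, settings = load_config(CONFIG_TEXT)
```

Keeping every parameter in one file of this kind means a simulation run can be reproduced from the configuration alone, without editing module code.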


Figure 1. Modular model components.

The figure shows five elements of the modular design implementation:

  1. The runtime configuration file (config.ini), which contains all configuration and parameter settings for a given simulation run.
  2. A simulation solver, which reads the configuration file (config.ini) and constructs, initializes, and advances the simulation in time, executing each module according to its inherent time scale.
  3. The model state, which contains all data describing the state of the model at a given point in time, including any physical space geometry and the states of model objects. In this example, the model includes a spatial component. The model state is a contiguous block of memory, shown as the partitioned rectangle, with the hierarchical Python referencing syntax shown to its left.
  4. Modules, each consisting of a computational model that takes all of its input data from the model state and stores none itself.
  5. The framework classes ModuleState and ModuleModel, which the modules extend. These classes handle the connection to the simulation solver and access to the model state, so the developer only needs to consider the biological additions to the model. Extending ModuleState appends the fields defined in the extending class to the model state. The simulation solver calls the initialize and advance functions of the ModuleModel extension so that the module can participate in the simulation.
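The extension pattern described in the caption can be sketched as follows. The base classes below are simplified stand-ins written for this illustration and are not the framework's actual ModuleState and ModuleModel. The `attach` method, the `StateClass`, `name`, and `time_step` attributes, the signatures of `initialize` and `advance`, and the scheduling rule in `solve` are all assumptions.

```python
class State:
    """Stand-in for the global model state: a plain attribute container."""

    def __init__(self):
        self.time = 0.0


class ModuleState:
    """Stand-in base class: a subclass instance is appended to the state."""

    def attach(self, state, name):
        setattr(state, name, self)


class ModuleModel:
    """Stand-in base class defining the hooks called by the solver."""
    name = "module"
    time_step = 1.0           # the module's inherent time scale
    StateClass = ModuleState

    def initialize(self, state):
        return state

    def advance(self, state, previous_time):
        return state


class MoleculeState(ModuleState):
    def __init__(self):
        self.concentration = 1.0


class Molecule(ModuleModel):
    name, time_step, StateClass = "molecule", 0.5, MoleculeState
    decay_rate = 0.1  # illustrative parameter

    def advance(self, state, previous_time):
        dt = state.time - previous_time
        state.molecule.concentration *= 1.0 - self.decay_rate * dt
        return state


class CellState(ModuleState):
    def __init__(self):
        self.activation = 0.0
        self.n_updates = 0


class Cell(ModuleModel):
    name, time_step, StateClass = "cell", 1.0, CellState
    sensitivity = 0.5  # illustrative parameter

    def advance(self, state, previous_time):
        dt = state.time - previous_time
        # Input comes from the shared state, never from the Molecule object.
        state.cell.activation += self.sensitivity * state.molecule.concentration * dt
        state.cell.n_updates += 1
        return state


def solve(modules, time_span, dt):
    """Build the state, then call each module when its time step has elapsed."""
    state = State()
    for m in modules:
        m.StateClass().attach(state, m.name)
        m.initialize(state)
    last_run = {m.name: 0.0 for m in modules}
    for step in range(1, round(time_span / dt) + 1):
        state.time = step * dt
        for m in modules:
            if state.time - last_run[m.name] >= m.time_step - 1e-12:
                m.advance(state, last_run[m.name])
                last_run[m.name] = state.time
    return state


state = solve([Molecule(), Cell()], time_span=2.0, dt=0.5)
```

With these assumed settings the molecule module advances on every solver step and the cell module on every second step, which mirrors the idea of executing each module according to its own time scale.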


The source code for the simulator is licensed under the Apache 2.0 License and is available on GitHub.