Checkpointing
Checkpointing serves as a method to take a "snapshot" of the simulation at the current simulation time in GridLAB-D™ and export it to a file. That file captures the whole system state of the GridLAB-D™ run. It can be used to track how the whole system changes over time (as opposed to a recorder or multi_recorder, which tracks specific properties), or as the starting point for a subsequent run.
As an example, consider a scenario where a GridLAB-D™ simulation needs to run for a year, but you want to try different controls at the 6-month point and evaluate how they affect end-of-year metrics. Without checkpointing, each control iteration would require running the full year of simulation. With checkpointing, GridLAB-D™ runs to the 6-month timestamp, exports the checkpoint snapshot, and then proceeds to evaluate the first control strategy. To evaluate a second strategy, GridLAB-D™ loads that checkpoint snapshot, the second control is applied, and the simulation continues from there, so the identical first 6 months of model time never have to be run again.
Motivation
The Checkpointing system was created to save simulation computation time, especially when changes to the behavior are expected later in a simulation run. If a GridLAB-D™ model takes 2 hours to run a year-long simulation, saving a checkpoint at the 6-month mark (1 hour of computation time) saves an hour of compute time on every subsequent run. This is especially useful when researching different controls or technologies for a specific event, or when you want periodic "backups" of your simulation in case there is an issue with the computing platform.
Checkpointing is aimed at all users: regular GridLAB-D™ users (for computation time savings) and developers (to get a known starting point just before a bug or model error occurs, which helps in debugging it).
Functionality
Checkpointing is expected to be used primarily through the C/C++ API (or its Python bindings). Using command-line options and global variable settings is possible, but not recommended.
Checkpointing will not be supported in transient mode and will only take a "snapshot" on standard time-series timestamps. Checkpoint "snapshots" will occur at the end of a quasi-steady-state time-series timestamp (before moving to the next timestep).
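The timing rule can be expressed as a simple predicate. This is a minimal sketch, assuming a simplified model in which every step of the run is tagged with a phase; StepPhase, snapshot_allowed, and snapshot_points are hypothetical names used only for illustration and are not GridLAB-D identifiers.

```cpp
#include <cassert>
#include <vector>

// Hypothetical phases of a simulation step (illustrative names only).
enum class StepPhase { QstsInProgress, QstsComplete, Transient };

// A snapshot is permitted only once a quasi-steady-state timestep has fully
// completed; transient-mode solves are never checkpointed.
bool snapshot_allowed(StepPhase phase) {
    return phase == StepPhase::QstsComplete;
}

// Walks a sequence of phases and returns the indices at which a snapshot
// would be taken under the rule above.
std::vector<int> snapshot_points(const std::vector<StepPhase>& phases) {
    std::vector<int> points;
    for (int i = 0; i < static_cast<int>(phases.size()); ++i) {
        if (snapshot_allowed(phases[i])) {
            points.push_back(i);
        }
    }
    return points;
}
```

For a run that drops into transient mode between two time-series timestamps, only the two completed QSTS steps qualify as snapshot points.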
Overview
GridLAB-D™ implements a checkpointing system that saves simulation state to JSON format, enabling snapshot captures of the simulation at configurable intervals. This allows simulations to be paused and resumed, and provides recovery capabilities.
This checkpointing system enables:
- Long-running simulations to save state periodically
- Distributed computing through state snapshots
- Recovery mechanisms for simulation continuity
- Pause/resume capabilities for simulations
Key Design Features
- In-Memory or File-Based: Can generate JSON in-memory without writing to disk
- Automatic File Naming: Uses model name and removes extensions automatically
- Directory Validation: Verifies output directory exists before writing
- Ordered JSON: Preserves field ordering for consistency
- Timezone Support: Includes timezone information in timestamps
- Hidden State Variables: Critical internal states marked and preserved automatically
Core Architecture
Checkpoint Types
Defined in gldcore/globals.h:
- CPT_NONE (0): Checkpointing disabled
- CPT_WALL (1): Checkpoint at wall-clock time intervals (default 3600 seconds/1 hour)
- CPT_SIM (2): Checkpoint at simulation time intervals (default 86400 seconds/1 day) - Currently set as default
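The three types and their documented default intervals can be summarized in code. This is a minimal sketch, assuming plain integer enumerators; the real declaration in gldcore/globals.h may use a different underlying type, and default_checkpoint_interval is a hypothetical helper, not a GridLAB-D function.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the documented values; the actual declaration may differ.
enum CheckpointType { CPT_NONE = 0, CPT_WALL = 1, CPT_SIM = 2 };

// Hypothetical helper returning the default interval, in seconds, for each
// checkpoint type. CPT_NONE yields 0 because no checkpoints are taken.
std::int64_t default_checkpoint_interval(CheckpointType type) {
    switch (type) {
        case CPT_WALL: return 3600;   // 1 hour of wall-clock time
        case CPT_SIM:  return 86400;  // 1 day of simulation time
        case CPT_NONE:
        default:       return 0;
    }
}
```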
Global Configuration Variables
Located in gldcore/globals.h:
- global_checkpoint_type: Determines checkpoint trigger mode
- global_checkpoint_file: Base filename for checkpoint files
- global_checkpoint_seqnum: Sequence number counter for multiple checkpoint files
- global_checkpoint_interval: Time between checkpoints (seconds)
- global_checkpoint_keepall: Flag to retain all checkpoint files (0 = delete old, non-zero = keep all)
- global_checkpoint_loaded: Flag indicating whether a checkpoint was loaded
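The list above does not say exactly how the sequence number and keepall flag combine into file names. The sketch below is one hypothetical interpretation, assuming a "{file}.{seqnum}" naming scheme; CheckpointSettings, RotationResult, and rotate are illustrative names, and the naming scheme is an assumption rather than GridLAB-D's actual behavior.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical bundle of the global checkpoint settings listed above.
struct CheckpointSettings {
    int type = 2;                // global_checkpoint_type (CPT_SIM)
    std::string file = "model";  // global_checkpoint_file (base name)
    unsigned seqnum = 0;         // global_checkpoint_seqnum
    long interval = 86400;       // global_checkpoint_interval, seconds
    int keepall = 0;             // global_checkpoint_keepall
    bool loaded = false;         // global_checkpoint_loaded
};

struct RotationResult {
    std::string new_file;             // file to write for this checkpoint
    std::optional<std::string> stale; // earlier file that may be deleted
};

// Advances the sequence number (assumed "{file}.{seqnum}" naming) and, when
// keepall == 0, reports the previous checkpoint file as deletable so only
// the most recent one is retained.
RotationResult rotate(CheckpointSettings& s) {
    const std::string previous = s.file + "." + std::to_string(s.seqnum);
    ++s.seqnum;
    RotationResult r{s.file + "." + std::to_string(s.seqnum), std::nullopt};
    if (s.keepall == 0 && s.seqnum > 1) {
        r.stale = previous;
    }
    return r;
}
```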
Implementation
Core Functions
do_checkpoint(const char *output_directory)
Located in gldcore/exec.cpp
- Creates an ordered JSON structure containing full simulation state
- Parameters:
  - output_directory: Optional directory path for writing checkpoint files (if nullptr or empty, the JSON is generated in memory only)
- Returns: nlohmann::ordered_json containing the checkpoint data
- Features:
  - Generates timestamp with timezone information
  - Extracts model name and strips file extensions automatically
  - Checks directory existence before writing
  - Creates filename format: {modelname}_checkpoint.json
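The file-naming and directory-check features can be sketched with std::filesystem. This is a minimal sketch, assuming the model name is the file stem of the model path; checkpoint_filename and checkpoint_path are hypothetical helpers, not the code in gldcore/exec.cpp. Note that path::stem() removes only the last extension, so "a.b.glm" becomes "a.b".

```cpp
#include <cassert>
#include <filesystem>
#include <optional>
#include <string>

namespace fs = std::filesystem;

// Builds "{modelname}_checkpoint.json" from a model path such as
// "models/ieee13.glm": the directory part and final extension are dropped.
std::string checkpoint_filename(const std::string& model_path) {
    return fs::path(model_path).stem().string() + "_checkpoint.json";
}

// Returns the output path, or std::nullopt when no directory was given
// (in-memory mode) or the directory does not exist (nothing is written).
std::optional<fs::path> checkpoint_path(const std::string& output_directory,
                                        const std::string& model_path) {
    if (output_directory.empty()) {
        return std::nullopt;  // in-memory only
    }
    const fs::path dir(output_directory);
    if (!fs::is_directory(dir)) {
        return std::nullopt;  // refuse to write into a missing directory
    }
    return dir / checkpoint_filename(model_path);
}
```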
Main Loop Integration
- do_checkpoint() is called within exec_start() during the main simulation loop
- Operates based on the configured checkpoint type and interval
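How the type and interval might gate a checkpoint on each pass through the loop is sketched below. This is a minimal sketch, assuming times in seconds and a checkpoint whenever a full interval has elapsed on the relevant clock since the last one; CheckpointClock and checkpoint_due are hypothetical names, not the logic in exec_start().

```cpp
#include <cassert>
#include <cstdint>

enum CheckpointType { CPT_NONE = 0, CPT_WALL = 1, CPT_SIM = 2 };

// Hypothetical record of when the last checkpoint was taken on each clock.
// last_sim should start at the simulation start time.
struct CheckpointClock {
    std::int64_t last_sim = 0;
    std::int64_t last_wall = 0;
};

// Called once per completed timestep. Returns true, and records the time,
// when a full interval has elapsed on the clock selected by `type`.
bool checkpoint_due(CheckpointType type, std::int64_t interval,
                    std::int64_t sim_now, std::int64_t wall_now,
                    CheckpointClock& clk) {
    if (type == CPT_NONE || interval <= 0) {
        return false;
    }
    std::int64_t& last = (type == CPT_SIM) ? clk.last_sim : clk.last_wall;
    const std::int64_t now = (type == CPT_SIM) ? sim_now : wall_now;
    if (now - last < interval) {
        return false;
    }
    last = now;
    return true;
}
```

With hourly timesteps and the default CPT_SIM interval of one day, a three-day run triggers one checkpoint per simulated day.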
C++ API
Defined in gldcore/gldapi.h
Checkpoint Modes
enum GLDCheckPointMode {
GLD_CHECKPOINT_MODE_NONE = 0,
GLD_CHECKPOINT_MODE_SAVE = 1
};
Key Methods
GLDErrorCode save_checkpoint(const std::string &save_path, GLDCheckPointMode mode = GLD_CHECKPOINT_MODE_SAVE)
nlohmann::ordered_json get_checkpoint_json(const std::string& filepath = "")
Implementation in gldapi.cpp:510-560:
get_checkpoint_json() retrieves the checkpoint state, with an optional directory specification
save_checkpoint() saves a checkpoint to the specified path and updates the internal gld_model representation
Checkpoint Variables
Objects throughout the codebase mark internal state variables for inclusion in checkpoints by publishing them with the PT_ACCESS, PA_HIDDEN attribute and a description prefixed with CHECKPOINT_VAR.
Examples:
- generator/controller_dg.cpp: Controllers mark predictor/corrector values
- generator/solar.cpp: Solar models mark temperature and timing variables
- generator/windturb_dg.cpp: Wind turbines mark voltage, current, and state values
These hidden checkpoint variables ensure critical internal state is preserved without exposing implementation details in the public API.
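The marking convention amounts to a filter over property metadata: hidden access plus the CHECKPOINT_VAR description prefix. This is a minimal sketch, assuming simplified stand-ins for GridLAB-D's property metadata; Access, PropertyInfo, checkpoint_vars, and the property names in the usage are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-ins for property metadata; the real PROPERTY structure
// and access flags carry much more information.
enum class Access { Public, Hidden };

struct PropertyInfo {
    std::string name;
    Access access;
    std::string description;
};

// Selects hidden properties whose description starts with CHECKPOINT_VAR,
// i.e. internal state that should be written into a checkpoint.
std::vector<std::string> checkpoint_vars(const std::vector<PropertyInfo>& props) {
    const std::string prefix = "CHECKPOINT_VAR";
    std::vector<std::string> out;
    for (const auto& p : props) {
        if (p.access == Access::Hidden &&
            p.description.compare(0, prefix.size(), prefix) == 0) {
            out.push_back(p.name);
        }
    }
    return out;
}
```

A hidden scratch value without the prefix is skipped, as is a public property, so only the explicitly marked internal state is selected.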
Configuration & Usage
GLM File Configuration
Example from models/checkpoint_sim_test.glm:
#set checkpoint_type=SIM
#set checkpoint_interval=2419200 // 28 days in seconds
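The interval in this example is 28 × 86400 = 2419200 seconds. The sketch below checks that arithmetic and estimates how many checkpoints a year-long run would produce, assuming the first checkpoint falls one full interval after the start; checkpoints_in is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::int64_t kSecondsPerDay = 86400;

// checkpoint_interval from the GLM example: 28 days expressed in seconds.
constexpr std::int64_t kInterval = 28 * kSecondsPerDay;
static_assert(kInterval == 2419200, "28 days must equal 2419200 seconds");

// Number of full intervals contained in a run of the given length, assuming
// a checkpoint at the end of each complete interval.
constexpr std::int64_t checkpoints_in(std::int64_t run_seconds) {
    return run_seconds / kInterval;
}
```

Under that assumption a 365-day run yields 13 checkpoints, since 31536000 / 2419200 is about 13.04.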
C++ API Usage
From test_gldapi/test_gldapi.cpp:
assert(sim.save_checkpoint("state.chk", GLD_CHECKPOINT_MODE_SAVE) == GLD_SUCCESS);
Python Bindings
Located in checkpoint.py:
- Python bindings expose the checkpoint mode constants NONE and SAVE
- The CheckPointHarness class provides testing infrastructure
MySQL Recorder Integration
The configuration in test_mysql_recorder.xml:69-73 demonstrates checkpoint settings for database recording:
- checkpoint_type: Recording checkpoint type
- checkpoint_file: File path for checkpoint
- checkpoint_seqnum: Sequence number
- checkpoint_interval: Recording interval
- checkpoint_keepall: Boolean flag
Data Format
Checkpoints are stored as ordered JSON with:
- Preamble section (__preamble): Contains a comments array with the generation timestamp and timezone
- Clock information: Formatted as strings with timezone data
- Object state: Full simulation object properties and values
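The preamble layout can be illustrated without the JSON library. This is a minimal sketch, assuming a "Generated ..." comment string; the real checkpoints are built with nlohmann::ordered_json and reportedly carry the local timezone, whereas this sketch fixes UTC so the output is deterministic. preamble_json is a hypothetical helper, and the hand-built string does no escaping.

```cpp
#include <cassert>
#include <ctime>
#include <string>

// Builds a minimal "__preamble" object whose comments array holds the
// generation timestamp in UTC. Wording and layout are illustrative only.
std::string preamble_json(std::time_t when) {
    const std::tm* utc = std::gmtime(&when);  // not thread-safe; fine for a sketch
    char stamp[32] = "unknown time";
    if (utc != nullptr) {
        std::strftime(stamp, sizeof stamp, "%Y-%m-%d %H:%M:%S UTC", utc);
    }
    // "__preamble" is emitted first, matching the field order an ordered
    // JSON object preserves.
    return std::string("{\"__preamble\":{\"comments\":[\"Generated ") + stamp +
           "\"]}}";
}
```

Ordered JSON matters here because it keeps __preamble ahead of the clock and object sections, so checkpoint files read consistently from run to run.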