Multithreading
Multithreading is the capability for GridLAB-D™ to use multiple threads/cores on the computer to execute the models being studied more quickly. Because GridLAB-D™ is agent-based, parts of the model structure lend themselves well to parallel execution (objects with minimal or no interaction between them), and multithreading exploits this to run faster.
Motivation
GridLAB-D™ models of actual systems often have a very large number of objects and time-series simulations can take a significant amount of compute time to complete. Methods to improve the computation time/computational efficiency of GridLAB-D™ help improve the user experience, and also improve the applicability of GridLAB-D™ to near-term planning/potential operations instead of long-term planning or post-event reconstruction.
Most modern computers have several threads and/or distinct computation cores at their disposal, so nearly all users can benefit from a reliable multithreading capability being implemented in GridLAB-D™. While some aspects will remain sequential, longer time-series runs may speed up almost linearly with increased thread counts.
Feature Objective
Multithreading is divided into two main implementations in GridLAB-D™:
- batch multithreading and
- model multithreading.
Batch multithreading is primarily used by the autotest feature in GridLAB-D™, which effectively allows a single controlling instance of GridLAB-D™ to spin up individual instances of GridLAB-D™ running specific autotest models. The individual instances are still single-threaded, but several of them can be executed in parallel because they are independent instances of GridLAB-D™ (a so-called "embarrassingly parallel" implementation).
Model multithreading allows objects of a common rank to be executed in parallel within a single GridLAB-D™ model instance (GLM or JSON file). Multithreading only applies to the time-series execution portion of the overall GridLAB-D™ program loop; items like the file loader and object creation will remain single-threaded for the immediate future.
Functionality
Interactions with the multithreading capability will be different between a developer and a user.
On the development side, GridLAB-D™ core functions are expected to handle most of the specifics of multithreading such that the common model/module developer will not need to do anything different. The exception will be any potential contention areas, where additional memory management/locking features may be needed, which developers will need to include in their objects.
On the user side, the only interaction will be designating a core/thread count for GridLAB-D™ to use, which should simply yield faster simulation times. Answers from GridLAB-D™ models should be identical between single-threaded and multithreaded runs, with the only difference being in execution time.
Executive Overview
GridLAB-D™ implements a multithreaded execution engine using C++11 std::thread and synchronization primitives. The system employs two complementary threading models:
- Central Threadpool: A job-queue-based threadpool in exec.cpp for load-balancing object synchronization across CPU cores
- Module-Specific Threading: Direct std::thread implementations in specialized modules (loadshape, enduse, schedule) for parallel data processing
What Does the Threadpool Do?
The threadpool distributes the simulation's object synchronization workload across multiple CPU cores, reducing the wall-clock time of power flow calculations and other intensive operations. Each worker thread processes a subset of objects during each simulation timestep, with synchronization barriers ensuring consistency.
| Aspect | Central Threadpool | Module-Specific Threads |
|---|---|---|
| Scope | Global job queue for object sync | Per-module parallel processing |
| Trigger | Every sync operation in exec.cpp | Module-specific sync functions |
| Synchronization | Atomic counters + condition variables | Condition variables with ready flag |
| Thread Count | Configurable via global_threadcount | Matches threadcount setting |
| Data Partitioning | By object parent for cache locality | By module-specific segments |
| Use Case | General-purpose object sync | Specialized batch operations |
System Architecture (High-Level)
Main Thread (exec.cpp)
├── Central Threadpool (N workers + 1 sync thread)
│   ├── Job Queue (mutex-protected submission)
│   ├── Worker Threads (process jobs in parallel)
│   └── Sync Thread (for synchronous execution mode)
│
├── Loadshape Thread Group (M threads, direct std::thread)
│   └── Condition variable coordination
│
├── Enduse Thread Group (M threads, direct std::thread)
│   └── Condition variable coordination
│
└── Schedule Thread Group (M threads, direct std::thread)
    └── Condition variable coordination
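The "condition variable with ready flag" coordination used by the module thread groups can be illustrated with a small sketch. It is not taken from the loadshape, enduse, or schedule sources: `ReadyGate` and `run_group` are hypothetical names, and the per-thread work (doubling one slot) is a placeholder.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a module's thread-group coordination; not GridLAB-D code.
struct ReadyGate {
    std::mutex m;
    std::condition_variable cv;
    bool ready = false;

    // Block until open() has been called.
    void wait() {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [this] { return ready; });  // predicate guards against spurious wakeups
    }

    // Set the flag under the mutex, then wake every waiting thread.
    void open() {
        {
            std::lock_guard<std::mutex> lock(m);
            ready = true;
        }
        cv.notify_all();
    }
};

// Start `count` threads that wait for the gate, then each doubles its own slot.
std::vector<int> run_group(int count) {
    assert(count >= 0);
    std::vector<int> slots(static_cast<std::size_t>(count));
    for (int i = 0; i < count; ++i) slots[i] = i;

    ReadyGate gate;
    std::vector<std::thread> group;
    for (int i = 0; i < count; ++i) {
        group.emplace_back([&gate, &slots, i] {
            gate.wait();    // idle until the main thread signals
            slots[i] *= 2;  // each thread touches only its own element
        });
    }
    gate.open();
    for (auto& t : group) t.join();
    return slots;
}
```

Setting the flag while holding the mutex is what prevents a thread from missing the notification between checking the flag and going to sleep.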
Central Threadpool Infrastructure
Threadpool Class Design
The cpp_threadpool class is defined in gldcore/cpp_threadpool.h and implemented in gldcore/cpp_threadpool.cpp.
Constructor:
cpp_threadpool(int num_threads)
- If num_threads == 0, uses std::thread::hardware_concurrency() for auto-detection
- Creates num_threads + 1 total threads (N workers + 1 synchronous execution thread)
- Initializes atomic counters, mutexes, and condition variables
Public Interface
class cpp_threadpool {
public:
// Constructor
cpp_threadpool(int num_threads);
~cpp_threadpool();
// Job submission
void add_job(std::function<void()> job);
// Synchronization
void await(); // Block until all queued jobs complete
// Execution mode control
void set_sync_mode(bool mode); // true=serial, false=parallel
// Thread identification
std::map<std::thread::id, int> get_threadmap() const;
};
Key Methods:
- add_job(callable): Enqueues a job (lambda or function pointer) to the thread-safe queue. Thread-safe via mutex queue_lock.
- await(): Blocks the main thread until all enqueued jobs complete. Uses condition variable wait_condition and atomic counter running_threads.
- set_sync_mode(bool): Switches between parallel execution (false) and synchronous execution (true). When true, jobs execute in a single dedicated thread rather than the worker pool.
- get_threadmap(): Returns std::map<std::thread::id, int> mapping each worker thread's ID to its index (0 to N-1).
Internal Architecture
Thread Pool Structure:
┌─────────────────────────────────────────────┐
│        cpp_threadpool (Main Thread)         │
├─────────────────────────────────────────────┤
│                                             │
│  Thread-Safe Job Queue                      │
│  ┌──────────────────────────────┐           │
│  │ std::queue<std::function>    │           │
│  └──────────────────────────────┘           │
│  Protected by: queue_lock                   │
│                                             │
│  Worker Thread Pool:                        │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐   │
│  │ Worker 0 │  │ Worker 1 │  │ Worker N │   │
│  │ (Thread) │  │ (Thread) │  │ (Thread) │   │
│  └──────────┘  └──────────┘  └──────────┘   │
│                                             │
│  Synchronization:                           │
│  • queue_lock (std::mutex)                  │
│  • wait_lock (std::mutex)                   │
│  • sync_mode_lock (std::mutex)              │
│  • condition (std::condition_variable)      │
│  • wait_condition (std::condition_variable) │
│  • running_threads (std::atomic_int)        │
│  • exiting (std::atomic_bool)               │
│                                             │
└─────────────────────────────────────────────┘
| Primitive | Type | Purpose |
|---|---|---|
| queue_lock | std::mutex | Protects job queue access |
| wait_lock | std::mutex | Protects wait condition |
| sync_mode_lock | std::mutex | Protects sync mode flag |
| condition | std::condition_variable | Signals job availability to workers |
| wait_condition | std::condition_variable | Signals completion to main thread |
| running_threads | std::atomic_int | Lock-free count of active jobs |
| exiting | std::atomic_bool | Lock-free shutdown signal |
Worker Thread Lifecycle and Job Submission
Worker Thread Created
         │
         ▼
┌───────────────────────────────┐
│ Main Loop (Until exiting)     │
├───────────────────────────────┤
│                               │
│  Wait on condition variable   │
│  (no jobs available)          │
│        │                      │
│        ▼                      │
│  Job available? ──→ NO        │
│        │            │         │
│        ▼            ▼         │
│       YES      Back to Wait   │
│        │                      │
│        ▼                      │
│  Get next job                 │
│  Release queue_lock           │
│        │                      │
│        ▼                      │
│  Execute job                  │
│        │                      │
│        ▼                      │
│  running_threads--            │
│  Notify wait_condition        │
│        │                      │
│        ▼                      │
│  Back to Wait                 │
└───────────────────────────────┘
┌───────────────────────────────┐
│ Add Job to Threadpool         │
├───────────────────────────────┤
│  Acquire queue_lock           │
│  Push job onto queue          │
│  running_threads++            │
│  Notify condition             │
└───────────────────────────────┘
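The lifecycle and submission steps above can be sketched as a small pool. This is a minimal illustration under stated assumptions, not the cpp_threadpool source: the class name `mini_pool` is hypothetical, the extra synchronous-execution thread and set_sync_mode() are omitted, and only the member names (queue_lock, wait_lock, condition, wait_condition, running_threads, exiting) follow the primitives listed earlier.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Illustrative only: mini_pool is a hypothetical class, not cpp_threadpool itself.
class mini_pool {
    std::queue<std::function<void()>> jobs;
    std::mutex queue_lock, wait_lock;
    std::condition_variable condition, wait_condition;
    std::atomic_int running_threads{0};  // jobs queued or currently executing
    std::atomic_bool exiting{false};
    std::vector<std::thread> workers;

    void worker_loop() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(queue_lock);
                // Sleep until a job arrives or shutdown is requested.
                condition.wait(lock, [this] { return exiting || !jobs.empty(); });
                if (exiting && jobs.empty()) return;
                job = std::move(jobs.front());
                jobs.pop();
            }  // queue_lock is released before the job runs
            job();
            {
                std::lock_guard<std::mutex> lock(wait_lock);
                --running_threads;
            }
            wait_condition.notify_all();  // lets await() re-check its predicate
        }
    }

public:
    explicit mini_pool(unsigned n) {
        for (unsigned i = 0; i < n; ++i)
            workers.emplace_back([this] { worker_loop(); });
    }

    ~mini_pool() {
        {
            std::lock_guard<std::mutex> lock(queue_lock);
            exiting = true;
        }
        condition.notify_all();
        for (auto& w : workers) w.join();
    }

    void add_job(std::function<void()> job) {
        assert(job != nullptr);  // an empty std::function cannot be executed
        {
            std::lock_guard<std::mutex> lock(queue_lock);
            jobs.push(std::move(job));
            ++running_threads;
        }
        condition.notify_one();
    }

    // Block the caller until every submitted job has finished.
    void await() {
        std::unique_lock<std::mutex> lock(wait_lock);
        wait_condition.wait(lock, [this] { return running_threads == 0; });
    }
};
```

A caller would typically submit a batch with add_job() and then call await() once, as the execution engine does for each synchronization pass. Decrementing the counter while holding wait_lock is a choice made in this sketch so that await() cannot miss the final notification.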
Threadpool Usage in Execution Engine
The threadpool is instantiated globally in gldcore/exec.cpp and used throughout the simulation cycle.
Threadpool Initialization
Location: exec.cpp, Line 3551
threadpool = new cpp_threadpool(global_threadcount);
This happens once at the start of the execution phase. The global_threadcount variable is set from the command-line arguments or global settings; a value of 0 is replaced by processor_count() during initialization.
Object Synchronization Pattern
Location: exec.cpp, Lines 2320-2340
// Distribute object synchronization across threadpool
for (int k = 0; k < n_threads; k++) {
threadpool->add_job([=] {
obj_syncproc(&*thread[n+k]);
});
}
threadpool->await(); // Wait for all sync operations to complete
How It Works:
- Main thread creates one job per worker thread
- Each job calls obj_syncproc() with a segment of objects (thread data structure)
- Worker threads execute jobs in parallel, synchronizing their assigned objects
- await() blocks main thread until all jobs complete
- Main thread continues to next simulation phase
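The steps above can be sketched without the threadpool, using plain std::thread so the example stays self-contained. Everything here is hypothetical: `sim_object`, `sync_segment`, and `sync_all` are illustrative names, the contiguous split is an assumption about how segments might be formed, and the per-object work is a placeholder for what obj_syncproc() does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical object type; real GridLAB-D objects carry far more state.
struct sim_object { double value; };

// Placeholder for the per-segment work done by obj_syncproc() in the source.
void sync_segment(std::vector<sim_object>& objects, std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i) objects[i].value += 1.0;
}

// Split the objects into up to n_threads contiguous segments and process them in parallel.
void sync_all(std::vector<sim_object>& objects, std::size_t n_threads) {
    assert(n_threads > 0);
    const std::size_t chunk = (objects.size() + n_threads - 1) / n_threads;  // ceiling division
    std::vector<std::thread> jobs;
    for (std::size_t k = 0; k < n_threads; ++k) {
        const std::size_t begin = std::min(k * chunk, objects.size());
        const std::size_t end = std::min(begin + chunk, objects.size());
        if (begin == end) break;  // fewer objects than threads
        jobs.emplace_back(sync_segment, std::ref(objects), begin, end);
    }
    for (auto& j : jobs) j.join();  // plays the role of threadpool->await()
}
```

Because the segments do not overlap, no locking is needed inside sync_segment; contention only becomes a concern when objects in different segments share state.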
Commit Operation Batching
Location: exec.cpp, Lines 1683-1710
// Batch commit operations across threadpool
for (int k = 0; k < n_threads; k++) {
threadpool->add_job([=, &obj, &result]() {
// Commit operations for assigned object segment
commit_segment(thread[k]);
});
}
threadpool->await();
This follows the same pattern as object synchronization: distribute work across threads, then wait for completion.
Thread-to-Index Mapping
Thread Data Structure: exec.h
class threadpool_thread_data {
    std::map<std::thread::id, int> thread_map; // Maps thread ID → thread index
public:
    int get_thread_index() const {
        // at() can be called on a const map; operator[] cannot
        return thread_map.at(std::this_thread::get_id());
    }
};
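As an illustration of why a thread-to-index map is useful, the sketch below lets each thread look up its own index and use it to select a private slot. `thread_index_registry` and `collect_indices` are hypothetical names, and the self-registration step is an assumption made for the sketch; the source only states that the map goes from thread ID to index.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical registry mapping std::thread::id to a small integer index.
class thread_index_registry {
    mutable std::mutex lock;
    std::map<std::thread::id, int> thread_map;

public:
    // Record the calling thread under the given index.
    void register_current(int index) {
        std::lock_guard<std::mutex> guard(lock);
        thread_map[std::this_thread::get_id()] = index;
    }

    // Return the calling thread's index, or -1 if it never registered.
    int current_index() const {
        std::lock_guard<std::mutex> guard(lock);
        auto it = thread_map.find(std::this_thread::get_id());
        return it == thread_map.end() ? -1 : it->second;
    }
};

// Run n threads; each registers itself, then records the index it looks up.
std::vector<int> collect_indices(int n) {
    assert(n >= 0);
    thread_index_registry registry;
    std::vector<int> seen(static_cast<std::size_t>(n), -1);
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i) {
        threads.emplace_back([&registry, &seen, i] {
            registry.register_current(i);
            seen[i] = registry.current_index();  // each thread writes only its own slot
        });
    }
    for (auto& t : threads) t.join();
    return seen;
}
```

With an index in hand, a job can write results into a per-thread structure without locking, since no two threads share an index.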
Synchronization Data:
struct sync_data {
TIMESTAMP next_event_time;
int hard_event_count;
int status;
// ... other fields
};
Each worker thread has associated sync_data for tracking convergence and event timing.
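One plausible way to combine per-thread results after a synchronization pass is shown below. The merge rule is an assumption, not something the source states: the earliest next_event_time wins, hard event counts are summed, and any non-zero status is propagated. `TIMESTAMP` is assumed to be a 64-bit integer, and `NO_EVENT` and `merge_results` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

using TIMESTAMP = std::int64_t;  // assumption: a 64-bit integer time value
constexpr TIMESTAMP NO_EVENT = std::numeric_limits<TIMESTAMP>::max();  // hypothetical sentinel

struct sync_data {
    TIMESTAMP next_event_time;
    int hard_event_count;
    int status;  // 0 means success in this sketch
};

// Fold the per-thread records into one summary for the main thread.
sync_data merge_results(const std::vector<sync_data>& per_thread) {
    sync_data total{NO_EVENT, 0, 0};
    for (const sync_data& d : per_thread) {
        assert(d.hard_event_count >= 0);
        total.next_event_time = std::min(total.next_event_time, d.next_event_time);
        total.hard_event_count += d.hard_event_count;
        if (d.status != 0) total.status = d.status;  // remember the most recent failure
    }
    return total;
}
```

Because every thread writes only its own sync_data, the merge can run on the main thread after await() returns with no extra locking.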
Configuration & Environment Variables
Global Configuration Variable
Definition: gldcore/globals.cpp, Line 154
{"threadcount", PT_int32, &global_threadcount,
PA_PUBLIC,
"number of threads to use while using multicore"}
Global Variable:
int global_threadcount = 1; // Default: single-threaded
Command-Line Interface
Usage:
gridlabd --threadcount N model.glm
Behavior:
- N = 0: Auto-detect using std::thread::hardware_concurrency()
- N = 1: Single-threaded (no threadpool utilized)
- N > 1: Multi-threaded with N worker threads
- N < 0: Invalid (error)
Initialization Flow
Location: gldcore/main.cpp, Lines 207-210
if (global_threadcount == 0) {
global_threadcount = processor_count();
// On systems with 16 cores: global_threadcount = 16
}
// Output: "using 16 helper thread(s)"
Auto-Detection Logic:
On Linux/Unix:
- Calls std::thread::hardware_concurrency()
- Falls back to sysconf(_SC_NPROCESSORS_ONLN) if not available
On Windows:
- Uses GetSystemInfo() to query processor count
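The documented rules for N can be condensed into one portable helper. This is a sketch, not the code in main.cpp: `resolve_threadcount` is a hypothetical name, it relies only on std::thread::hardware_concurrency() rather than the platform-specific calls listed above, and falling back to a single thread when detection reports 0 is an assumption.

```cpp
#include <cassert>
#include <stdexcept>
#include <thread>

// Hypothetical helper mirroring the documented behavior of the threadcount setting.
unsigned resolve_threadcount(int requested) {
    if (requested < 0) {
        throw std::invalid_argument("threadcount must be zero or positive");
    }
    unsigned result;
    if (requested > 0) {
        result = static_cast<unsigned>(requested);  // explicit user choice
    } else {
        // hardware_concurrency() is only a hint and may return 0 when unknown.
        const unsigned detected = std::thread::hardware_concurrency();
        result = (detected == 0) ? 1u : detected;   // assumed fallback: single thread
    }
    assert(result >= 1);
    return result;
}
```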
Single-Threaded vs. Multi-Threaded Threshold
Throughout the codebase, modules check threadcount to enable/disable multi-threading:
if (global_threadcount < 2) {
// Single-threaded path (no thread overhead)
process_all_objects_sequentially();
} else {
// Multi-threaded path
create_worker_threads();
distribute_work();
wait_for_completion();
}
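A concrete version of this threshold check, with a placeholder workload, might look like the following. `sum_squares` and `process` are hypothetical names; the point of the sketch is that both paths return the same answer, which matches the expectation that single-threaded and multithreaded runs agree.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical workload: sum of squares over a range of integer "objects".
long long sum_squares(const std::vector<int>& v, std::size_t begin, std::size_t end) {
    long long total = 0;
    for (std::size_t i = begin; i < end; ++i) total += static_cast<long long>(v[i]) * v[i];
    return total;
}

long long process(const std::vector<int>& values, int threadcount) {
    assert(threadcount >= 0);
    if (threadcount < 2) {
        // Single-threaded path: no thread creation overhead.
        return sum_squares(values, 0, values.size());
    }
    // Multi-threaded path: one contiguous chunk and one result slot per thread.
    const std::size_t n = static_cast<std::size_t>(threadcount);
    const std::size_t chunk = (values.size() + n - 1) / n;
    std::vector<long long> partial(n, 0);
    std::vector<std::thread> workers;
    for (std::size_t k = 0; k < n; ++k) {
        const std::size_t begin = std::min(k * chunk, values.size());
        const std::size_t end = std::min(begin + chunk, values.size());
        workers.emplace_back([&values, &partial, k, begin, end] {
            partial[k] = sum_squares(values, begin, end);
        });
    }
    for (auto& w : workers) w.join();
    // Combine in a fixed order so the result does not depend on thread timing.
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

Integer arithmetic makes the two paths bit-identical here. With floating-point values the order of summation affects rounding, so combining partial results in a fixed order, as above, is what keeps a multithreaded run repeatable from one execution to the next.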