Multithreading

Multithreading is the capability of GridLAB-D™ to use multiple threads/cores on the computer to execute the models being studied more quickly. Because GridLAB-D™ is agent-based, parts of the model structure have minimal or no interaction between objects and lend themselves well to parallel execution, which multithreading exploits to run faster.

Motivation

GridLAB-D™ models of actual systems often have a very large number of objects, and time-series simulations can take a significant amount of compute time to complete. Improving the computational efficiency of GridLAB-D™ improves the user experience and extends its applicability from long-term planning and post-event reconstruction to near-term planning and potential operational use.

Most modern computers have several threads and/or distinct computation cores at their disposal, so nearly all users can benefit from a reliable multithreading capability in GridLAB-D™. While some aspects will remain sequential, longer time-series runs may speed up almost linearly with increased thread counts.

Feature Objective

Multithreading is divided into two main implementations in GridLAB-D™:

  • batch multithreading and
  • model multithreading.

Batch multithreading is primarily used by the autotest feature in GridLAB-D™, which effectively allows a single controlling instance of GridLAB-D™ to spin up individual instances of GridLAB-D™ running specific autotest models. The individual instances are still single-threaded, but several of them can be executed in parallel because they are independent instances of GridLAB-D™ (a so-called "embarrassingly parallel" implementation).

Model multithreading allows common ranks of objects to be executed in parallel within a single GridLAB-D™ model instance (GLM or JSON file). Multithreading only applies to the time-series execution portion of the overall GridLAB-D™ program loop; items like the file loader and object creation will remain single-threaded for the immediate future.
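
The idea of running objects of a common rank in parallel can be sketched as follows. This is only an illustration under stated assumptions, not the GridLAB-D™ implementation: SimObject, sync_object, and sync_by_rank are hypothetical names, ascending rank order is an arbitrary choice for the sketch, and one thread per object is used purely for brevity.

```cpp
#include <cassert>
#include <map>
#include <thread>
#include <vector>

// Hypothetical stand-in for a model object: just a rank and one state value.
struct SimObject {
    int rank;
    int value;
};

// Hypothetical per-object update. Objects of equal rank never touch each other here.
void sync_object(SimObject &obj) { obj.value += obj.rank + 1; }

// Ranks run one after another; objects inside a rank run concurrently.
void sync_by_rank(std::vector<SimObject> &objects) {
    std::map<int, std::vector<SimObject *>> by_rank;  // std::map iterates in ascending rank
    for (SimObject &obj : objects) by_rank[obj.rank].push_back(&obj);

    for (auto &entry : by_rank) {
        // One thread per object keeps the sketch short; a pool would reuse threads.
        std::vector<std::thread> workers;
        for (SimObject *obj : entry.second)
            workers.emplace_back([obj] { sync_object(*obj); });
        for (std::thread &w : workers) w.join();  // barrier before the next rank starts
    }
}

// Usage: every object is updated exactly once, whatever the thread scheduling.
void example_usage() {
    std::vector<SimObject> objects = {{0, 0}, {1, 10}, {0, 5}, {2, 1}};
    sync_by_rank(objects);
    assert(objects[0].value == 1 && objects[1].value == 12);
    assert(objects[2].value == 6 && objects[3].value == 4);
}
```

The join at the end of each rank is what keeps the result independent of scheduling: no object of the next rank starts until every object of the current rank has finished.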

Functionality

Interactions with the multithreading capability will be different between a developer and a user.

On the development side, GridLAB-D™ core functions are expected to handle most of the specifics of multithreading such that the common model/module developer will not need to do anything different. The exception will be any potential contention areas, where additional memory management/locking features may be needed, which developers will need to include in their objects.

On the user side, the only interaction will be designating a core/thread count for GridLAB-D™ to use, which will simply result in faster simulation times. Answers from GridLAB-D™ models should be identical between single-threaded and multithreaded runs, with the only difference being execution time.

Executive Overview

GridLAB-D™ implements a multi-threaded execution engine using C++11 std::thread and synchronization primitives. The system employs two complementary threading models:

  1. Central Threadpool — A job-queue-based threadpool in exec.cpp for load-balancing object synchronization across CPU cores
  2. Module-Specific Threading — Direct std::thread implementations in specialized modules (loadshape, enduse, schedule) for parallel data processing

What Does the Threadpool Do?

The threadpool distributes the simulation object synchronization workload across multiple CPU cores, reducing the wall-clock time of power flow calculations and other intensive operations. Each worker thread processes a subset of objects during each simulation timestep, with synchronization barriers ensuring consistency.

Table 1: Quick Comparison: Threadpool vs. Module Threading

Aspect            | Central Threadpool                    | Module-Specific Threads
Scope             | Global job queue for object sync      | Per-module parallel processing
Trigger           | Every sync operation in exec.cpp      | Module-specific sync functions
Synchronization   | Atomic counters + condition variables | Condition variables with ready flag
Thread Count      | Configurable via global_threadcount   | Matches threadcount setting
Data Partitioning | By object parent for cache locality   | By module-specific segments
Use Case          | General-purpose object sync           | Specialized batch operations

System Architecture (High-Level)

Main Thread (exec.cpp)
    ├── Central Threadpool (N workers + 1 sync thread)
    │   ├── Job Queue (mutex-protected submission)
    │   ├── Worker Threads (process jobs in parallel)
    │   └── Sync Thread (for synchronous execution mode)
    │
    ├── Loadshape Thread Group (M threads, direct std::thread)
    │   └── Condition variable coordination
    │
    ├── Enduse Thread Group (M threads, direct std::thread)
    │   └── Condition variable coordination
    │
    └── Schedule Thread Group (M threads, direct std::thread)
        └── Condition variable coordination

Central Threadpool Infrastructure

Threadpool Class Design

The cpp_threadpool class is defined in gldcore/cpp_threadpool.h and implemented in gldcore/cpp_threadpool.cpp.

Constructor:

cpp_threadpool(int num_threads)
  • If num_threads == 0, uses std::thread::hardware_concurrency() for auto-detection
  • Creates num_threads + 1 total threads (N workers + 1 synchronous execution thread)
  • Initializes atomic counters, mutexes, and condition variables

Public Interface

class cpp_threadpool {
public:
    // Constructor
    cpp_threadpool(int num_threads);
    ~cpp_threadpool();

    // Job submission
    void add_job(std::function<void()> job);

    // Synchronization
    void await();  // Block until all queued jobs complete

    // Execution mode control
    void set_sync_mode(bool mode);  // true=serial, false=parallel

    // Thread identification
    std::map<std::thread::id, int> get_threadmap() const;
};

Key Methods:

  • add_job(callable): Enqueues a job (lambda or function pointer) to the thread-safe queue. Thread-safe via mutex queue_lock.
  • await(): Blocks the main thread until all enqueued jobs complete. Uses condition variable wait_condition and atomic counter running_threads.
  • set_sync_mode(bool): Switches between parallel execution (false) and synchronous execution (true). When true, jobs execute in a single dedicated thread rather than the worker pool.
  • get_threadmap(): Returns std::map<std::thread::id, int> mapping each worker thread's ID to its index (0 to N-1).

Internal Architecture

Thread Pool Structure:

┌─────────────────────────────────────────────┐
│         cpp_threadpool (Main Thread)        │
├─────────────────────────────────────────────┤
│                                             │
│  Thread-Safe Job Queue                      │
│  ┌──────────────────────────────┐           │
│  │ std::queue<std::function>    │           │
│  └──────────────────────────────┘           │
│         Protected by: queue_lock            │
│                                             │
│  Worker Thread Pool:                        │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐   │
│  │ Worker 0 │  │ Worker 1 │  │ Worker N │   │
│  │ (Thread) │  │ (Thread) │  │ (Thread) │   │
│  └──────────┘  └──────────┘  └──────────┘   │
│                                             │
│  Synchronization:                           │
│  • queue_lock (std::mutex)                  │
│  • wait_lock (std::mutex)                   │
│  • sync_mode_lock (std::mutex)              │
│  • condition (std::condition_variable)      │
│  • wait_condition (std::condition_variable) │
│  • running_threads (std::atomic_int)        │
│  • exiting (std::atomic_bool)               │
│                                             │
└─────────────────────────────────────────────┘

Table 2: Synchronization Primitives

Primitive       | Type                    | Purpose
queue_lock      | std::mutex              | Protects job queue access
wait_lock       | std::mutex              | Protects wait condition
sync_mode_lock  | std::mutex              | Protects sync mode flag
condition       | std::condition_variable | Signals job availability to workers
wait_condition  | std::condition_variable | Signals completion to main thread
running_threads | std::atomic_int         | Lock-free count of active jobs
exiting         | std::atomic_bool        | Lock-free shutdown signal

Worker Thread Lifecycle and Job Submission

    Worker Thread Created
              │
              ▼
    ┌─────────────────────────────┐
    │  Main Loop (Until exiting)  │
    ├─────────────────────────────┤
    │                             │
    │  Wait on condition variable │
    │  (no jobs available)        │
    │         │                   │
    │         ▼                   │
    │  Job available? ──→ NO      │
    │         │           │       │  
    │         ▼           ▼       │ 
    │        YES     Back to Wait │
    │         │                   │
    │         ▼                   │
    │    Get next job             │
    │    Release queue_lock       │
    │         │                   │
    │         ▼                   │
    │    Execute job              │
    │         │                   │
    │         ▼                   │
    │    running_threads--        │
    │    Notify wait_condition    │
    │         │                   │
    │         ▼                   │
    │    Back to Wait             │
    └─────────────────────────────┘

    ┌─────────────────────────────┐
    │  Add Job to Threadpool      │
    ├─────────────────────────────┤
    │      Acquire queue_lock     │
    │      Push job onto queue    │
    │      running_threads++      │
    │      Notify condition       │
    └─────────────────────────────┘
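
The lifecycle above can be sketched as a small, self-contained pool. This is a simplified illustration and not the cpp_threadpool source: mini_threadpool and run_example are hypothetical names, the member names simply mirror Table 2, and the synchronous-mode thread and thread map are omitted.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Illustrative pool only; NOT the GridLAB-D cpp_threadpool implementation.
class mini_threadpool {
public:
    explicit mini_threadpool(unsigned num_threads) {
        for (unsigned i = 0; i < num_threads; ++i)
            workers.emplace_back([this] { worker_loop(); });
    }

    ~mini_threadpool() {
        {
            std::lock_guard<std::mutex> guard(queue_lock);
            exiting = true;  // set under the lock so no worker misses the wake-up
        }
        condition.notify_all();
        for (std::thread &w : workers) w.join();
    }

    void add_job(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> guard(queue_lock);
            jobs.push(std::move(job));
            ++running_threads;  // counts queued plus executing jobs
        }
        condition.notify_one();
    }

    // Block the caller until every submitted job has finished.
    void await() {
        std::unique_lock<std::mutex> guard(wait_lock);
        wait_condition.wait(guard, [this] { return running_threads == 0; });
    }

private:
    void worker_loop() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> guard(queue_lock);
                condition.wait(guard, [this] { return exiting || !jobs.empty(); });
                if (exiting && jobs.empty()) return;
                job = std::move(jobs.front());
                jobs.pop();
            }  // queue_lock is released before the job runs
            job();
            {
                std::lock_guard<std::mutex> guard(wait_lock);
                --running_threads;  // decrement under wait_lock so await() cannot miss it
            }
            wait_condition.notify_all();
        }
    }

    std::queue<std::function<void()>> jobs;
    std::mutex queue_lock, wait_lock;
    std::condition_variable condition, wait_condition;
    std::atomic_int running_threads{0};
    std::atomic_bool exiting{false};
    std::vector<std::thread> workers;
};

// Usage: 100 small jobs on 4 workers; the sum is the same for any scheduling.
int run_example() {
    std::atomic_int total{0};
    mini_threadpool pool(4);
    for (int k = 1; k <= 100; ++k)
        pool.add_job([&total, k] { total += k; });
    pool.await();
    assert(total == 5050);
    return total;
}
```

Decrementing running_threads while holding wait_lock is a design choice of this sketch: it closes the window in which await() could test the counter, see a non-zero value, and then sleep through the final notification.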

Threadpool Usage in Execution Engine

The threadpool is instantiated globally in gldcore/exec.cpp and used throughout the simulation cycle.

Threadpool Initialization

Location: exec.cpp, Line 3551

threadpool = new cpp_threadpool(global_threadcount);

This happens once at the start of the execution phase. The global_threadcount variable is set during initialization from command-line arguments; it defaults to 1 (single-threaded), and a value of 0 is replaced with processor_count() during startup.

Object Synchronization Pattern

Location: exec.cpp, Lines 2320-2340

// Distribute object synchronization across threadpool
for (int k = 0; k < n_threads; k++) {
    threadpool->add_job([=] { 
        obj_syncproc(&*thread[n+k]); 
    });
}
threadpool->await();  // Wait for all sync operations to complete

How It Works:

  1. Main thread creates one job per worker thread
  2. Each job calls obj_syncproc() with a segment of objects (thread data structure)
  3. Worker threads execute jobs in parallel, synchronizing their assigned objects
  4. await() blocks main thread until all jobs complete
  5. Main thread continues to next simulation phase
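
The five steps above can be sketched with plain std::thread in place of the threadpool. Everything here is an assumption made for illustration: segment, make_segments, sync_segment, and sync_all are hypothetical names, objects are reduced to integers, and the even contiguous split is a simplification (Table 1 describes partitioning by object parent).

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical segment: a half-open index range [begin, end) into the object list.
struct segment {
    std::size_t begin;
    std::size_t end;
};

// Split n_objects into n_threads contiguous segments whose sizes differ by at most one.
std::vector<segment> make_segments(std::size_t n_objects, std::size_t n_threads) {
    assert(n_threads > 0);
    std::vector<segment> segments;
    const std::size_t base = n_objects / n_threads;
    const std::size_t extra = n_objects % n_threads;
    std::size_t start = 0;
    for (std::size_t k = 0; k < n_threads; ++k) {
        const std::size_t len = base + (k < extra ? 1 : 0);
        segments.push_back({start, start + len});
        start += len;
    }
    return segments;
}

// Hypothetical stand-in for obj_syncproc(): updates every object in one segment.
void sync_segment(std::vector<int> &objects, segment s) {
    for (std::size_t i = s.begin; i < s.end; ++i) objects[i] *= 2;
}

// One job per segment, then wait for all of them to finish.
void sync_all(std::vector<int> &objects, std::size_t n_threads) {
    std::vector<std::thread> jobs;
    for (segment s : make_segments(objects.size(), n_threads))
        jobs.emplace_back([&objects, s] { sync_segment(objects, s); });
    for (std::thread &job : jobs) job.join();  // plays the role of await()
}

// Usage: the threaded result matches a single-threaded pass over the same data.
void example_usage() {
    std::vector<int> serial = {1, 2, 3, 4, 5, 6, 7};
    std::vector<int> threaded = serial;
    sync_all(serial, 1);
    sync_all(threaded, 3);
    assert(serial == threaded);
}
```

Because the segments do not overlap, each job writes to its own slice of the vector and no locking is needed inside sync_segment.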

Commit Operation Batching

Location: exec.cpp, Lines 1683-1710

// Batch commit operations across threadpool
for (int k = 0; k < n_threads; k++) {
    threadpool->add_job([=, &obj, &result]() {
        // Commit operations for assigned object segment
        commit_segment(thread[k]);
    });
}
threadpool->await();

Similar pattern: distribute work across threads, wait for completion.

Thread-to-Index Mapping

Thread Data Structure: exec.h

class threadpool_thread_data {
    std::map<std::thread::id, int> thread_map;  // Maps thread ID → thread index

public:
    int get_thread_index() const {
        // at() rather than operator[]: operator[] is non-const and inserts missing keys
        return thread_map.at(std::this_thread::get_id());
    }
};

Synchronization Data:

struct sync_data {
    TIMESTAMP next_event_time;
    int hard_event_count;
    int status;
    // ... other fields
};

Each worker thread has associated sync_data for tracking convergence and event timing.
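
A minimal sketch of how a thread-ID-to-index map and per-thread sync_data might work together is given below. It rests on several assumptions that are not confirmed by the source: TIMESTAMP is treated as a 64-bit integer, TS_NEVER is used as a "no event" sentinel, run_workers is a hypothetical helper, and the main thread merges results by taking the earliest next_event_time.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <map>
#include <mutex>
#include <thread>
#include <vector>

typedef std::int64_t TIMESTAMP;  // assumption: 64-bit integer time value
const TIMESTAMP TS_NEVER = std::numeric_limits<TIMESTAMP>::max();  // assumption: "no event"

struct sync_data {
    TIMESTAMP next_event_time;
    int hard_event_count;
    int status;
};

// Hypothetical helper: worker k proposes proposed[k] as its next event time.
TIMESTAMP run_workers(const std::vector<TIMESTAMP> &proposed) {
    const std::size_t n = proposed.size();
    std::vector<sync_data> slots(n, sync_data{TS_NEVER, 0, 0});
    std::map<std::thread::id, int> thread_map;
    std::mutex map_lock;
    std::vector<std::thread> workers;
    {
        // Hold the lock while the map is filled so no worker reads it half-built.
        std::lock_guard<std::mutex> guard(map_lock);
        for (std::size_t k = 0; k < n; ++k) {
            workers.emplace_back([&] {
                int index;
                {
                    std::lock_guard<std::mutex> wait(map_lock);
                    index = thread_map.at(std::this_thread::get_id());
                }
                // Each worker writes only its own slot, so the slots need no lock.
                slots[index].next_event_time = proposed[index];
                ++slots[index].hard_event_count;
            });
            thread_map[workers.back().get_id()] = static_cast<int>(k);
        }
    }
    for (std::thread &w : workers) w.join();

    // Main thread merges the per-thread results: the earliest request wins.
    TIMESTAMP next = TS_NEVER;
    for (const sync_data &d : slots) next = std::min(next, d.next_event_time);
    return next;
}

// Usage: three workers request different times; the earliest one is returned.
void example_usage() {
    assert(run_workers({300, 120, 450}) == 120);
}
```

Giving each thread its own slot avoids contention during the parallel phase; the only shared step is the single-threaded merge after all workers have joined.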


Configuration & Environment Variables

Global Configuration Variable

Definition: gldcore/globals.cpp, Line 154

{"threadcount", PT_int32, &global_threadcount, 
 PA_PUBLIC, 
 "number of threads to use while using multicore"}

Global Variable:

int global_threadcount = 1;  // Default: single-threaded

Command-Line Interface

Usage:

gridlabd --threadcount N model.glm

Behavior:

  • N = 0: Auto-detect using std::thread::hardware_concurrency()
  • N = 1: Single-threaded (no threadpool utilized)
  • N > 1: Multi-threaded with N worker threads
  • N < 0: Invalid (error)
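
The four cases above can be sketched as a small helper. The function names resolve_threadcount and uses_threadpool are hypothetical, the exception type is an arbitrary choice, and the fallback to one thread when detection reports zero is an assumption of this sketch rather than documented GridLAB-D™ behavior.

```cpp
#include <cassert>
#include <stdexcept>
#include <thread>

// Hypothetical helper mirroring the documented meaning of --threadcount N.
int resolve_threadcount(int requested) {
    if (requested < 0)
        throw std::invalid_argument("threadcount must be >= 0");
    if (requested == 0) {
        const unsigned detected = std::thread::hardware_concurrency();
        // hardware_concurrency() may return 0 when the count cannot be determined;
        // falling back to 1 is an assumption of this sketch.
        return detected == 0 ? 1 : static_cast<int>(detected);
    }
    return requested;
}

// Only counts above 1 take the multi-threaded path.
bool uses_threadpool(int resolved) { return resolved > 1; }

// Usage: explicit counts pass through unchanged, 0 resolves to at least one thread.
void example_usage() {
    assert(resolve_threadcount(1) == 1);
    assert(resolve_threadcount(8) == 8);
    assert(resolve_threadcount(0) >= 1);
    assert(!uses_threadpool(1) && uses_threadpool(4));
}
```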

Initialization Flow

Location: gldcore/main.cpp, Lines 207-210

if (global_threadcount == 0) {
    global_threadcount = processor_count();
    // On systems with 16 cores: global_threadcount = 16
}
// Output: "using 16 helper thread(s)"

Auto-Detection Logic:

On Linux/Unix:

  • Calls std::thread::hardware_concurrency()
  • Falls back to sysconf(_SC_NPROCESSORS_ONLN) if not available

On Windows:

  • Uses GetSystemInfo() to query processor count

Single-Threaded vs. Multi-Threaded Threshold

Throughout the codebase, modules check threadcount to enable/disable multi-threading:

if (global_threadcount < 2) {
    // Single-threaded path (no thread overhead)
    process_all_objects_sequentially();
} else {
    // Multi-threaded path
    create_worker_threads();
    distribute_work();
    wait_for_completion();
}