Q-Learning for Resource-Aware and Adaptive Routing in Trusted-Relay QKD Networks
Abstract
Efficient and scalable quantum key scheduling remains a critical challenge in trusted-relay Quantum Key Distribution (QKD) networks due to imbalanced key resource utilization, dynamic key consumption, and topology-induced congestion. This paper presents a Q-learning-based adaptive routing framework designed to optimize quantum key delivery in dynamic QKD networks. The model formulates routing as a Markov Decision Process, with a compact state representation that combines the current node, the destination node, and discretized key occupancy levels. The reward function jointly penalizes resource imbalance and rapid key depletion while promoting traversal through links with sustainable key generation, guiding the agent toward balanced and congestion-aware decisions. Compared to Dijkstra-based routing, the Q-learning scheduler achieves significantly lower delivery latency, more effective key resource utilization, and greater reliability: it maintains an average delay of 45.8 hops versus 118.5 under heavy load, sustains 74.5% versus 61.5% average key utilization, and reduces the failure ratio to 7.2% versus 41.5%, collectively confirming its advantages in scalability, congestion resilience, and resource-efficient decision making.
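
To make the formulation concrete, the following Python sketch illustrates the kind of tabular Q-learning scheduler the abstract describes: the state combines the current node, the destination, and discretized key-occupancy levels of outgoing links, and the reward penalizes occupancy imbalance and net key depletion while rewarding links whose key generation keeps pace with consumption. The topology, rates, reward weights, and hyperparameters below are illustrative assumptions, not values from the paper.

import random
from collections import defaultdict

# Toy trusted-relay topology. Per-link state: key-pool occupancy in [0, 1],
# key generation rate, and key consumption rate. All values are assumptions.
LINKS = {
    ("A", "B"): {"occ": 0.9, "gen": 1.2, "use": 0.8},
    ("B", "D"): {"occ": 0.4, "gen": 0.5, "use": 0.9},
    ("A", "C"): {"occ": 0.6, "gen": 1.0, "use": 0.4},
    ("C", "D"): {"occ": 0.7, "gen": 1.1, "use": 0.5},
}
NEIGHBORS = defaultdict(list)
for (u, v) in LINKS:
    NEIGHBORS[u].append(v)
    NEIGHBORS[v].append(u)

def link(u, v):
    return LINKS.get((u, v)) or LINKS[(v, u)]

def level(occ, n_levels=5):
    # Discretize key occupancy into one of n_levels bins.
    return min(int(occ * n_levels), n_levels - 1)

def reward(u, v, dst):
    # Assumed reward shape: step cost, penalties for occupancy imbalance
    # and net key depletion, bonus for sustainable key generation.
    lk = link(u, v)
    mean_occ = sum(l["occ"] for l in LINKS.values()) / len(LINKS)
    imbalance = abs(lk["occ"] - mean_occ)
    depletion = max(0.0, lk["use"] - lk["gen"])
    sustain = min(lk["gen"] / max(lk["use"], 1e-9), 1.0)
    r = -1.0 - 2.0 * imbalance - 3.0 * depletion + 1.5 * sustain
    return r + (10.0 if v == dst else 0.0)  # terminal bonus on arrival

Q = defaultdict(float)          # Q[(state, action)]
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.2  # illustrative hyperparameters

def state(node, dst):
    # Compact state: current node, destination, and discretized occupancy
    # of the node's outgoing links (the "key occupancy levels").
    occs = tuple(level(link(node, nb)["occ"]) for nb in sorted(NEIGHBORS[node]))
    return (node, dst, occs)

def choose(s, node):
    # Epsilon-greedy action selection over the node's neighbors.
    acts = NEIGHBORS[node]
    if random.random() < EPS:
        return random.choice(acts)
    return max(acts, key=lambda a: Q[(s, a)])

def episode(src, dst, max_hops=20):
    node = src
    for _ in range(max_hops):
        s = state(node, dst)
        a = choose(s, node)
        r = reward(node, a, dst)
        s2 = state(a, dst)
        best_next = max(Q[(s2, b)] for b in NEIGHBORS[a])
        # Standard one-step Q-learning update.
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        node = a
        if node == dst:
            break

random.seed(0)
for _ in range(500):
    episode("A", "D")

In this sketch the learned policy favors paths through links with high occupancy and surplus key generation, which is one plausible mechanism behind the balanced, congestion-aware behavior the abstract reports; in practice the key-pool state would be updated as keys are consumed and replenished during each episode.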