Paper — Dalhousie University BCS Honours Thesis, 2023

Reinforced Linear Genetic Programming

Using Q-Learning to automate register-action assignments in LGP

Overview

Proposes Reinforced Linear Genetic Programming (RLGP), a novel hybrid that layers Q-Learning on top of LGP to learn optimal register-action assignments — eliminating manual domain-specific mapping. Evaluated on OpenAI Gym CartPole-v1 and MountainCar-v0 using a custom Rust framework. LGP achieved mean reward 454 on CartPole; RLGP solved the task but plateaued at 213, revealing a fundamental tension between evolutionary instability and RL convergence. Published on arXiv (2601.09736) and archived at Dalhousie University.

Motivation

Linear Genetic Programming (LGP) evolves sequences of register-machine instructions to solve tasks — but assigning registers to actions in reinforcement learning environments has traditionally required manual, domain-specific mappings. This thesis asks whether Q-Learning can automate that process, letting the system learn which register outputs correspond to which actions.

Approach

RLGP layers a Q-Learning agent on top of LGP. The evolutionary process handles program structure — selecting, crossing over, and mutating instruction sequences — while the Q-Learning layer learns to map register states to environment actions during evaluation. The two mechanisms operate at different timescales: evolution across generations, reinforcement learning within each episode.
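The two-timescale structure can be sketched as a toy loop: tabular Q-learning updates run inside each episode, while the evolved program (replaced here by a stand-in environment) supplies the states. Everything below — the `ToyEnv` class, function names, and parameter values — is an illustrative sketch, not the thesis framework's code.

```python
import random

random.seed(0)

# Illustrative two-timescale sketch: the Q-layer learns a state -> action
# mapping within each episode. ToyEnv stands in for an evolved LGP program
# plus Gym environment; it rewards picking the action equal to the state.
ACTIONS = (0, 1)

def q_update(q, s, a, r, s2, alpha=0.1, gamma=0.99):
    """One tabular Q-learning update on the (state, action) table."""
    best_next = max(q.get((s2, b), 0.0) for b in ACTIONS)
    q[(s, a)] = q.get((s, a), 0.0) + alpha * (r + gamma * best_next - q.get((s, a), 0.0))

def choose(q, s, epsilon):
    """Epsilon-greedy action selection over the Q-table."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q.get((s, a), 0.0))

class ToyEnv:
    """Stand-in with a Gym-like reset/step shape; episodes last 20 steps."""
    def reset(self):
        self.t = 0
        return self.t % 2
    def step(self, action):
        reward = 1.0 if action == self.t % 2 else 0.0
        self.t += 1
        return self.t % 2, reward, self.t >= 20

def episode(env, q, epsilon=0.1):
    """Inner timescale: one episode of acting and updating the Q-table."""
    s, total, done = env.reset(), 0.0, False
    while not done:
        a = choose(q, s, epsilon)
        s2, r, done = env.step(a)
        q_update(q, s, a, r, s2)
        s, total = s2, total + r
    return total

# Outer timescale (evolution across generations) would wrap this loop;
# here we just run repeated episodes against one fixed "program".
q = {}
rewards = [episode(ToyEnv(), q, epsilon=0.2) for _ in range(50)]
```

In the full system, each new generation changes the program producing the states, which is exactly the non-stationarity discussed under Key Takeaway below.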

The system was evaluated on two OpenAI Gym benchmarks:

  • CartPole-v1 — balance a pole on a moving cart. LGP alone achieved a mean reward of 454. RLGP solved the task but plateaued at 213, suggesting the exploration-exploitation balance needs tuning.
  • MountainCar-v0 — drive an underpowered car up a hill. Both approaches struggled with the sparse reward signal, a known challenge for this environment.

Results

The hybrid approach demonstrates that automated register-action mapping is feasible. LGP’s evolved programs can serve as feature extractors for the Q-Learning layer, and the two learning mechanisms don’t destructively interfere. However, the Q-Learning component introduces additional hyperparameters (learning rate, discount factor, epsilon decay) that interact with LGP’s evolutionary parameters, making the combined search space harder to navigate.
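The combinatorial effect of adding the Q-learning hyperparameters can be seen with a small sketch. The sweep values below are plausible illustrations, not the grid actually used in the thesis.

```python
from itertools import product

# Illustrative hyperparameter sweep: the values are hypothetical, chosen
# only to show how the RL and evolutionary parameters multiply.
rl_params = {
    "alpha": [0.05, 0.1, 0.2],       # Q-learning rate
    "gamma": [0.9, 0.99],            # discount factor
    "epsilon_decay": [0.99, 0.995],  # exploration schedule
}
lgp_params = {
    "population_size": [100, 200],
    "mutation_rate": [0.01, 0.05, 0.1],
    "tournament_size": [4, 8],
}

# RL alone: 3*2*2 = 12 configs; LGP alone: 2*3*2 = 12.
# Combined, the grid is their product: 144 configurations.
combined = {**rl_params, **lgp_params}
grid = list(product(*combined.values()))
```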

Key Takeaway

The early plateau in RLGP’s CartPole performance points to a fundamental tension: the Q-Learning agent needs stable state representations to learn effectively, but evolution continuously changes the programs producing those representations. Freezing the evolutionary process periodically to let the RL layer converge, or using more robust RL algorithms that handle non-stationary environments, are promising directions.
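One way to realise the "freeze evolution periodically" idea is a fixed alternating schedule. This is a hypothetical sketch of such a schedule, not something implemented in the thesis.

```python
def train_schedule(generations, evolve_for=5, freeze_for=2):
    """Hypothetical schedule: evolve (with RL) for `evolve_for` generations,
    then pause evolution for `freeze_for` generations so the Q-layer sees
    stable register representations while it converges."""
    schedule = []
    g = 0
    while g < generations:
        for _ in range(evolve_for):
            if g >= generations:
                break
            schedule.append("evolve+rl")
            g += 1
        for _ in range(freeze_for):
            if g >= generations:
                break
            schedule.append("rl-only")
            g += 1
    return schedule
```

Tuning `evolve_for` against `freeze_for` trades evolutionary progress against Q-table stability, which is the tension the plateau exposes.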

Demo

  • Baseline — Iris baseline experiment results
  • Crossover — Iris crossover experiment results
  • Mutation — Iris mutation experiment results
  • Full Pipeline — Iris full experiment results

Features

Genetic Operators

Crossover, mutation, and selection operators for evolving register-machine program populations.
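In outline, these operators act on flat instruction lists. The sketch below is illustrative (the instruction encoding, register count, and parameter values are assumptions, not the framework's actual representation).

```python
import random

random.seed(1)

# Sketch of LGP-style operators over flat instruction lists. An instruction
# is encoded here as (op, dst, src1, src2); this encoding is an assumption.
OPS = ("add", "sub", "mul", "div")
NUM_REGS = 4

def random_instruction():
    return (random.choice(OPS), random.randrange(NUM_REGS),
            random.randrange(NUM_REGS), random.randrange(NUM_REGS))

def two_point_crossover(a, b):
    """Swap a random slice of instructions between two parent programs."""
    i, j = sorted(random.sample(range(min(len(a), len(b)) + 1), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]

def mutate(program, rate=0.1):
    """Replace each instruction with a fresh random one with probability `rate`."""
    return [random_instruction() if random.random() < rate else ins
            for ins in program]

def tournament_select(population, fitness, k=4):
    """Pick the fittest of k randomly drawn contestants."""
    contestants = random.sample(population, k)
    return max(contestants, key=fitness)
```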

Benchmark Datasets

Built-in support for Iris classification and OpenAI Gym reinforcement learning benchmarks.

Experiment Tracking

Fitness, diversity, and convergence metrics tracked across all evolutionary generations.
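Per-generation tracking of this kind might be summarised as below; the metric names and the use of fitness spread as a stand-in diversity measure are illustrative assumptions, not the framework's actual definitions.

```python
from statistics import mean, pstdev

def generation_stats(fitnesses):
    """Summarise one generation: best and mean fitness, plus the fitness
    spread as a crude stand-in for a population-diversity measure."""
    return {
        "best": max(fitnesses),
        "mean": mean(fitnesses),
        "spread": pstdev(fitnesses),
    }

# Toy fitness values for three successive generations.
history = [generation_stats(f) for f in ([1, 2, 3], [2, 4, 6], [5, 5, 8])]
```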

Rust Performance

Core evolution engine written in Rust for maximum computational throughput.

Python Analysis

Python scripting layer for experiment visualization, plotting, and statistical analysis.

Configurable Runs

Fine-tune population size, mutation rates, and tournament selection parameters via TOML.
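A run configuration of that shape might look like the following; the key and table names are hypothetical sketches, not the framework's actual schema.

```toml
# Hypothetical run configuration; actual key names may differ.
[evolution]
population_size = 200
mutation_rate = 0.05
tournament_size = 4

[q_learning]
alpha = 0.1
gamma = 0.99
epsilon_decay = 0.995

[environment]
name = "CartPole-v1"
episodes = 500
```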

Tech Stack

Rust
Python