Shadow routing: the brain learns from a teacher without slowing down
OpenClawBrain now prioritizes one operating rule: keep query-time retrieval fast, then learn better routing asynchronously.
Hot path stays local:
embed(query) -> local traversal -> prompt_context -> answer
No teacher LLM call runs on this path.
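The hot path above can be sketched roughly as follows. This is a minimal illustration, not the actual OpenClawBrain implementation: `Node`, `embed`, `traverse`, and `prompt_context` are hypothetical names standing in for the real components, and the embedding is a trivial stand-in for a local model.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    id: str
    text: str
    edges: dict = field(default_factory=dict)  # neighbor id -> edge weight

def embed(query: str) -> list[float]:
    # Stand-in embedding; the real system would call a local embedding model.
    return [float(ord(c) % 7) for c in query[:8]]

def traverse(graph: dict, start: str, hops: int = 2) -> list[str]:
    # Greedy local traversal: follow the highest-weight edge at each hop.
    path, cur = [start], start
    for _ in range(hops):
        node = graph[cur]
        if not node.edges:
            break
        cur = max(node.edges, key=node.edges.get)
        path.append(cur)
    return path

def prompt_context(graph: dict, path: list[str]) -> str:
    # Concatenate the visited nodes' text into the answer prompt's context.
    return "\n".join(graph[n].text for n in path)
```

Note that nothing in this path waits on a remote model; the teacher only sees these route decisions later, in the background loop.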
Background learning loop:
- sample recent route decisions
- ask teacher (gpt-5-mini) what it would choose
- apply policy-gradient updates to edge weights + relevance metadata
- feed improved signals into split/merge/prune/connect
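One way the edge-weight update can work is a REINFORCE-style step where agreement with the teacher acts as the reward. The sketch below is an assumption about the mechanism, not the repo's code: `pg_update` and its signature are hypothetical, and the reward scheme (+1 on teacher agreement, -1 otherwise) is illustrative.

```python
import math

def softmax(weights: dict) -> dict:
    # Convert raw edge weights into a routing probability distribution.
    m = max(weights.values())
    exp = {k: math.exp(v - m) for k, v in weights.items()}
    z = sum(exp.values())
    return {k: v / z for k, v in exp.items()}

def pg_update(edge_weights: dict, chosen: str, teacher_choice: str,
              lr: float = 0.1) -> dict:
    """REINFORCE-style update on one routing decision.

    reward = +1 if the local route matched the teacher's choice, -1 otherwise;
    each weight moves along reward * grad(log prob of the chosen edge).
    """
    probs = softmax(edge_weights)
    reward = 1.0 if chosen == teacher_choice else -1.0
    return {
        k: w + lr * reward * ((1.0 if k == chosen else 0.0) - probs[k])
        for k, w in edge_weights.items()
    }
```

Under this scheme a route the teacher disagrees with is gradually down-weighted rather than deleted, which is what lets the later split/merge/prune/connect passes act on smoother signals.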
Why this shift matters
- Better routing quality without adding hot-path latency.
- Fewer turns and fewer prompt tokens on repeated workflows.
- Cleaner context blocks because weak routes are gradually down-weighted.
- Human corrections remain high-authority; teacher labels are weak supervision.
Human vs teacher authority
Teacher labels are useful for coverage, but they are not the truth source. Explicit human feedback takes precedence and can override teacher-driven updates.
This is a design update with early operational behavior, not a claim of new benchmark wins beyond the artifacts already measured in the repo.
How to run
openclawbrain async-route-pg --state /tmp/brain/state.json --dry-run
openclawbrain async-route-pg --state /tmp/brain/state.json --apply
Dry-run first, inspect the proposed route updates, then apply.