Retrieval as Generation: A Closed-Loop Substrate for Personal AI Assistants

Bongani Dube · June 2026
Abstract. Personal AI assistants need a semantic knowledge engine that learns from use. We introduce SKG (Skill Knowledge Graph): a closed-loop system that embeds user intent from chat messages, retrieves matching skills by learned similarity, and routes answers back through the same semantic space. SKG learns continuously: successful answers strengthen signal routes; retrieval misses trigger skill-capture prompts. We present the architecture (vector embedding space, three-signal routing, learned reranking), the skill-learning loop, and the integration into agentropy.

1. Introduction

Modern personal AI assistants face a bottleneck: every question flows through a large language model, even when a skill or tool has solved that exact problem before. This is expensive, slow, and misses an opportunity to learn.

We model personal assistance as a closed-loop system: chat turns encode intent; a semantic index routes to candidate skills; answers flow back through the same space. The system learns from successes and failures, gradually building a personal skill index that anticipates future requests.

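This paper describes the design and implementation of SKG (Skill Knowledge Graph): its architecture (Section 2), the skill-learning loop (Section 3), and its integration with agentropy (Section 4).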

2. Architecture

2.1 Embedding Space

User messages and skills are embedded into a shared semantic space via jina-embeddings-v3-base. Each embedding has 1024 dimensions. The index over this space is built at startup and persisted in a vector store (FAISS or equivalent). Historical queries and their matched skills form a training set for the reranking model.

2.2 Routing Signals

For each incoming message, SKG computes three signals and combines them to rank candidate skills:

$$\text{MaxSim}_i = \max_j \cos(\text{msg}, \text{skill}_{i,j})$$ Equation 1: Maximum cosine similarity between the message embedding and the example embeddings of skill i, where j indexes the examples of that skill.

Signal 1: Intent Matching (MaxSim). Compute the maximum cosine similarity between the current message embedding and all example embeddings of each skill. This captures how well the message matches the skill's semantic intent.

$$\text{SkillScore}_i = w_1 \cdot \text{MaxSim}_i + w_2 \cdot \text{Preference}_i + w_3 \cdot \text{Rerank}_i$$ Equation 2: Weighted combination of intent match, preference weight, and learned reranking signal.

Signal 2: Learned Preference. Each skill carries a learned weight reflecting historical success (answers that satisfied the user, as marked in the decision loop). Skills with better track records receive higher weight.

Signal 3: Reranking. A lightweight model (initially a learned weight vector, later a small neural net) reorders candidates based on latent patterns in user preference, time-of-day, conversation context, and skill co-occurrence.

$$i^* = \arg\max_i \text{SkillScore}_i$$ Equation 3: Select the skill with the highest combined score.

Routing Decision. The skill with the highest combined score is selected. If no skill exceeds a confidence threshold (default 0.5), the message is routed to the LLM fallback.
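
The routing rule in Equations 1 to 3 can be sketched in a few lines of Python. This is a minimal illustration, not the SKG implementation: the `Skill` class, the `route` function, the weights (0.6, 0.2, 0.2), and the assumption that the preference and rerank signals are precomputed values in [0, 1] are all hypothetical. Only the 0.5 threshold and the form of the score come from the text above.

```python
import math
from dataclasses import dataclass


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class Skill:
    name: str
    examples: list           # example embeddings, one vector per example
    preference: float = 0.0  # Signal 2 (assumed to lie in [0, 1])
    rerank: float = 0.0      # Signal 3 (assumed precomputed, in [0, 1])


def max_sim(msg, skill):
    """Equation 1: best cosine match between the message and any example."""
    return max((cosine(msg, ex) for ex in skill.examples), default=0.0)


def route(msg, skills, weights=(0.6, 0.2, 0.2), threshold=0.5):
    """Return (skill name or None, best score); None means LLM fallback."""
    w1, w2, w3 = weights  # hypothetical values, not taken from the paper
    best_name, best_score = None, float("-inf")
    for skill in skills:
        # Equation 2: weighted combination of the three signals.
        score = w1 * max_sim(msg, skill) + w2 * skill.preference + w3 * skill.rerank
        if score > best_score:
            best_name, best_score = skill.name, score
    # Equation 3 plus the confidence threshold: the winner must exceed it.
    if best_name is None or best_score <= threshold:
        return None, best_score
    return best_name, best_score


skills = [
    Skill("convert_units", examples=[[1.0, 0.0], [0.9, 0.1]], preference=0.8, rerank=0.5),
    Skill("summarize_email", examples=[[0.0, 1.0]], preference=0.4, rerank=0.5),
]
name, score = route([1.0, 0.0], skills)
```

With these toy two-dimensional embeddings, the message matches the first example of `convert_units` exactly, so that skill wins with a combined score of about 0.86. A message that matches no example closely enough returns `None`, which corresponds to the LLM fallback.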

2.3 Skill Representation

Each skill is a tuple of:

2.4 Signal Table

The system tracks routing decisions in a persistent signal log:

| Field | Type | Purpose |
|---|---|---|
| user_msg_id | str | Unique message identifier. |
| message | str | The user message text. |
| max_sim | float | Highest cosine similarity among all skills. |
| matched_skill | str | Name of the skill selected by routing. |
| skill_score | float | Combined score that led to selection. |
| fallback_to_llm | bool | Whether the message fell back to the LLM. |
| user_satisfaction | str | Post-hoc feedback: "ok", "partial", "miss", or None. |
| skill_learned | bool | Whether a new skill was captured from this turn. |
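
One way to represent a row of this log is a small dataclass that serializes to a single JSON line. The sketch below is an assumption about how such a record might look in Python. The field names and types follow the table, while the class name `SignalRecord`, its methods, and the validation of the feedback labels are hypothetical.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class SignalRecord:
    user_msg_id: str
    message: str
    max_sim: float
    matched_skill: str
    skill_score: float
    fallback_to_llm: bool
    user_satisfaction: Optional[str] = None  # "ok", "partial", "miss", or None
    skill_learned: bool = False

    def __post_init__(self):
        # Reject feedback labels other than the four listed in the table.
        if self.user_satisfaction not in {"ok", "partial", "miss", None}:
            raise ValueError(f"unexpected user_satisfaction: {self.user_satisfaction!r}")

    def to_json_line(self) -> str:
        """Serialize the record as one line of JSON (no trailing newline)."""
        return json.dumps(asdict(self), ensure_ascii=False)

    @classmethod
    def from_json_line(cls, line: str) -> "SignalRecord":
        """Rebuild a record from a line produced by to_json_line."""
        return cls(**json.loads(line))


record = SignalRecord(
    user_msg_id="m-001",
    message="convert 5 miles to km",
    max_sim=0.91,
    matched_skill="convert_units",
    skill_score=0.82,
    fallback_to_llm=False,
    user_satisfaction="ok",
)
line = record.to_json_line()
restored = SignalRecord.from_json_line(line)
```

Because the dataclass compares by field values, `restored == record` holds after the round trip, which makes the log straightforward to replay when retraining.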

3. The Skill-Learning Loop

SKG learns continuously through a closed loop:

Step 1: Retrieval Miss Detection. After the LLM generates an answer, the system checks whether a nearby skill could have answered it. If such a skill is found (e.g., its MaxSim exceeds 0.6 but its combined score stayed below the routing threshold), a "capture candidate" is flagged.

Step 2: User Confirmation. The user can mark the answer as satisfactory. The system offers: "Could this become a reusable skill? Name it." If the user names a skill, it enters the capture flow. If they decline, the signal is logged as a miss.

Step 3: Skill Capture. The LLM generates a concise instruction (the skill prompt) from the user's question and the assistant's answer. The new skill is embedded, added to the vectorstore, and registered in the skill index.

Step 4: Retraining. The signal log (routing decisions + user feedback) is used to retrain the reranking model weights. The weighting scheme is updated every N messages (default: 50) or when a new skill is captured.
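
Steps 1 and 4 reduce to two small decisions: when to flag a capture candidate, and when to retrain. The sketch below illustrates both under stated assumptions. It reads Step 1 as "MaxSim above 0.6 while the combined score did not exceed the routing threshold", and it treats a skill capture as resetting the message counter. The names `is_capture_candidate` and `RetrainScheduler` are hypothetical.

```python
def is_capture_candidate(max_sim, skill_score, sim_floor=0.6, routing_threshold=0.5):
    """Step 1: a skill was semantically close but did not win routing."""
    return max_sim > sim_floor and skill_score <= routing_threshold


class RetrainScheduler:
    """Step 4: request a retrain every `every_n` messages or on skill capture."""

    def __init__(self, every_n=50):
        if every_n < 1:
            raise ValueError("every_n must be at least 1")
        self.every_n = every_n
        self.since_last = 0  # messages seen since the last retrain

    def observe(self, skill_captured=False):
        """Record one message; return True when retraining should run now."""
        self.since_last += 1
        if skill_captured or self.since_last >= self.every_n:
            self.since_last = 0  # assumption: any retrain resets the counter
            return True
        return False


scheduler = RetrainScheduler(every_n=3)
decisions = [scheduler.observe() for _ in range(4)]
after_capture = scheduler.observe(skill_captured=True)
```

With `every_n=3`, the third message triggers a retrain and the counter starts over, so the fourth does not. A captured skill triggers a retrain immediately regardless of the count.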

4. Integration with Agentropy

SKG is integrated into agentropy as the skill-engine component:

The skill index is persisted in the user's ~/.agentropy/skg/ directory:

~/.agentropy/skg/
├── skills.jsonl          # One skill per line
├── embeddings.fvecs      # FAISS binary embeddings
├── signals.jsonl         # Routing decisions + feedback
└── rerank-model.pkl      # Learned weights for reranking
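
The two `.jsonl` files in this layout can be handled with append-only writes and line-by-line reads. The following is a minimal sketch, assuming one JSON object per line. The helper names (`skg_paths`, `append_jsonl`, `read_jsonl`) and the `name` and `prompt` keys of the example skill record are hypothetical, since the skill schema is not specified here.

```python
import json
from pathlib import Path


def skg_paths(root):
    """Map logical names to the files in the directory layout above."""
    root = Path(root)
    return {
        "skills": root / "skills.jsonl",
        "embeddings": root / "embeddings.fvecs",
        "signals": root / "signals.jsonl",
        "rerank_model": root / "rerank-model.pkl",
    }


def append_jsonl(path, record):
    """Append one record as a single JSON line, creating parent directories."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Return all records in the file, or an empty list if it does not exist."""
    path = Path(path)
    if not path.exists():
        return []
    with path.open("r", encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

In use, the root would be `Path.home() / ".agentropy" / "skg"`, and a captured skill could be stored with `append_jsonl(skg_paths(root)["skills"], {"name": "convert_units", "prompt": "..."})`. Appending keeps writes cheap and leaves earlier records untouched.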

5. Performance and Learning Curves

In early deployment, we observe:

6. Related Work

SKG draws from several domains:

7. Limitations and Future Work

Current limitations:

Future directions:

8. Conclusion

SKG demonstrates that personal AI assistants can learn and improve through a closed-loop skill-acquisition pipeline. By embedding intent, retrieving learned skills, and training on user feedback, we reduce LLM load, improve latency, and create a personalized knowledge surface that grows with each conversation.

The approach is practical, transparent to users, and compatible with any LLM vendor. Early results show promise: skill hit rates rise to 25–35% within months, and user satisfaction remains high.

References

  1. Anderson, J. R. (1990). The Adaptive Character of Thought. Lawrence Erlbaum.
  2. Ebbinghaus, H. (1885). Memory: A Contribution to Experimental Psychology. Dover Publications (1964 reprint).
  3. Gao, Y., et al. (2023). Retrieval-Augmented Generation for Large Language Models: A Survey. arXiv:2312.10997.
  4. Khattab, O., & Zaharia, M. (2020). ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT. SIGIR 2020.
  5. Lewis, P., et al. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. arXiv:2005.11401.
  6. Ouyang, L., et al. (2022). Training Language Models to Follow Instructions with Human Feedback. arXiv:2203.02155.
  7. Raffel, C., et al. (2020). Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. JMLR.