14-stage Fusion Pipeline for LLM token compression — reversible compression, AST-aware code analysis, intelligent content routing. Zero LLM inference cost. MIT licensed.
Updated Apr 1, 2026 - Python
97% token reduction for AI coding sessions — zero deps, 31 languages, MCP server
Cut your Claude / OpenAI / Gemini bill 70–95% on AI coding. Local proxy that compresses context, keeps provider caches hot, and verifies LLM output ($0 hallucination guard). Drop-in for Cursor, Claude Code, Codex, Aider + 34 more and custom providers — 30s, no code changes
Portable CC-inspired skills for memory, verification, multi-agent coordination, context compression, and proactive coding-agent workflows.
The official repo for "LLoCo: Learning Long Contexts Offline"
A drop-in proxy that compresses bloated code context in real-time, cutting LLM API costs by 50–80% without losing what the model actually needs to know.
PyTorch implementation for "Compressed Context Memory For Online Language Model Interaction" (ICLR'24)
Context compression plugin for Claude Code. Trims large JSON, logs, stack traces, and source files before they enter the context window.
State-aware knowledge compression, ingestion, and hybrid retrieval engine. Zero dependencies. Sub-100ms queries.
Cursor uses AI to edit code — we use AI to edit AI's context. 🪆 Context map + compression + version control for LLM context windows.
Unified agent memory and context compression stack for 2026 NVIDIA + edge (Vera CPU, Grace, Jetson Thor, 3090). Glues busyBee-cpu, honey-comb, and rust-brain. Better effective reasoning per token.
LLM reliability layer: keeps agents alive with smart routing, context compaction, and local fallback
🦞 LobsterPress (龙虾饼) - Cognitive Memory System for AI Agents: a permanent memory engine for LLMs grounded in cognitive science
Convert long AI conversations into portable conversation state graphs for LLM handoffs.
A unified CLI to install and update token-saving plugins — RTK, Caveman, CodeGraph, and Context-Mode — for Claude Code, OpenCode, Codex, and Antigravity.
Local-first Model Context Protocol (MCP) memory layer for Codex CLI/Desktop, Claude Code, Gemini CLI, Qwen/DeepSeek/Ollama and agent workflows. SQLite + FTS5 compact context packs, token savings, read-only mode, no external memory server.
Cut Claude token usage by 90%+ with free, open-source, local-first context compression for Claude Code. Hybrid RAG (BM25 + ONNX vectors), AST chunking, reranking. No API needed.
Rolling context compression for Claude Code — never hit the context wall. Auto-compresses old messages while keeping recent context verbatim. Zero config, zero latency. Works as a Claude Code plugin.
Squeeze verbose LLM agent tool output down to only the relevant lines
Auditable context capsules for LLM handoffs, coding agents, and OpenCode MCP workflows.