Published noteApr 15, 2026

Public note

Graphify: A Multi-Modal Knowledge Graph Builder for AI Coding Assistants

AI summary

Graphify is an open-source project that builds multi-modal knowledge graphs from code, documentation, and diagrams for AI coding assistants to understand software projects more comprehensively.

AI tags

graphify

Subheadline

Graphify is an open-source project that turns code, documentation, papers, and diagrams into a queryable knowledge graph so AI coding assistants can understand a software project with more structural and architectural context.

Lead

Graphify is not a generic graph-visualization tool. It is a multi-modal knowledge graph builder for software projects, designed to help tools such as Claude Code, OpenAI Codex, and OpenCode understand codebases with more structural and architectural context. According to the official project site, Graphify combines Tree-sitter-based static analysis with LLM-driven semantic extraction to map not just what the code does, but also the design intent expressed in docs, papers, and diagrams.

At a Glance

Project: Graphify
Type: Open-source knowledge graph skill for AI coding assistants
Maintainer: Safi Shamsi
License: MIT
Primary purpose: Build a queryable graph from code, docs, papers, and diagrams
Core stack called out by the project: Tree-sitter, NetworkX, Leiden clustering
Outputs: graph.html, graph.json, and GRAPH_REPORT.md

What Happened

Graphify presents itself as an infrastructure layer for code understanding rather than a standalone visualization app. Its pipeline collects project files, extracts structural and semantic signals, merges them into a graph, clusters related concepts into communities, and then exports both machine-readable and human-readable outputs.

On the project site, Graphify says it supports AI coding assistants including Claude Code, OpenAI Codex, and OpenCode. The GitHub repository describes the project more broadly as a skill that can turn folders containing code, docs, papers, images, or videos into a queryable knowledge graph.

Key Facts / Comparison

Item	What the sources say
Identity	Open-source knowledge graph skill for AI coding assistants
Input	Code, docs, papers, diagrams, and other project materials
Static analysis	Tree-sitter-based AST extraction
Graph engine	NetworkX
Clustering	Leiden community detection without vector embeddings
Main promise	Structural and semantic understanding of codebases
Output artifacts	Interactive HTML graph, JSON graph, Markdown report

Background and Context

Many code-intelligence systems depend heavily on chunking, embedding, and vector retrieval. Graphify argues for a different approach: preserve explicit structure first, then add semantic links on top. In its own documentation, the project positions knowledge graphs as more transparent for code understanding because they can preserve calls, imports, rationale comments, and cross-modal links between source code and external design materials.

That positioning matters because software repositories increasingly include more than source files. Architecture diagrams, technical notes, PDFs, and research papers often carry the rationale behind a system. Graphify is built around the idea that an AI assistant should be able to reason across all of them in one graph.

Why This Matters

The most interesting part of Graphify is that it tries to answer a persistent weakness in code-assistant workflows: source code alone often does not explain architectural intent. Graphify’s pipeline is designed to preserve both structure and meaning.

For teams using AI coding assistants, that can matter in several practical ways:

It may reduce the need to repeatedly stuff large repositories into context windows.
It gives assistants an explicit graph of relationships instead of only similarity scores.
It offers a bridge between implementation details in code and intent captured in supporting documents.

The project site also claims large token reductions when querying graph-derived subgraphs instead of naively sending large corpora to a model.

Insight and Industry Analysis

Graphify sits in a distinct position relative to nearby tool categories.

It is not a graph database like Neo4j, because its main job is to generate a graph from project artifacts.
It is not a typical vector-RAG code search tool, because it emphasizes explicit structural edges and graph topology.
It is not just a visualizer, because the graph is intended to be queried by coding assistants and exported as structured artifacts.

That makes Graphify best understood as a codebase interpretation layer. Its real value is not the picture of the graph itself, but the intermediate representation it builds for downstream assistant workflows.

Strengths, Limitations, and Open Questions

Strengths

Preserves explicit software structure through AST parsing
Incorporates docs, papers, and diagrams into the same graph
Uses transparent graph edges and community structure instead of relying only on embeddings
Produces multiple outputs for inspection, querying, and reporting

Limitations

The official materials emphasize the architecture and workflow, but do not provide exhaustive benchmark coverage across many repositories
Accuracy of semantic extraction still depends on model quality for non-code materials
Some claims, such as token-efficiency gains, are demonstrated with project examples rather than independent third-party evaluation

Open Questions

How consistently does the semantic layer perform across very large enterprise repositories?
How much manual review is needed when inferred cross-modal links are ambiguous?
How stable are community assignments when repositories evolve rapidly?

Technical Deep Dive

1. Deterministic Tree-sitter AST pass

Graphify’s first pass is a deterministic Tree-sitter walk over every code file it finds. The project’s Tree-sitter documentation explicitly says this stage uses no LLM, no embeddings, and no network calls. Instead, it converts AST information directly into graph nodes and edges.

From that AST pass, Graphify extracts:

Structural nodes such as classes, functions, methods, modules, traits/interfaces, and top-level variables
Call-graph edges, where resolved call sites become calls edges
Import edges, which help connect modules that belong together
Rationale nodes, where docstrings and specially marked comments such as NOTE, IMPORTANT, HACK, and WHY are lifted into separate nodes

This is a notable design choice. Graphify is not only mapping software structure, but also trying to preserve why the code was written a certain way when that rationale appears in comments or docstrings.

2. Semantic extraction for non-code materials

The second major layer is semantic extraction over materials such as docs, papers, and diagrams. On its homepage, Graphify says LLMs extract concepts from prose and vision models read diagrams. These semantic outputs are then merged into the same graph as the deterministic code structure.

The Tree-sitter documentation also describes a source-tracking model for graph edges:

EXTRACTED for deterministic facts from code, with confidence 1.0
INFERRED for model-derived links
AMBIGUOUS for cases where the model is uncertain

That separation is important because it gives reviewers a way to distinguish directly observed structure from model judgment.

3. Graph construction with NetworkX

After extraction, Graphify merges nodes and edges into a NetworkX graph. The official site describes this as the stage where static analysis and semantic extraction are unified into a single graph representation.

This combined graph is the project’s core data structure. It allows Graphify to represent:

code-to-code relationships such as calls and imports
code-to-doc links
code-to-paper links
code-to-diagram links
concept-to-concept semantic relations

In other words, Graphify is building a shared relational layer across implementation artifacts and explanatory artifacts.

4. Leiden clustering without embeddings

One of Graphify’s clearest architectural positions is its use of Leiden community detection without vector embeddings. The project’s Leiden documentation says Graphify runs Leiden directly on graph topology, rather than clustering embedded chunks.

The rationale given by the project is that embedding-driven pipelines can lose explicit structure, produce opaque groupings, and add extra infrastructure such as vector stores. Graphify instead uses edge density in the graph as the clustering signal.

The same documentation says semantic similarity can still participate through inferred edges like semantically_similar_to. Those inferred edges live in the same graph as structural edges, so Leiden can cluster around both architectural structure and conceptual affinity.

5. God nodes, surprise edges, and graph-guided querying

The project site says Graphify identifies high-degree “god nodes” at the center of a system and flags unexpected cross-file or cross-domain connections as surprises. In the Leiden documentation, these surprises are described as edges ranked highly by a composite method, with cross-domain links treated as especially interesting.

Graphify also frames the graph as a token-efficiency mechanism. Instead of passing a full codebase into a model, the assistant can query a relevant subgraph and work from a compressed structural summary.

What to Watch Next

Based on the current materials, the next things worth watching are:

broader validation across more repositories and programming stacks
how well the semantic layer handles noisy or incomplete documentation
whether graph-guided workflows become a more common alternative to embedding-heavy code RAG
deeper integrations with assistant environments beyond the currently highlighted tools

Conclusion

Graphify is best understood as a multi-modal codebase knowledge graph builder, not a generic visualization package. Its architecture combines deterministic Tree-sitter parsing, semantic extraction over non-code materials, graph construction in NetworkX, and Leiden clustering over topology rather than embeddings.

That combination makes it an interesting project in the evolving code-intelligence stack. The strongest idea behind Graphify is not merely that it draws graphs, but that it gives AI coding assistants a more inspectable and structured representation of how a software project is built, documented, and explained.