Software Design
Architecture
DependaMan is structured as a six-phase pipeline. Each phase has a single responsibility and a well-defined input/output contract. Phases are independent — the analysis passes in Phase 4 know nothing about git; the renderer in Phase 6 knows nothing about the file system.
Phase 1 — File Discovery
Walk the project directory, collect all .py files, and determine the package
root. Distinguish internal modules from external ones — stdlib and third-party
imports are ignored throughout the pipeline.
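A minimal sketch of this phase, assuming the directory passed in is the import root (the folder that contains the top-level package) and that a hypothetical `SKIP_DIRS` set names directories not worth walking. It maps each file to a dotted module name, then uses that set to decide whether an imported name is internal:

```python
from pathlib import Path

# Hypothetical: directories never worth descending into.
SKIP_DIRS = {".git", ".venv", "venv", "__pycache__", "build", "dist"}


def module_name(path: Path, root: Path) -> str:
    """pkg/sub/mod.py -> 'pkg.sub.mod'; pkg/__init__.py -> 'pkg'."""
    parts = list(path.relative_to(root).with_suffix("").parts)
    if parts[-1] == "__init__":
        parts.pop()  # a package's __init__.py stands for the package itself
    return ".".join(parts)


def discover(root: Path) -> dict[str, Path]:
    """Return {dotted module name: file path} for every .py file under root."""
    modules: dict[str, Path] = {}
    for path in sorted(root.rglob("*.py")):
        if SKIP_DIRS.intersection(path.relative_to(root).parts):
            continue
        name = module_name(path, root)
        if name:  # an __init__.py directly in root has no module name; skip it
            modules[name] = path
    return modules


def is_internal(name: str, modules: dict[str, Path]) -> bool:
    """True if the imported name, or any dotted prefix of it, is a discovered module."""
    parts = name.split(".")
    return any(".".join(parts[:i]) in modules for i in range(len(parts), 0, -1))
```

Matching on dotted prefixes is what lets later phases drop stdlib and third-party names without a hard-coded list of them.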
Phase 2 — Import Parsing
Use ast to parse each file and extract import and from ... import
statements. Resolve relative imports. Filter to internal-only imports. No regex,
no string matching — the AST gives exact source structure.
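Relative imports are the part that needs care. The sketch below shows one way to handle them with `ast`, assuming each file arrives with its dotted module name, a flag saying whether it is a package `__init__.py`, and the set of internal module names from Phase 1. `resolve_relative`, `nearest_internal`, and `internal_imports` are hypothetical helpers:

```python
from __future__ import annotations

import ast


def resolve_relative(module: str, is_package: bool, level: int, target: str | None) -> str:
    """Absolute name for `from <level dots><target> import ...` written inside `module`."""
    parts = module.split(".")
    if not is_package:
        parts.pop()  # a plain module's package is its parent
    if level - 1 >= len(parts):
        raise ValueError("relative import beyond top-level package")
    parts = parts[: len(parts) - (level - 1)]  # each dot after the first climbs one level
    if target:
        parts.append(target)
    return ".".join(parts)


def nearest_internal(name: str, internal: set[str]) -> str | None:
    """Longest dotted prefix of `name` that is an internal module, if any."""
    parts = name.split(".")
    for i in range(len(parts), 0, -1):
        if ".".join(parts[:i]) in internal:
            return ".".join(parts[:i])
    return None


def internal_imports(source: str, module: str, is_package: bool, internal: set[str]) -> set[str]:
    """Internal modules imported by `module`; stdlib and third-party names fall away."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            if node.level:
                try:
                    base = resolve_relative(module, is_package, node.level, node.module)
                except ValueError:
                    continue  # invalid relative import; nothing to resolve
            else:
                base = node.module or ""
            # `from a import b` imports submodule a.b if it exists, otherwise a name from a
            names = [f"{base}.{a.name}" if f"{base}.{a.name}" in internal else base for a in node.names]
        else:
            continue
        for name in names:
            hit = nearest_internal(name, internal)
            if hit is not None and hit != module:  # ignore self-imports
                found.add(hit)
    return found
```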
Phase 3 — Graph Construction
Build a directed graph as an adjacency structure:
- Node = internal module
- Edge A → B = “module A imports module B”
Attach metadata to each node: file path, line count, function/class count.
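As a sketch, the adjacency structure can be a dict from module name to a node record. `ModuleNode` and `make_module_graph` are hypothetical names; counting every `def` and `class` anywhere in the file (rather than top-level only) is an assumption, and the sketch parses inline for brevity, whereas the single-pass parsing constraint calls for reusing one parse:

```python
import ast
from dataclasses import dataclass, field


@dataclass
class ModuleNode:
    path: str
    lines: int
    functions: int
    classes: int
    imports: set[str] = field(default_factory=set)  # outgoing edges: this module -> imported module


def make_module_graph(
    sources: dict[str, tuple[str, str]],  # module name -> (file path, source text)
    imports: dict[str, set[str]],         # module name -> internal modules it imports
) -> dict[str, ModuleNode]:
    graph: dict[str, ModuleNode] = {}
    for name, (path, source) in sources.items():
        nodes = list(ast.walk(ast.parse(source)))
        graph[name] = ModuleNode(
            path=path,
            lines=len(source.splitlines()),
            functions=sum(isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef)) for n in nodes),
            classes=sum(isinstance(n, ast.ClassDef) for n in nodes),
        )
    for name, targets in imports.items():
        if name in graph:
            # keep only edges between known modules, and no self-loops
            graph[name].imports = {t for t in targets if t in graph and t != name}
    return graph
```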
Phase 4 — Analysis
Four independent passes over the module graph:
- Dead code — nodes with no incoming edges and not an entry point
- Circular imports — DFS-based cycle detection; reports full cycle paths
- Hotspots — nodes ranked by fan-in (most imported)
- Coupling — nodes ranked by fan-out (imports the most)
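Each pass fits in a few lines over a plain adjacency dict (module to the set of modules it imports). This is a sketch with hypothetical function names; `entry_points` is assumed to be supplied by the caller. The cycle pass reports one cycle per back edge found by DFS, which is not the same as enumerating every elementary cycle, and it recurses for readability, so very deep import chains would need an iterative version:

```python
from collections import Counter

Graph = dict[str, set[str]]  # module -> modules it imports


def fan_in(graph: Graph) -> Counter:
    counts = Counter({node: 0 for node in graph})  # unimported modules still appear with 0
    for targets in graph.values():
        counts.update(targets)
    return counts


def dead_modules(graph: Graph, entry_points: set[str]) -> list[str]:
    incoming = fan_in(graph)
    return sorted(n for n in graph if incoming[n] == 0 and n not in entry_points)


def hotspots(graph: Graph) -> list[tuple[str, int]]:
    return fan_in(graph).most_common()  # most imported first


def coupling(graph: Graph) -> list[tuple[str, int]]:
    return sorted(((n, len(t)) for n, t in graph.items()), key=lambda p: p[1], reverse=True)


def find_cycles(graph: Graph) -> list[list[str]]:
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color = {node: WHITE for node in graph}
    path: list[str] = []
    cycles: list[list[str]] = []

    def visit(node: str) -> None:
        color[node] = GRAY
        path.append(node)
        for nxt in sorted(graph.get(node, ())):
            state = color.get(nxt, BLACK)  # nodes outside the graph count as finished
            if state == GRAY:  # back edge: nxt is already on the path, so report the loop
                cycles.append(path[path.index(nxt):] + [nxt])
            elif state == WHITE:
                visit(nxt)
        path.pop()
        color[node] = BLACK

    for node in sorted(graph):
        if color[node] == WHITE:
            visit(node)
    return cycles
```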
Phase 5 — Git Integration
Use subprocess + git log to compute per-file commit frequency, lines
added/removed over time (churn), and last author. Attach this data to graph
nodes. Optional — skipped gracefully if the project is not a git repo.
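One way to get all three numbers from a single `git log` call per file, sketched under assumptions: the `__COMMIT__` marker and the `GitStats`, `parse_log`, and `file_stats` names are hypothetical, churn is summed from `--numstat`, and the author of the newest commit (git log lists newest first) stands in for the last author. Any failure to run git returns `None`, which is how this sketch skips the phase gracefully:

```python
from __future__ import annotations

import subprocess
from dataclasses import dataclass

MARKER = "__COMMIT__"  # hypothetical sentinel that starts each commit header line


@dataclass
class GitStats:
    commits: int = 0
    added: int = 0
    removed: int = 0
    last_author: str | None = None


def parse_log(output: str) -> GitStats:
    """Parse `git log --format=__COMMIT__%x09%an --numstat -- <file>` output."""
    stats = GitStats()
    for line in output.splitlines():
        if line.startswith(MARKER + "\t"):
            stats.commits += 1
            if stats.last_author is None:  # the first header is the newest commit
                stats.last_author = line[len(MARKER) + 1:]
        elif line.strip():
            added, removed, _path = line.split("\t", 2)
            # binary changes are reported as "-" rather than line counts
            stats.added += int(added) if added.isdigit() else 0
            stats.removed += int(removed) if removed.isdigit() else 0
    return stats


def file_stats(repo: str, path: str) -> GitStats | None:
    """Stats for one file, or None when git is missing or repo is not a git repository."""
    try:
        result = subprocess.run(
            ["git", "log", f"--format={MARKER}%x09%an", "--numstat", "--", path],
            cwd=repo, capture_output=True, text=True, check=True,
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None
    return parse_log(result.stdout)
```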
Phase 6 — HTML Output
Generate a self-contained .html file. The HTML template is a static string
embedded in Python. Only the data changes between runs — Python serializes the
graph to JSON and injects it into the template:
```python
import json
data = json.dumps({"nodes": nodes, "edges": edges})  # lists built from the graph
html = TEMPLATE.replace("__GRAPH_DATA__", data)
```
The template contains a `<script>` block that reads the injected data and
renders the graph using the browser’s canvas API. No external JS libraries
are required, and the page works fully offline.
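A stripped-down sketch of a template and the injection step. The node and edge schema (`{"id": ...}` nodes, `[source, target]` edges), the circular layout, and `render_html` are assumptions for illustration, not the real template. One detail worth keeping in any version: `json.dumps` does not escape `</`, so a string containing `</script>` would end the script block early. Rewriting `</` as `<\/` keeps the JSON valid, because `\/` is a legal JSON escape for `/`:

```python
import json

# Hypothetical minimal template; only __GRAPH_DATA__ changes between runs.
TEMPLATE = """<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>DependaMan</title></head>
<body>
<canvas id="graph" width="800" height="600"></canvas>
<script>
const GRAPH = __GRAPH_DATA__;
const ctx = document.getElementById("graph").getContext("2d");
// Place nodes on a circle, then draw edges and labels.
const pos = {};
GRAPH.nodes.forEach((n, i) => {
  const a = 2 * Math.PI * i / GRAPH.nodes.length;
  pos[n.id] = [400 + 250 * Math.cos(a), 300 + 250 * Math.sin(a)];
});
GRAPH.edges.forEach(([src, dst]) => {
  ctx.beginPath();
  ctx.moveTo(...pos[src]);
  ctx.lineTo(...pos[dst]);
  ctx.stroke();
});
GRAPH.nodes.forEach(n => ctx.fillText(n.id, ...pos[n.id]));
</script>
</body>
</html>
"""


def render_html(nodes: list[dict], edges: list[list[str]]) -> str:
    data = json.dumps({"nodes": nodes, "edges": edges})
    # Stop "</script>" inside the data from closing the script tag early.
    data = data.replace("</", "<\\/")
    return TEMPLATE.replace("__GRAPH_DATA__", data)
```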
Callable Graph (v1.1)
A second graph layer built on top of the module graph. Where the module graph answers what depends on what, the callable graph answers who calls what.
Two graphs, one canonical namespace
| Graph | Nodes | Edges |
|---|---|---|
| Module | pkg.mod | A imports B |
| Callable | pkg.mod::Func, pkg.mod::Class.method | A calls B |
Every callable's canonical name contains its module's canonical name as a prefix:
split on `::` to get the parent module. The two graphs stay independent, so all
existing analyzers apply to the callable graph unchanged.
Alias mapper
A single local_name → canonical dict capturing all imports, aliased or not:
| Import statement | local_name | canonical |
|---|---|---|
| import a | a | a |
| import a as b | b | a |
| from a import b | b | a::b if b is defined in a, else a.b |
| from a import b as c | c | same lookup as above |
| from .x import y | y | resolved relatively, then same lookup |
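A sketch of building the mapper with `ast`, assuming `definitions` is the set of project callable canonicals (`pkg.mod::name`) gathered beforehand. `build_alias_mapper` and `resolve_base` are hypothetical names. It binds `import a.b` to the local name `a`, as Python does, and it folds imports found inside functions into the same module-wide map, which is a simplification:

```python
from __future__ import annotations

import ast


def resolve_base(module: str, is_package: bool, level: int, target: str | None) -> str:
    """Absolute module named by `from <level dots><target> import ...` inside `module`."""
    if level == 0:
        return target or ""
    parts = module.split(".")
    if not is_package:
        parts.pop()  # a plain module's package is its parent
    parts = parts[: len(parts) - (level - 1)]  # each extra dot climbs one package (no bounds check)
    return ".".join(parts + [target] if target else parts)


def build_alias_mapper(tree: ast.Module, module: str, is_package: bool, definitions: set[str]) -> dict[str, str]:
    """Map every name bound by an import to its canonical target."""
    mapper: dict[str, str] = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.asname:  # import a.b as c  ->  c: a.b
                    mapper[alias.asname] = alias.name
                else:             # import a.b binds only the top-level name a
                    top = alias.name.split(".")[0]
                    mapper[top] = top
        elif isinstance(node, ast.ImportFrom):
            base = resolve_base(module, is_package, node.level, node.module)
            for alias in node.names:
                if alias.name == "*":
                    continue  # star imports bind unknown names; skipped in this sketch
                as_callable = f"{base}::{alias.name}"
                target = as_callable if as_callable in definitions else f"{base}.{alias.name}"
                mapper[alias.asname or alias.name] = target
    return mapper
```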
Call collection
For each top-level FunctionDef, AsyncFunctionDef, and ClassDef, walk the
body and emit edges source_canonical → target_canonical. Module-scope calls
are attributed to a synthetic source `mod::<module>` so entry-point invocations
are not lost. Functions nested inside other functions are not separate nodes:
calls inside them are attributed to the outermost enclosing callable.
References tracked:
- Direct calls: `ast.Call` with an `ast.Name` callee, looked up in `alias_mapper`
- Attribute chains: for `np.linalg.norm()`, walk the chain, resolve the root via `alias_mapper`, and reconstruct candidates against project definitions
- Inheritance: `ClassDef.bases`, so `class B(A)` gives the edge `mod::B → mod::A`
- Decorators: `decorator_list`; bare `@foo` and parameterized `@foo(arg)` are resolved the same way
- Callable-as-argument: in `pool.submit(fn)` and `sorted(xs, key=fn)`, function references passed as arguments emit edges, preventing false dead-code positives
Dead callable detection
A callable is dead when nothing references it:
```python
dead = {
    c for c in project_definitions
    if callable_fanin[c] == 0 and c not in project_imports
}
```
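A runnable version with toy data. It assumes `callable_fanin` is a `collections.Counter` built from the edge set, so callables that never appear as a target read as 0 instead of raising `KeyError`, and that `project_imports` holds the canonical names some module imports. `dead_callables` and the data are illustrative:

```python
from collections import Counter


def dead_callables(project_definitions, edges, project_imports):
    callable_fanin = Counter(target for _source, target in edges)
    return {
        c for c in project_definitions
        if callable_fanin[c] == 0 and c not in project_imports
    }


project_definitions = {
    "pkg.app::main", "pkg.app::helper", "pkg.app::unused",
    "pkg.util::fmt", "pkg.util::legacy",
}
edges = {
    ("pkg.app::<module>", "pkg.app::main"),  # entry-point call at module scope
    ("pkg.app::main", "pkg.app::helper"),    # e.g. sorted(xs, key=helper)
}
project_imports = {"pkg.util::fmt"}  # imported somewhere, never called in this toy

dead = dead_callables(project_definitions, edges, project_imports)
```

Without the synthetic `<module>` source, `main` would be reported dead here; without the callable-as-argument edge, `helper` would be.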
Concurrency
DependaMan applies different strategies depending on the nature of the work:
Git stats (I/O-bound): the time goes to waiting on git subprocesses, not CPU work.
ThreadPoolExecutor runs multiple git log calls concurrently.
Parsing (CPU-bound): pure computation. When the module count is high
enough, ProcessPoolExecutor sidesteps the GIL by giving each worker its own
interpreter process, so parsing uses multiple cores. On a free-threaded
Python build (3.13+ with the GIL disabled), ThreadPoolExecutor is used
instead, since threads already run truly in parallel.
Both paths include a minimum module threshold before engaging concurrent execution. Spawning a process pool has real startup costs — for small projects, the overhead exceeds the benefit.
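A sketch of what GIL-aware selection in pool.py could look like. `sys._is_gil_enabled()` exists only on Python 3.13+, so the sketch assumes a GIL when it is missing. `MIN_MODULES_FOR_POOL`, the function names, and the threshold value are hypothetical placeholders, not measured numbers:

```python
from __future__ import annotations

import os
import sys
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor

MIN_MODULES_FOR_POOL = 50  # hypothetical placeholder, not a measured threshold


def gil_enabled() -> bool:
    """False only on a free-threaded build running with the GIL disabled (3.13+)."""
    check = getattr(sys, "_is_gil_enabled", None)
    return True if check is None else check()


def cpu_executor(n_modules: int) -> Executor | None:
    """Executor for CPU-bound parsing, or None to stay serial for small projects."""
    if n_modules < MIN_MODULES_FOR_POOL:
        return None  # pool startup would cost more than it saves
    if gil_enabled():
        return ProcessPoolExecutor()  # one interpreter (and one GIL) per worker process
    return ThreadPoolExecutor(max_workers=os.cpu_count())  # threads already run in parallel


def io_executor(n_files: int) -> Executor | None:
    """Threads suffice for git calls: workers mostly wait on subprocesses."""
    return ThreadPoolExecutor() if n_files >= MIN_MODULES_FOR_POOL else None


def run_all(fn, items, executor: Executor | None):
    """Map fn over items, serially or through the given executor."""
    if executor is None:
        return [fn(x) for x in items]
    with executor:
        return list(executor.map(fn, items))
```

Returning `None` below the threshold keeps the serial path identical to the concurrent one from the caller's point of view: `run_all(parse_fn, files, cpu_executor(len(files)))`.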
Package Structure
```
dependaman/
    __init__.py     public API: dependaman()
    __main__.py     CLI entry point
    core.py         orchestration
    discovery.py    Phase 1: file discovery
    parser.py       Phase 2: import parsing
    graph.py        Phase 3: graph construction
    analysis.py     Phase 4: analysis passes
    git.py          Phase 5: git integration
    renderer.py     Phase 6: HTML output
    pool.py         GIL-aware executor selection
```
Design Constraints
Zero external dependencies. The entire tool runs on Python’s standard
library (ast, pathlib, json, subprocess, concurrent.futures). Once
installed, it works in any environment without pulling in a dependency tree.
Framework-agnostic output. The render function returns an HTML string — it knows nothing about how it will be served:
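```python
def render(graph, analysis) -> str: ...

# FastAPI
from fastapi import FastAPI
from fastapi.responses import HTMLResponse
app = FastAPI()
@app.get("/graph", response_class=HTMLResponse)
def dependency_graph():
    graph = build_graph(".")  # build once, then reuse for analysis
    return render(graph, analyze(graph))
```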
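Because the output is just a string, the same page can also be saved or served with no framework at all. A stdlib-only sketch; `write_report` and `wsgi_app_for` are hypothetical helpers, and the HTML passed in stands for whatever `render` returns:

```python
from pathlib import Path


def write_report(html: str, path: str = "dependaman.html") -> Path:
    """Save the self-contained page; any browser can open it offline."""
    out = Path(path)
    out.write_text(html, encoding="utf-8")
    return out


def wsgi_app_for(html: str):
    """Serve one pre-rendered page from a plain WSGI callable."""
    body = html.encode("utf-8")

    def app(environ, start_response):
        start_response("200 OK", [
            ("Content-Type", "text/html; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ])
        return [body]

    return app
```

To serve it locally, `wsgiref.simple_server.make_server("127.0.0.1", 8000, wsgi_app_for(html)).serve_forever()` is enough.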
Single-pass parsing. Each module is read and AST-parsed once. The result fans out into project imports, module graph, and callable graph in the same pool round — avoiding repeated I/O and parse overhead.
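A sketch of a single-pass worker, assuming jobs are (module name, file path) pairs. `parse_once` is a module-level function that returns plain data rather than the AST, so a ProcessPoolExecutor can pickle both the function and its result. `ParsedModule`, `parse_once`, and `parse_all` are hypothetical names, and relative imports are skipped here for brevity:

```python
import ast
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class ParsedModule:
    name: str
    lines: int = 0
    imports: set[str] = field(default_factory=set)      # raw targets; internal filtering happens later
    definitions: set[str] = field(default_factory=set)  # feeds the callable graph


def parse_once(job: tuple[str, str]) -> ParsedModule:
    """Read and parse one file a single time; every consumer reads from the result."""
    name, path = job
    source = Path(path).read_text(encoding="utf-8")
    tree = ast.parse(source, filename=path)
    result = ParsedModule(name=name, lines=len(source.splitlines()))
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            result.definitions.add(f"{name}::{node.name}")
        if isinstance(node, ast.ClassDef):
            for item in node.body:
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    result.definitions.add(f"{name}::{node.name}.{item.name}")
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            result.imports.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            result.imports.add(node.module)  # relative imports omitted in this sketch
    return result


def parse_all(jobs, executor=None):
    """One pool round: each file is read and parsed exactly once."""
    if executor is None:
        return [parse_once(job) for job in jobs]
    return list(executor.map(parse_once, jobs))
```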