Software Design

Architecture

DependaMan is structured as a six-phase pipeline. Each phase has a single responsibility and a well-defined input/output contract. Phases are independent — the analysis passes in Phase 4 know nothing about git; the renderer in Phase 6 knows nothing about the file system.

Phase 1 — File Discovery

Walk the project directory, collect all .py files, and determine the package root. Distinguish internal modules from external ones — stdlib and third-party imports are ignored throughout the pipeline.
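A minimal sketch of this phase, assuming the directory passed in is the package root's parent. The names discover_modules and is_internal are hypothetical, not DependaMan's actual API.

```python
from pathlib import Path


def discover_modules(root: Path) -> dict[str, Path]:
    """Map dotted module names to the .py files found under root."""
    modules: dict[str, Path] = {}
    for path in sorted(root.rglob("*.py")):
        parts = list(path.relative_to(root).with_suffix("").parts)
        if parts[-1] == "__init__":
            parts.pop()  # a package is named after its directory
        if parts:
            modules[".".join(parts)] = path
    return modules


def is_internal(name: str, modules: dict[str, Path]) -> bool:
    """True if name, or one of its parent packages, is a discovered module."""
    parts = name.split(".")
    return any(".".join(parts[:i]) in modules for i in range(len(parts), 0, -1))
```

Anything that fails is_internal (stdlib, third-party) can then be dropped before the later phases see it.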

Phase 2 — Import Parsing

Use ast to parse each file and extract import and from ... import statements. Resolve relative imports. Filter to internal-only imports. No regex, no string matching — the AST gives exact source structure.
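The extraction step might look roughly like the sketch below. The function name extract_imports and its parameters are illustrative assumptions; known stands for the set of internal module names produced by file discovery.

```python
import ast


def extract_imports(source: str, module: str, known: set, is_package: bool = False) -> set:
    """Return the internal modules imported by `module`, as absolute dotted names."""
    # Relative imports resolve against the package that contains the module.
    package = module.split(".") if is_package else module.split(".")[:-1]
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            if node.level:
                # level 1 is the current package; each extra dot climbs one level
                base = package[: max(0, len(package) - (node.level - 1))]
                if node.module:
                    base = base + node.module.split(".")
            else:
                base = node.module.split(".")
            prefix = ".".join(base)
            if prefix:
                found.add(prefix)
            for alias in node.names:
                # `from pkg import mod` may name a submodule, so record it too
                if alias.name != "*":
                    found.add(f"{prefix}.{alias.name}" if prefix else alias.name)
    # Keep only names that correspond to discovered internal modules.
    return found & known
```

Intersecting with known drops both external packages and imported attributes (functions, classes) that are not modules themselves.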

Phase 3 — Graph Construction

Build a directed graph as an adjacency structure:

  • Node = internal module
  • Edge A → B = “module A imports module B”

Attach metadata to each node: file path, line count, function/class count.
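One possible shape for the adjacency structure is a dict of node records keyed by module name. ModuleNode and add_module below are hypothetical names chosen for illustration; the real data model may differ.

```python
import ast
from dataclasses import dataclass, field


@dataclass
class ModuleNode:
    path: str
    line_count: int
    definition_count: int                       # functions + classes
    imports: set = field(default_factory=set)   # outgoing edges: this module -> imported module


def add_module(graph: dict, name: str, path: str, source: str, imports: set) -> None:
    """Create the node for one module and attach its metadata and edges."""
    tree = ast.parse(source)
    definitions = sum(
        isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
        for n in ast.walk(tree)
    )
    graph[name] = ModuleNode(path, len(source.splitlines()), definitions, set(imports))
```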

Phase 4 — Analysis

Four independent passes over the module graph:

  • Dead code — nodes with no incoming edges and not an entry point
  • Circular imports — DFS-based cycle detection; reports full cycle paths
  • Hotspots — nodes ranked by fan-in (most imported)
  • Coupling — nodes ranked by fan-out (imports the most)
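The four passes can be sketched over a plain dict mapping each module to the set of modules it imports. All function names here are assumptions. Note that this DFS reports one cycle per back edge it meets, which is simpler than enumerating every elementary cycle.

```python
def fan_in(graph: dict) -> dict:
    counts = dict.fromkeys(graph, 0)
    for targets in graph.values():
        for target in targets:
            counts[target] = counts.get(target, 0) + 1
    return counts


def dead_modules(graph: dict, entry_points: set) -> set:
    counts = fan_in(graph)
    return {n for n in graph if counts[n] == 0 and n not in entry_points}


def hotspots(graph: dict) -> list:
    """(module, fan-in) pairs, most imported first."""
    return sorted(fan_in(graph).items(), key=lambda kv: (-kv[1], kv[0]))


def coupling(graph: dict) -> list:
    """(module, fan-out) pairs, heaviest importer first."""
    return sorted(((n, len(t)) for n, t in graph.items()), key=lambda kv: (-kv[1], kv[0]))


def find_cycles(graph: dict) -> list:
    """DFS with an explicit path; an edge back onto the path closes a cycle."""
    cycles, done = [], set()

    def visit(node, path):
        if node in path:
            cycles.append(path[path.index(node):] + [node])
            return
        if node in done:
            return
        path.append(node)
        for target in sorted(graph.get(node, ())):
            visit(target, path)
        path.pop()
        done.add(node)

    for node in sorted(graph):
        visit(node, [])
    return cycles
```

Because each pass only reads the graph, the passes can run in any order or be skipped individually.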

Phase 5 — Git Integration

Use subprocess + git log to compute per-file commit frequency, lines added/removed over time (churn), and last author. Attach this data to graph nodes. Optional — skipped gracefully if the project is not a git repo.
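As a rough illustration, the per-file statistics can be read from git log --numstat. The @ prefix in the format string and the names parse_numstat and git_stats are choices made for this sketch, not necessarily what DependaMan does.

```python
import subprocess


def parse_numstat(log_output: str) -> dict:
    """Parse the output of `git log --numstat --format=@%an` for one file."""
    stats = {"commits": 0, "added": 0, "removed": 0, "last_author": None}
    for line in log_output.splitlines():
        if line.startswith("@"):
            stats["commits"] += 1
            if stats["last_author"] is None:  # git log lists newest commits first
                stats["last_author"] = line[1:]
        elif line.count("\t") >= 2:
            added, removed, _ = line.split("\t", 2)
            if added.isdigit() and removed.isdigit():  # binary files show "-"
                stats["added"] += int(added)
                stats["removed"] += int(removed)
    return stats


def git_stats(path: str, repo: str = "."):
    """Return stats for path, or None if git is missing or repo is not a repository."""
    try:
        result = subprocess.run(
            ["git", "log", "--numstat", "--format=@%an", "--", path],
            cwd=repo, capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_numstat(result.stdout)
```

Returning None instead of raising is one way to get the graceful skip: the caller simply leaves the git fields off the node.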

Phase 6 — HTML Output

Generate a self-contained .html file. The HTML template is a static string embedded in Python. Only the data changes between runs — Python serializes the graph to JSON and injects it into the template:

data = json.dumps({"nodes": [...], "edges": [...]})
html = TEMPLATE.replace("__GRAPH_DATA__", data)

The template contains a <script> block that reads the injected data and renders the graph using the browser’s canvas API. No external JS libraries required. Works fully offline.
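A slightly fuller sketch of the injection step follows. The template here is a stand-in, and render_html is a hypothetical name. The extra escaping is an assumption about what a robust version would need: without it, a string such as "</script>" inside the data would close the script block early.

```python
import json

# Stand-in template; the real one would carry the canvas rendering code.
TEMPLATE = """<!DOCTYPE html>
<html><body><canvas id="graph"></canvas>
<script>
const graph = __GRAPH_DATA__;
// canvas drawing code would go here
</script></body></html>"""


def render_html(nodes: list, edges: list) -> str:
    data = json.dumps({"nodes": nodes, "edges": edges})
    # "\/" is a valid escape in both JSON and JavaScript strings,
    # so the data is unchanged once the browser parses it.
    return TEMPLATE.replace("__GRAPH_DATA__", data.replace("</", "<\\/"))
```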


Callable Graph (v1.1)

A second graph layer built on top of the module graph. Where the module graph answers what depends on what, the callable graph answers who calls what.

Two graphs, one canonical namespace

    Graph       Nodes                                   Edges
    Module      pkg.mod                                 A imports B
    Callable    pkg.mod::Func, pkg.mod::Class.method    A calls B

Every callable canonical contains its module canonical as a prefix — split on :: to get the parent module. The two graphs stay independent so all existing analyzers apply to the callable graph unchanged.
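The prefix rule means the parent module can be recovered with a single split. A tiny sketch, with hypothetical helper names:

```python
def parent_module(canonical: str) -> str:
    """Module part of a canonical name; a module canonical is returned unchanged."""
    return canonical.split("::", 1)[0]


def qualified_name(canonical: str):
    """Part after '::' (e.g. 'Class.method'), or None for a module canonical."""
    _, sep, name = canonical.partition("::")
    return name if sep else None
```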

Alias mapper

A single local_name → canonical dict capturing all imports, aliased or not:

    Import statement        local_name    canonical
    import a                a             a
    import a as b           b             a
    from a import b         b             a::b if defined, else a.b
    from a import b as c    c             same lookup as above
    from .x import y        y             resolved relatively, then same lookup
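The mapping rules above could be implemented along these lines. build_alias_map is a hypothetical name, and definitions is assumed to be the set of callable canonicals defined anywhere in the project.

```python
import ast


def build_alias_map(tree: ast.Module, module: str, definitions: set) -> dict:
    """Map each locally bound import name to its canonical target."""
    package = module.split(".")[:-1]
    aliases = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.asname:
                    aliases[alias.asname] = alias.name   # import a.b as c  ->  c: a.b
                else:
                    root = alias.name.split(".")[0]      # import a.b binds only "a"
                    aliases[root] = root
        elif isinstance(node, ast.ImportFrom):
            if node.level:
                base = package[: max(0, len(package) - (node.level - 1))]
                if node.module:
                    base = base + node.module.split(".")
            else:
                base = node.module.split(".")
            source = ".".join(base)
            for alias in node.names:
                if alias.name == "*":
                    continue
                as_callable = f"{source}::{alias.name}"
                as_module = f"{source}.{alias.name}" if source else alias.name
                # Prefer the callable canonical when the project defines it.
                aliases[alias.asname or alias.name] = (
                    as_callable if as_callable in definitions else as_module
                )
    return aliases
```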

Call collection

For each top-level FunctionDef, AsyncFunctionDef, and ClassDef, walk the body and emit edges source_canonical → target_canonical. Module-scope calls are attributed to a synthetic source, mod::<module>, so entry-point invocations are not lost. Nested function defs are not separate nodes: calls inside them are attributed to the outermost enclosing callable.

References tracked:

  • Direct calls: ast.Call with ast.Name, looked up in alias_mapper
  • Attribute chains: np.linalg.norm(): walk the chain, resolve the root via alias_mapper, reconstruct candidates against project definitions
  • Inheritance: ClassDef.bases, so class B(A) → edge mod::B → mod::A
  • Decorators: decorator_list, with bare @foo and parameterized @foo(arg) resolved the same way
  • Callable-as-argument: pool.submit(fn), sorted(xs, key=fn): function references passed as arguments emit edges, preventing false dead-code positives

Dead callable detection

A callable is dead when nothing references it, meaning it has zero fan-in in the callable graph and does not appear among the project's imports:

dead = {
    c for c in project_definitions
    if callable_fanin[c] == 0 and c not in project_imports
}

Concurrency

DependaMan applies different strategies depending on the nature of the work:

Git stats (I/O-bound) — waiting on subprocess calls, not CPU work. ThreadPoolExecutor runs multiple git log calls concurrently.

Parsing (CPU-bound) — pure computation. When the module count is high enough, ProcessPoolExecutor bypasses the GIL and uses multiple cores. On a GIL-less Python build (3.13+ free-threaded), ThreadPoolExecutor is used instead — threads already run truly in parallel.

Both paths include a minimum module threshold before engaging concurrent execution. Spawning a process pool has real startup costs — for small projects, the overhead exceeds the benefit.
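The selection logic might be sketched as follows. MIN_MODULES and the function names are placeholders invented for this example; the actual threshold is not stated here. sys._is_gil_enabled() is available on CPython 3.13 and later, so older interpreters are treated as having a GIL.

```python
import sys
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

MIN_MODULES = 50  # hypothetical threshold, for illustration only


def gil_enabled() -> bool:
    check = getattr(sys, "_is_gil_enabled", None)
    return True if check is None else check()


def pick_executor(item_count: int, cpu_bound: bool):
    """Return an executor class, or None to run serially."""
    if item_count < MIN_MODULES:
        return None                    # pool startup would cost more than it saves
    if cpu_bound and gil_enabled():
        return ProcessPoolExecutor     # separate processes sidestep the GIL
    return ThreadPoolExecutor          # I/O-bound work, or a free-threaded build


def run_all(func, items, cpu_bound: bool) -> list:
    executor_cls = pick_executor(len(items), cpu_bound)
    if executor_cls is None:
        return [func(item) for item in items]
    with executor_cls() as pool:
        return list(pool.map(func, items))
```

With a process pool, func and its arguments must be picklable, which in practice means a module-level function rather than a lambda or closure.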


Package Structure

dependaman/
    __init__.py        public API: dependaman()
    __main__.py        CLI entry point
    core.py            orchestration
    discovery.py       Phase 1 — file discovery
    parser.py          Phase 2 — import parsing
    graph.py           Phase 3 — graph construction
    analysis.py        Phase 4 — analysis passes
    git.py             Phase 5 — git integration
    renderer.py        Phase 6 — HTML output
    pool.py            GIL-aware executor selection

Design Constraints

Zero external dependencies. The entire tool runs on Python’s standard library (ast, pathlib, json, subprocess, concurrent.futures). Once installed, it works in any environment without pulling in a dependency tree.

Framework-agnostic output. The render function returns an HTML string — it knows nothing about how it will be served:

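def render(graph, analysis) -> str: ...

# FastAPI
from fastapi import FastAPI
from fastapi.responses import HTMLResponse

app = FastAPI()

@app.get("/graph", response_class=HTMLResponse)
def dependency_graph():
    graph = build_graph(".")  # build once, then reuse for analysis and rendering
    return render(graph, analyze(graph))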

Single-pass parsing. Each module is read and AST-parsed once. The result fans out into project imports, module graph, and callable graph in the same pool round — avoiding repeated I/O and parse overhead.