We were doing compliance work for a client. As part of that, we'd built a code ontology in Neo4j—a graph database that maps how their codebase fits together. Controllers, models, data flows, external integrations, the works.
When it came time to present to auditors, I was excited. "Look at this!" I said, pulling up a network visualization. Nodes everywhere. Lines connecting them. A beautiful web of relationships.
The auditors smiled politely. Then one asked: "Could you show us a flow diagram instead? We need to see how data moves from entry point to storage to external services."
That's when it clicked: graph databases are for discovery. Flow diagrams are for documentation.
What Graphs Are Good At
Graph databases excel at answering questions like:
- What's connected to what?
- Where are the hubs? (Which entities have the most connections?)
- What's the path between A and B?
- What would be affected if I changed X?
These are discovery questions. You're exploring. You don't know exactly what you're looking for—you're finding it.
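To make two of those questions concrete, here is a minimal in-memory sketch in Python rather than Neo4j. The edge list, the node names, and the helper functions `hubs` and `affected_by` are all hypothetical and exist only for illustration; a real ontology would live in the database and be queried there.

```python
from collections import defaultdict, deque

# Hypothetical (source, relationship, target) edges; illustrative names only.
EDGES = [
    ("UserController", "USES", "User"),
    ("UserController", "CALLS", "Mailchimp"),
    ("OrderController", "USES", "Order"),
    ("OrderController", "USES", "User"),
    ("OrderController", "CALLS", "Stripe"),
    ("User", "CONTAINS", "email"),
]

def hubs(edges, top=2):
    """Where are the hubs? Rank nodes by their number of distinct neighbours."""
    adjacent = defaultdict(set)
    for src, _rel, dst in edges:
        adjacent[src].add(dst)
        adjacent[dst].add(src)
    # Highest degree first; ties are broken alphabetically for stable output.
    return sorted(adjacent, key=lambda n: (-len(adjacent[n]), n))[:top]

def affected_by(edges, changed):
    """What would be affected if I changed X? Walk the edges backwards."""
    reverse = defaultdict(set)
    for src, _rel, dst in edges:
        reverse[dst].add(src)
    seen, queue = set(), deque([changed])
    while queue:
        node = queue.popleft()
        for dependant in reverse[node]:
            if dependant not in seen:
                seen.add(dependant)
                queue.append(dependant)
    return seen
```

Calling `affected_by(EDGES, "email")` walks from the field back to the model and then to every controller that uses it, which is the kind of answer you rarely know in advance.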
The query language (Cypher, in Neo4j's case) is built for this:
// Find all controllers that touch PII and call external services
MATCH (c:Controller)-[:USES]->(m:Model)-[:CONTAINS]->(f:Field {pii: true})
MATCH (c)-[:CALLS]->(ext:ExternalService)
RETURN c.name AS controller, collect(DISTINCT m.name) AS models, collect(DISTINCT ext.name) AS external_services
Try doing that with SQL joins. It can be done, but you need one join per hop, and the query gets harder to read and change as the path grows. With graphs, you describe the pattern and let the database find matches.
What Graphs Are Bad At
Network visualizations—those beautiful hairball diagrams—are terrible for communication.
When an auditor looks at a compliance document, they want to follow a story. "Data enters here, gets validated here, is stored here, and is sent to these external services." A linear narrative.
A network graph doesn't tell stories. It shows everything at once, which is powerful for discovery ("I didn't know those two systems were connected!") but useless for documentation ("Show me the PCI data flow").
The Fix: Query the Graph, Render as Flow
The solution is to separate storage from presentation.
Store in graphs. This preserves the rich relationships. You can query from any angle. New questions don't require restructuring your data.
Render as flows. For documentation, audits, and communication, take your query results and present them as linear flow diagrams. Entry points on the left, storage in the middle, external services on the right.
The graph knows that UserController connects to the User model, which has an email field (PII), which flows to Mailchimp. The flow diagram shows that as a clean left-to-right progression.
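One way to sketch the rendering step is to turn query results into Mermaid flowchart text, which Mermaid can draw left to right. This assumes each result is already an ordered path of node names and that the names are simple identifiers. The function `render_flow` and the `SignupController` row are hypothetical.

```python
def render_flow(paths):
    """Turn ordered paths of node names into Mermaid left-to-right flowchart text."""
    lines = ["flowchart LR"]
    seen = set()
    for path in paths:
        # Pair each node with its successor to get the edges along the path.
        for src, dst in zip(path, path[1:]):
            if (src, dst) not in seen:  # draw shared edges only once
                seen.add((src, dst))
                lines.append(f"    {src} --> {dst}")
    return "\n".join(lines)

# Hypothetical query results: entry point -> model -> field -> external service.
paths = [
    ["UserController", "User", "email", "Mailchimp"],
    ["SignupController", "User", "email", "Mailchimp"],
]
print(render_flow(paths))
```

The graph stays the source of truth. The diagram text is regenerated from a query whenever the documentation needs updating.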
Same data. Different presentation for different purposes.
The Kevin Bacon Analogy
The best way I've found to explain graph databases to non-technical people: "Six Degrees of Kevin Bacon."
A relational database is good at answering: "Who is Kevin Bacon?" It'll give you his bio, filmography, personal details.
A graph database is good at answering: "How is this other actor connected to Kevin Bacon, and through which movies?"
That's the difference. Graphs answer "through what?" questions. They find paths. They trace connections. They reveal indirect relationships that nobody wrote down but that follow from the ones you did model.
But if you ask a graph database to document Kevin Bacon's career, you'll get a mess. That's not what it's for.
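The "through which movies?" question is a shortest-path search. Here is a minimal sketch using breadth-first search over a tiny hand-picked cast list. The `CAST` table and the `connection` function are illustrative stand-ins, not how a graph database implements path finding.

```python
from collections import deque

# A tiny hand-picked cast list; a real dataset would be far larger.
CAST = {
    "Apollo 13": ["Kevin Bacon", "Tom Hanks", "Bill Paxton"],
    "Forrest Gump": ["Tom Hanks", "Sally Field", "Robin Wright"],
    "A Few Good Men": ["Kevin Bacon", "Tom Cruise", "Jack Nicholson"],
}

def connection(start, goal, cast=CAST):
    """Breadth-first search. Returns [actor, movie, actor, ...] or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        actor = path[-1]
        if actor == goal:
            return path
        for movie, actors in cast.items():
            if actor not in actors:
                continue
            for co_star in actors:
                if co_star not in visited:
                    visited.add(co_star)
                    # Record the movie as the "through what" of this hop.
                    queue.append(path + [movie, co_star])
    return None

print(connection("Sally Field", "Kevin Bacon"))
```

The returned list alternates actors and movies, so the answer carries both the connection and what it runs through.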
When Graphs Become Essential
A common objection: "I can do all of this with grep and some shell scripts."
At a few thousand entities, yes. Grep works. Text search works. You can hold the system in your head.
The graph becomes essential when you need:
- Multi-hop queries: Finding all paths between A and B through any number of intermediate nodes
- Cross-system correlation: Connecting entities across different data sources (code + database schema + external APIs)
- Drift detection: Comparing the graph over time to see what changed
- Impact analysis: "If I change this interface, what breaks?"
These are the "impossible without a graph" moments. They come at scale, with complex queries, when grep starts returning 10,000 results and you can't tell which ones matter.
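Two of the cases above, drift detection and multi-hop paths, can be sketched in a few lines of Python over plain edge lists. The snapshots, the `AnalyticsService` node, and the functions `drift` and `all_paths` are hypothetical. In Neo4j itself, the multi-hop part would more likely be a variable-length pattern in Cypher.

```python
def drift(before, after):
    """Compare two edge snapshots and return (added, removed) as sets."""
    before, after = set(before), set(after)
    return after - before, before - after

def all_paths(edges, start, goal, path=None):
    """Enumerate every simple path from start to goal, over any number of hops."""
    path = (path or []) + [start]
    if start == goal:
        return [path]
    found = []
    for src, dst in edges:
        # Skip nodes already on the path so cycles cannot loop forever.
        if src == start and dst not in path:
            found.extend(all_paths(edges, dst, goal, path))
    return found

# Hypothetical snapshots of the same graph taken at two points in time.
last_month = [("UserController", "User"), ("User", "Mailchimp")]
today = last_month + [
    ("User", "AnalyticsService"),
    ("UserController", "Mailchimp"),
]

added, removed = drift(last_month, today)
print("new edges:", sorted(added))
print("paths:", all_paths(today, "UserController", "Mailchimp"))
```

The recursive enumeration is fine for a small example but grows quickly on dense graphs, which is part of why this work is better pushed down into the database at scale.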
Lessons Learned
- Graphs are query engines, not presentation layers. Don't show the raw graph to stakeholders. Query it, then format results appropriately.
- Network visualizations are for you, not them. Use them during exploration. Switch to flow diagrams for communication.
- The value is in the queries, not the storage. A graph database with bad queries is just an expensive document store. The magic is in asking the right questions.
- Start with the questions. Don't build a graph and then wonder what to do with it. Start with "What do we need to find out?" and work backwards.
Graph databases are powerful discovery tools. They answer "how is this connected to that?" in ways relational databases struggle to. But for documentation and communication, query the graph and render the results as clean flow diagrams. The graph stores knowledge. The diagram communicates it.