Writing Good Knowledge Graph Entries


Best practices for creating valuable lessons, ADRs, and knowledge graph entries


General Principles

1. Write for Future-You

The person most likely to read your documentation is you, 6 months from now. Write as if explaining to someone who knows the project but has forgotten this specific context.

Good:

```markdown
# Problem
Authentication tokens were expiring during long-running imports (>1 hour),
causing imports to fail halfway through.
```

Bad:

```markdown
# Problem
Tokens expired.
```

2. STAR Format (Situation-Task-Action-Result)

Structure narratives to answer:
- Situation: What was the context?
- Task: What needed to be done?
- Action: What did you do?
- Result: What happened?

3. Searchable Keywords

Include terms future-you will search for:

Good:

```markdown
# Lesson: Database Connection Pool Exhaustion

Keywords: connection pool, postgres, max connections, timeouts,
resource exhaustion, ETIMEDOUT
```

Future search: grep -r "connection pool" docs/ ✅ finds it

Create a web of knowledge:

```markdown
## Cross-References
- **Pattern:** [patterns.md#connection-pooling](<../templates/knowledge/patterns.md#connection-pooling>)
- **Related:** [debugging/timeout-debugging.md](<../debugging/timeout-debugging.md>)
- **ADR:** [[ADR-012-postgres-config.md]]
```

Lessons Learned

Template Structure

```markdown
# Lesson Learned: [Descriptive Title]

**Date:** YYYY-MM-DD
**Category:** architecture/debugging/process/patterns
**Tags:** #tag1 #tag2

## Problem
[What went wrong? Be specific.]

## Root Cause
[Why did it happen? Dig deep.]

## Solution
[How was it fixed? Step-by-step.]

## Replication
[How to implement this solution in future.]

## Lessons Learned
[Key insights, what worked, what didn't.]

## Cross-References
[Links to related knowledge.]
```

When to Create

DO create for:
- Non-trivial problems (took >1 hour to solve)
- Surprising behavior (not obvious from docs)
- Complex multi-step solutions
- Patterns worth reusing

DON'T create for:
- Routine tasks
- Obvious solutions
- One-line fixes
- Standard practices

Writing Tips

Problem Section:
- Include symptoms observed
- What you tried that didn't work
- Error messages (relevant parts)

Root Cause:
- Don't just describe the fix; explain WHY it was needed
- Include debugging steps that revealed the cause
- Diagram if helpful

Solution:
- Step-by-step instructions
- Code samples (with context)
- Configuration changes (before/after)

Replication:
- Make it copy-paste-able for next time
- Include prerequisites
- Mention gotchas

Example:

## Problem

CI builds failing randomly with "Cannot connect to database" error.
Happened ~20% of builds, no pattern detected. Local builds worked fine.

Tried:
- Restarting CI runner (didn't help)
- Adding retry logic (masked issue, didn't fix)
- Increasing timeout (didn't help)

## Root Cause

CI runners share database connection pool. When multiple builds run
concurrently, they exhaust the connection pool (max 20 connections).

Debugging steps:
1. Checked database logs → saw "too many connections" errors
2. Monitored connection count → spiked during concurrent builds
3. Tested with staggered builds → no failures
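
For reference, the connection count in step 2 comes straight from PostgreSQL's `pg_stat_activity` view; a minimal sketch, with the query held in a string so the snippet runs without a live database:

```javascript
// Step 2's monitoring query: count live connections grouped by state.
// Run it from psql or any SQL client against the CI database.
const monitorQuery =
  'SELECT state, count(*) AS connections FROM pg_stat_activity GROUP BY state;';
console.log(monitorQuery);
```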

## Solution

Increased connection pool size for CI environment:

```javascript
// config/database.ci.js
module.exports = {
  pool: {
    min: 2,
    max: 50,  // was: 20
    acquireTimeoutMillis: 30000
  }
}
```

Deployed: Monitoring shows pool utilization now max 35/50 during peak.

Replication

For future CI resource issues:
1. Check CI-specific config vs local
2. Monitor resource usage during concurrent runs
3. Scale resources for CI (it's parallel, not sequential)
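
Step 3 is mostly arithmetic. A hedged sketch of that sizing calculation, using an invented helper and invented numbers (none of these values come from the incident above):

```javascript
// Hypothetical helper: size a CI pool for parallel builds.
// Parallel builds multiply demand; headroom absorbs spikes.
function ciPoolSize(concurrentBuilds, connectionsPerBuild, headroom = 1.25) {
  return Math.ceil(concurrentBuilds * connectionsPerBuild * headroom);
}

console.log(ciPoolSize(8, 4)); // 8 parallel builds * 4 connections each, +25% -> 40
```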

Lessons Learned

  • CI failures often due to concurrency, not bugs
  • Monitor actual resource usage (don't guess)
  • CI environment needs more resources than local (parallel execution)
---

## Architecture Decision Records (ADRs)

### Template Structure

```markdown
# ADR-NNN: [Decision Title]

**Status:** Accepted/Rejected/Deprecated/Superseded
**Date:** YYYY-MM-DD
**Authors:** [Names/roles]

## Context
[Forces at play, why decision needed]

## Options Considered
[What alternatives were evaluated]

## Decision
[What was chosen and why]

## Consequences
[Positive and negative outcomes]

## Related
[Links to lessons, patterns, other ADRs]
```

When to Create

DO create for:
- Architectural choices affecting multiple components
- Technology selection (framework, database, library)
- Process changes affecting team
- Tradeoff decisions with lasting impact

DON'T create for:
- Implementation details (use lessons)
- Temporary solutions
- Obvious choices (no tradeoffs)

Writing Tips

Context:
- Explain the problem requiring decision
- Describe constraints (technical, organizational, time)
- State goals/requirements

Options:
- List 2-4 serious alternatives
- Pros/cons for each
- Why obvious choice might NOT be best

Decision:
- Which option chosen
- Why this one over others
- What values/priorities drove choice

Consequences:
- Positive outcomes expected
- Negative tradeoffs accepted
- Monitoring plan

Example:

# ADR-012: Database Connection Pooling Strategy

**Status:** Accepted
**Date:** 2024-10-15

## Context

Application experiencing database connection exhaustion under load.
Need to decide on connection management strategy.

Constraints:
- PostgreSQL limit: 100 connections
- 5 application instances
- Peak: 200 concurrent requests

## Options Considered

### Option 1: No Pooling (Status Quo)
**Pros:** Simple, no configuration
**Cons:** Connection per request (expensive), exhausts limit
**Verdict:** ❌ Rejected - doesn't scale

### Option 2: Application-Level Pooling
**Pros:** Full control, language-native
**Cons:** Each instance has separate pool (5 * 20 = 100 connections)
**Verdict:** ⚠️ Works but inefficient

### Option 3: PgBouncer (Connection Proxy)
**Pros:** Centralized pooling, 1000+ client connections → 100 DB connections
**Cons:** Additional infrastructure, single point of failure
**Verdict:** ✅ Chosen

## Decision

Use PgBouncer in transaction pooling mode.

**Why:**
- Centralized pool shared across all instances
- Can support 1000+ client connections with 100 DB connections
- Mature, battle-tested solution
- Minimal application changes

**Configuration:**
```ini
[databases]
myapp = host=postgres port=5432

[pgbouncer]
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 20
reserve_pool_size = 5
```

Consequences

Positive:
- ✅ Scales to 200+ concurrent requests
- ✅ Reduces DB connection overhead
- ✅ Centralized monitoring

Negative:
- ❌ Additional infrastructure to maintain
- ❌ Transaction pooling limits (no prepared statements across requests)
- ❌ Requires monitoring PgBouncer health

Mitigations:
- Deploy PgBouncer as HA pair (Kubernetes deployment)
- Monitor with Prometheus metrics
- Document transaction pooling limitations

Review

Revisit if:
- Application instances > 20 (may need larger pool)
- Prepared statements become a requirement
- PgBouncer becomes a bottleneck
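
The connection math behind Options 2 and 3 above can be sanity-checked directly; a small sketch using the numbers from the ADR's constraints:

```javascript
// Option 2: every app instance keeps its own pool, so DB connections
// scale with instance count and hit the PostgreSQL limit exactly.
const instances = 5, poolPerInstance = 20, postgresLimit = 100;
const appLevelTotal = instances * poolPerInstance;
console.log(appLevelTotal, appLevelTotal <= postgresLimit); // 100 true (zero headroom)

// Option 3: PgBouncer multiplexes many client connections onto a small DB pool.
const maxClientConn = 1000, dbPoolSize = 20;
console.log(maxClientConn / dbPoolSize); // 50 clients share each DB connection
```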

Related:

  • Lesson: debugging/connection-pool-exhaustion.md
  • Pattern: patterns.md#centralized-resource-pooling

---

## Knowledge Graph Entries

### Patterns

```markdown
## Pattern Name

**Problem:** [One-sentence problem description]

**Solution:** [One-sentence solution]

**When to use:** [Trigger conditions]

**Quick Reference:**
- [Bullet points with key facts]

**Cross-References:**
- **Lesson:** [[link]]
- **ADR:** [[link]]
```

Example:

## Connection Pooling Pattern

**Problem:** Creating database connection per request is expensive and doesn't scale

**Solution:** Maintain pool of reusable connections shared across requests

**When to use:** 
- High-throughput applications (>100 req/sec)
- Database connection overhead noticeable
- Connection limit constraints

**Quick Reference:**
- Pool size: 10-20 per application instance
- Idle timeout: 30-60 seconds
- Max wait time: 5-10 seconds
- Monitor: Pool utilization, wait time, errors

**Cross-References:**
- **Lesson:** [debugging/connection-pool-exhaustion.md](<../debugging/connection-pool-exhaustion.md>)
- **ADR:** [[ADR-012-pgbouncer.md]]
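
To make the pattern concrete, here is a minimal, illustrative pool with plain objects standing in for real connections; the class and its API are invented for this sketch, not a production implementation:

```javascript
// Illustrative pool: bounded size, idle reuse, and waiting on exhaustion.
class ConnectionPool {
  constructor({ max, factory }) {
    this.max = max;          // hard cap on simultaneously open "connections"
    this.factory = factory;  // creates a new connection when under the cap
    this.idle = [];          // released connections ready for reuse
    this.waiting = [];       // resolvers for acquirers blocked on exhaustion
    this.size = 0;           // connections created so far
  }

  acquire() {
    if (this.idle.length > 0) return Promise.resolve(this.idle.pop());
    if (this.size < this.max) {
      this.size += 1;
      return Promise.resolve(this.factory());
    }
    // Pool exhausted: caller waits until someone releases.
    return new Promise(resolve => this.waiting.push(resolve));
  }

  release(conn) {
    const next = this.waiting.shift();
    if (next) next(conn);       // hand the connection straight to a waiter
    else this.idle.push(conn);  // otherwise park it for reuse
  }
}

// With max = 1, the second acquire blocks until the first caller releases.
const pool = new ConnectionPool({ max: 1, factory: () => ({ id: 1 }) });
pool.acquire().then(conn => {
  pool.acquire().then(c => console.log('waiter got connection', c.id));
  pool.release(conn);
});
```

Real pools (pg, knex) add acquire timeouts, idle eviction, and health checks on top of this core shape.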

Concepts

```markdown
## Concept Name

**Definition:** [Clear, concise definition]

**Why It Matters:** [Importance/value]

**Key Characteristics:**
- [Properties that define this concept]

**Examples:**
- [Real-world examples]

**Cross-References:**
- [Related concepts, patterns, lessons]
```

Gotchas

```markdown
## Gotcha Name

**Symptom:** [What you observe]

**Gotcha:** [What's actually happening]

**Fix:** [How to solve]

**Why:** [Root cause explanation]

**Cross-References:**
- [Related lessons, patterns]
```

Common Mistakes

1. Too Abstract

Bad:

```markdown
# Lesson: Performance Optimization
We optimized the system and it got faster.
```

Good:

```markdown
# Lesson: Database Query N+1 Problem
Reduced API response time from 5s to 300ms by converting
15 sequential queries to a single JOIN query.
```

2. Missing Context

Bad:

Changed max_connections to 100.

Good:

Increased PostgreSQL max_connections from 50 to 100.

**Why:** 5 app instances * 20 pool size = 100 connections needed.
**Impact:** Eliminated "too many connections" errors.

3. No Replication Steps

Bad:

Fixed the bug by adding validation.

Good:

**Fix:**
1. Add validation schema in `config/validation.js`
2. Import in request handler: `const { validate } = require('./validation')`
3. Call before processing: `if (!validate(req.body)) return 400`
4. Add test in `tests/validation.test.js`
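
A hedged sketch of steps 1-3 combined; the schema shape, field names, and handler are invented for illustration and will differ from the real `config/validation.js`:

```javascript
// Invented stand-in for the validation schema: checks required fields.
function validate(body) {
  return body !== null
    && typeof body === 'object'
    && typeof body.email === 'string'
    && body.email.includes('@');
}

// Step 3 from the fix: reject invalid payloads before processing.
function handler(req) {
  if (!validate(req.body)) return 400;
  return 200;
}

console.log(handler({ body: { email: 'user@example.com' } })); // 200
console.log(handler({ body: {} }));                            // 400
```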

4. Weak Cross-References

Bad:

Related to some other docs.

Good:

```markdown
## Cross-References
- **Pattern:** [patterns.md#input-validation](<../templates/knowledge/patterns.md#input-validation>)
- **Gotcha:** [gotchas.md#validation-bypass](<../templates/knowledge/gotchas.md#validation-bypass>)
- **Related Lesson:** [debugging/validation-regression.md](<../debugging/validation-regression.md>)
- **ADR:** [[ADR-008-validation-strategy.md]]
```


Review Checklist

Before finalizing a knowledge entry:

Content:
- [ ] Title is descriptive and searchable
- [ ] Problem/context clearly explained
- [ ] Solution includes enough detail to replicate
- [ ] Key insights explicitly stated
- [ ] Examples included (code, config, diagrams)

Structure:
- [ ] Follows template for document type
- [ ] Sections in logical order
- [ ] Headers used for navigation
- [ ] Lists/bullets for scannability

Connections:
- [ ] Cross-references to related knowledge
- [ ] Keywords/tags for search
- [ ] Category appropriate
- [ ] Links use correct format

Quality:
- [ ] Written for future-you (6 months later)
- [ ] Specific enough to be useful
- [ ] Concise enough to be readable
- [ ] Free of sensitive data (see SANITIZATION-CHECKLIST)


MEMORY.md System

What is MEMORY.md?

A special file that syncs your most important patterns to Claude's persistent memory.

Location: /MEMORY.md in your knowledge graph root

How It Works

  1. You capture lessons and patterns over time
  2. Run /kmgraph:update-graph (extracts patterns from lessons)
  3. Top patterns written to MEMORY.md
  4. Claude reads MEMORY.md at session start
  5. Claude "remembers" your patterns automatically

MEMORY.md Lifecycle

Active (MEMORY.md):
- Current, frequently-used patterns
- Target: <200 lines (keeps Claude context efficient)
- Updated by /kmgraph:update-graph

Archived (MEMORY-archive.md):
- Older patterns, still valid but less used
- Moved by /kmgraph:archive-memory
- Retrievable via /kmgraph:restore-memory

Managing MEMORY.md

When to archive:
- MEMORY.md exceeds 200 lines
- Patterns become less relevant over time
- New project phase (archive old patterns)

When to restore:
- Need an archived pattern again
- Returning to a previous project phase
- Team member needs historical context

Commands:
- /kmgraph:archive-memory - Move entries to archive
- /kmgraph:restore-memory - Bring back archived entries

See Command Guide for details.


Resources:
- Templates - Starting scaffolds
- Examples - Real-world samples
- Manual Workflows - Manual processes
- Architecture - System design