It's 2am. You're deploying a hotfix. The staging server won't start. Your terminal screams at you in red:
$ npm run dev
Error: listen EADDRINUSE: address already in use :::3100
at Server.setupListenHandle [as _listen2] (net.js:1318:16)
at listenInCluster (net.js:1366:12)
The old you reaches for lsof:
$ lsof -i :3100
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
node 48291 erich 23u IPv6 0x1a2b3c... 0t0 TCP *:3100 (LISTEN)
# What is PID 48291? When did it start? Which project?
# Is it safe to kill? Is anyone depending on it?
# You have no idea.
$ pkill -f node
# You just killed 7 unrelated Node processes.
# Your VS Code extensions are crashing.
# Your other terminal windows are blank.
# Your colleague's API server that was proxying through your machine is down.
There is a better way.
The 2am Nightmare (In Detail)
The EADDRINUSE error is just the symptom. The real problems are:
- You don't know what's listening -- lsof shows PIDs, not purpose
- You don't know why it's listening -- is it a forgotten dev server? A stale process? A different project?
- You don't know when it started -- was it 5 minutes ago or 5 days ago?
- You don't know who started it -- was it you? An agent? A CI runner?
- You can't safely kill it -- without knowing the above, kill is a dice roll
Port Daddy answers all five questions instantly.
The Port Daddy Solution
When every service claims its port through Port Daddy, you get a complete registry of what's running, why, and when it started. Instead of lsof forensics, you ask Port Daddy:
$ pd find :3100
myapp:api port=3100 claimed 2h ago healthy
One line. You know the project (myapp), the stack (api), how long it's been running, and whether it's healthy. No PIDs, no guessing, no collateral damage.
Let's walk through every debugging tool Port Daddy gives you.
Understanding What's Listening: pd find
The find command is your first stop. It searches the service registry by name pattern.
List All Services
$ pd find *
myapp:api port=3100 claimed 2h ago
myapp:web port=3101 claimed 2h ago
myapp:worker port=3102 claimed 45m ago
dashboard:next port=3200 claimed 6h ago
blog:gatsby port=3300 claimed 3d ago
Pattern Matching
# Find all services for a specific project
$ pd find myapp:*
myapp:api port=3100 claimed 2h ago
myapp:web port=3101 claimed 2h ago
myapp:worker port=3102 claimed 45m ago
# Find all API services across projects
$ pd find *:api
myapp:api port=3100 claimed 2h ago
dashboard:api port=3201 claimed 1d ago
# Find a specific port
$ pd find :3100
myapp:api port=3100 claimed 2h ago
JSON Output for Scripting
$ pd find myapp:* --json
[
{"id":"myapp:api","port":3100,"claimedAt":"2026-03-01T00:15:00Z"},
{"id":"myapp:web","port":3101,"claimedAt":"2026-03-01T00:15:02Z"},
{"id":"myapp:worker","port":3102,"claimedAt":"2026-03-01T01:30:00Z"}
]
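The --json output makes stale-claim reports scriptable. Here's a minimal sketch in Python, assuming only the JSON shape shown above; the one-hour threshold is illustrative:

```python
import json
from datetime import datetime, timedelta, timezone

# Sample data in the shape produced by `pd find myapp:* --json` above
raw = """[
 {"id":"myapp:api","port":3100,"claimedAt":"2026-03-01T00:15:00Z"},
 {"id":"myapp:web","port":3101,"claimedAt":"2026-03-01T00:15:02Z"},
 {"id":"myapp:worker","port":3102,"claimedAt":"2026-03-01T01:30:00Z"}
]"""

def stale_services(services, now, max_age=timedelta(hours=1)):
    """Return ids of services whose claim is older than max_age."""
    stale = []
    for svc in services:
        claimed = datetime.fromisoformat(svc["claimedAt"].replace("Z", "+00:00"))
        if now - claimed > max_age:
            stale.append(svc["id"])
    return stale

now = datetime(2026, 3, 1, 2, 0, tzinfo=timezone.utc)
print(stale_services(json.loads(raw), now))  # api and web are over an hour old
```

Pipe the real output in (`pd find myapp:* --json | python stale.py`) and you have a stale-claim report without touching lsof.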
Full System Status: pd status
For a bird's eye view of everything Port Daddy is managing:
$ pd status
Daemon running (pid 45821) on http://localhost:9876
Uptime: 14h 23m
Services: 5 claimed
Locks: 1 held
Agents: 3 active, 1 stale
Sessions: 2 active
Channels: 4 with subscribers
This tells you at a glance: are things healthy, or is something stuck? Five claimed services is in line with the projects above. But if you see 47 services claimed when you only have 3 projects -- something is leaking.
Deep Diagnostics: pd health
The health command goes beyond "is the port claimed?" and actually checks whether the service is responding:
$ pd health
myapp:api :3100 healthy (200 OK, 12ms)
myapp:web :3101 healthy (200 OK, 45ms)
myapp:worker :3102 UNHEALTHY (connection refused)
dashboard:next :3200 healthy (200 OK, 8ms)
blog:gatsby :3300 UNHEALTHY (timeout after 5000ms)
Now you see the problem immediately: myapp:worker has a claimed port but nothing is listening. And blog:gatsby isn't answering at all -- it timed out after 5 seconds (probably frozen).
Single Service Health Check
$ pd health myapp:worker
myapp:worker :3102 UNHEALTHY
Status: connection refused
Port claimed: 45m ago
Last healthy: 12m ago
Suggestion: Process likely crashed. Release with: pd release myapp:worker
Health Check with Custom Paths
If your services have custom health endpoints defined in .portdaddyrc:
# .portdaddyrc health paths are used automatically
$ pd health myapp:api
myapp:api :3100 healthy
Endpoint: /health
Response: {"status":"ok","db":"connected","redis":"connected"}
Latency: 12ms
The Activity Log: Forensic Debugging
Port Daddy logs every operation. When something goes wrong, the activity log is your flight recorder.
Recent Activity
$ pd log
2026-03-01T02:14:33Z CLAIM myapp:worker port=3102
2026-03-01T02:14:31Z CLAIM myapp:web port=3101
2026-03-01T02:14:30Z CLAIM myapp:api port=3100
2026-03-01T01:58:12Z RELEASE dashboard:worker port=3202
2026-03-01T01:45:00Z LOCK db-migrations owner=agent-db
2026-03-01T01:44:58Z UNLOCK db-migrations owner=agent-db
Filter by Type
# Only show claims and releases
$ pd log --type claim,release
# Only show lock activity
$ pd log --type lock,unlock
# Only show errors
$ pd log --type error
Time Range Queries
# What happened in the last hour?
$ pd log --since 1h
# What happened between midnight and 2am?
$ pd log --since "2026-03-01T00:00:00Z" --until "2026-03-01T02:00:00Z"
# What happened to myapp specifically?
$ pd log --filter myapp:*
Activity Summary
$ pd log --summary
Last 24 hours:
Claims: 23
Releases: 18
Locks: 7
Unlocks: 7
Errors: 2
Most active: myapp:api (14 operations)
Longest held: blog:gatsby (3d 4h)
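When the built-in filters aren't enough, the log's fixed "timestamp TYPE subject detail" shape is easy to post-process. A sketch, assuming only the line format shown above:

```python
from collections import Counter

# Lines in the format printed by `pd log` above
log = """\
2026-03-01T02:14:33Z CLAIM myapp:worker port=3102
2026-03-01T02:14:31Z CLAIM myapp:web port=3101
2026-03-01T02:14:30Z CLAIM myapp:api port=3100
2026-03-01T01:58:12Z RELEASE dashboard:worker port=3202
2026-03-01T01:45:00Z LOCK db-migrations owner=agent-db
2026-03-01T01:44:58Z UNLOCK db-migrations owner=agent-db"""

def summarize(lines):
    """Count operations by type -- a tiny homegrown `pd log --summary`."""
    return Counter(line.split()[1] for line in lines)

print(summarize(log.splitlines()))
```

The same one-liner split gives you the subject (field 3) for per-service tallies.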
Real Scenario: 2am Port Debugging
Let's walk through a realistic debugging session. Your staging deploy just failed.
Step 1: What's the error?
$ npm run dev
Error: listen EADDRINUSE: address already in use :::3100
Step 2: Who has port 3100?
$ pd find :3100
myapp:api port=3100 claimed 6h ago
You claimed it 6 hours ago and forgot to release it before leaving for dinner.
Step 3: Is it actually running?
$ pd health myapp:api
myapp:api :3100 UNHEALTHY (connection refused)
The port is claimed but nothing is listening -- a ghost claim. The process crashed but the claim persisted.
Step 4: What happened?
$ pd log --filter myapp:api --since 6h
2026-02-28T20:14:30Z CLAIM myapp:api port=3100
2026-02-28T20:14:35Z HEALTH myapp:api healthy (200 OK)
2026-02-28T22:45:12Z HEALTH myapp:api UNHEALTHY (connection refused)
The API was healthy at 8:14pm, then died around 10:45pm. Probably your laptop went to sleep.
Step 5: Release and reclaim
$ pd release myapp:api
Released myapp:api (port 3100)
$ PORT=$(pd claim myapp:api -q)
$ npm run dev -- --port $PORT
Server running on http://localhost:3100
Total debugging time: 30 seconds. No lsof. No pkill. No collateral damage.
Cleanup: Removing Stale Services
Over time, ghost claims accumulate -- ports claimed by processes that crashed, laptops that went to sleep, or agents that died mid-task. The cleanup command handles this.
Preview What Would Be Cleaned
$ pd cleanup --dry-run
Would release 3 stale services:
blog:gatsby port=3300 claimed 3d ago UNHEALTHY
dashboard:worker port=3202 claimed 1d ago UNHEALTHY
myapp:worker port=3102 claimed 45m ago UNHEALTHY
Run Cleanup
$ pd cleanup
Released 3 stale services:
blog:gatsby port=3300
dashboard:worker port=3202
myapp:worker port=3102
Aggressive Cleanup
# Release everything older than 1 hour that isn't healthy
$ pd cleanup --max-age 1h
# Release ALL services (nuclear option)
$ pd release *
Released 5 services
The --dry-run flag is your friend. Always preview before cleaning.
Distributed Locks: Understanding Lock Contention
When agents coordinate with locks, things can get stuck. Here's how to debug lock issues.
List All Locks
$ pd locks
db-migrations owner=agent-db held 2m expires in 8m
deploy-staging owner=agent-deploy held 45s expires in 4m15s
Inspect a Stuck Lock
$ pd locks
db-migrations owner=agent-db held 47m expires in -37m (EXPIRED)
That lock has been held for 47 minutes and its TTL expired 37 minutes ago. The owning agent probably died without releasing it.
Force-Release a Lock
# Check who owns it first
$ pd locks --json
[{"name":"db-migrations","owner":"agent-db","heldSince":"2026-03-01T01:30:00Z","ttl":600}]
# Force-release it
$ pd unlock db-migrations --force
Lock db-migrations force-released (was held by agent-db)
Preventing Lock Starvation
If an agent keeps renewing a lock and other agents are starved:
# Check lock history in activity log
$ pd log --type lock,unlock --filter db-migrations --since 1h
2026-03-01T01:30:00Z LOCK db-migrations owner=agent-db
2026-03-01T01:35:00Z EXTEND db-migrations owner=agent-db +10m
2026-03-01T01:40:00Z EXTEND db-migrations owner=agent-db +10m
2026-03-01T01:45:00Z EXTEND db-migrations owner=agent-db +10m
...
The agent has been extending its lock every 5 minutes for an hour. It's likely stuck in a loop. Force-release the lock and investigate the agent.
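Spotting a stuck extender can be automated: count consecutive EXTEND events per owner and flag anything past a threshold. A sketch over parsed log events; the threshold of 3 is an illustrative choice, not a Port Daddy default:

```python
def stuck_owners(events, max_extends=3):
    """Flag owners whose consecutive lock EXTENDs exceed max_extends.

    events: list of (type, owner) tuples parsed from `pd log` lines.
    """
    streak, flagged = {}, set()
    for etype, owner in events:
        if etype == "EXTEND":
            streak[owner] = streak.get(owner, 0) + 1
            if streak[owner] > max_extends:
                flagged.add(owner)
        else:
            streak[owner] = 0  # LOCK/UNLOCK resets the streak
    return flagged

# One LOCK followed by an hour of 5-minute extensions, as in the log above
events = [("LOCK", "agent-db")] + [("EXTEND", "agent-db")] * 12
print(stuck_owners(events))
```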
Deep Dive: SQLite Database
For truly deep forensics, you can query Port Daddy's SQLite database directly. This is the nuclear option -- useful when the CLI doesn't expose enough detail.
Finding the Database
# The database lives in your project root
$ sqlite3 port-registry.db
# Or check where it is
$ pd config --json | grep dbPath
Common Forensic Queries
# What tables exist?
sqlite> .tables
services locks agents
sessions session_notes activity_log
resurrection_queue webhooks messages
# All services sorted by age
sqlite> SELECT id, port, createdAt
FROM services ORDER BY createdAt ASC;
blog:gatsby|3300|2026-02-26T14:00:00Z
dashboard:next|3200|2026-02-28T18:00:00Z
myapp:api|3100|2026-03-01T00:15:00Z
# Services claimed more than 24 hours ago
sqlite> SELECT id, port, createdAt
FROM services
WHERE createdAt < datetime('now', '-24 hours');
# Which agent registered most recently?
sqlite> SELECT id, purpose, lastHeartbeat
FROM agents
ORDER BY lastHeartbeat DESC LIMIT 5;
# Dead agents (no heartbeat in 20+ minutes)
sqlite> SELECT id, purpose, lastHeartbeat
FROM agents
WHERE lastHeartbeat < datetime('now', '-20 minutes');
Correlating Events
# What happened around the time myapp:worker died?
sqlite> SELECT timestamp, type, details
FROM activity_log
WHERE timestamp BETWEEN '2026-03-01T01:40:00Z'
AND '2026-03-01T01:50:00Z'
ORDER BY timestamp;
2026-03-01T01:42:15Z|HEALTH|myapp:worker UNHEALTHY
2026-03-01T01:42:15Z|HEALTH|myapp:api healthy
2026-03-01T01:42:16Z|HEARTBEAT|agent-worker missed
2026-03-01T01:44:58Z|LOCK|db-migrations acquired by agent-db
Now you can see that the worker went unhealthy at 1:42am, missed its heartbeat, and the database agent acquired a lock shortly after -- possibly triggering a migration that the worker couldn't handle.
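The same correlation works from a script via Python's built-in sqlite3 module. This sketch runs the window query against an in-memory stand-in populated with the rows shown above (table and column names as in the queries above):

```python
import sqlite3

# In-memory stand-in for port-registry.db, seeded with the rows above
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE activity_log (timestamp TEXT, type TEXT, details TEXT)")
db.executemany("INSERT INTO activity_log VALUES (?, ?, ?)", [
    ("2026-03-01T01:42:15Z", "HEALTH", "myapp:worker UNHEALTHY"),
    ("2026-03-01T01:42:15Z", "HEALTH", "myapp:api healthy"),
    ("2026-03-01T01:42:16Z", "HEARTBEAT", "agent-worker missed"),
    ("2026-03-01T01:44:58Z", "LOCK", "db-migrations acquired by agent-db"),
])

# ISO-8601 strings sort lexicographically, so BETWEEN works on TEXT columns
rows = db.execute(
    """SELECT timestamp, type, details FROM activity_log
       WHERE timestamp BETWEEN '2026-03-01T01:40:00Z' AND '2026-03-01T01:50:00Z'
       ORDER BY timestamp"""
).fetchall()
for ts, etype, details in rows:
    print(ts, etype, details)
```

Point `sqlite3.connect()` at the real database path from `pd config` and the query is identical.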
Metrics and Performance Debugging
When Port Daddy itself seems slow or unresponsive, check the metrics endpoint:
$ pd metrics
Daemon Metrics
Uptime: 14h 23m
Total requests: 1,247
Avg response: 3.2ms
Peak response: 145ms
Active SSE: 2 connections
DB size: 248 KB
Memory: 42 MB RSS
Identifying Performance Issues
$ pd metrics --json
{
"uptime": 51780,
"requests": 1247,
"avgResponseMs": 3.2,
"peakResponseMs": 145,
"activeSSE": 2,
"dbSizeBytes": 253952,
"memoryRSS": 44040192,
"codeHash": "a1b2c3d4..."
}
Watch for these red flags:
- Avg response > 50ms -- SQLite may be locked or the database is too large
- Active SSE > 10 -- Too many subscribers; check for leaked connections
- DB size > 10 MB -- Activity log may need pruning
- Memory > 200 MB -- Possible memory leak in message queues
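Those rules of thumb are easy to codify against the --json metrics. A sketch using the thresholds listed above (field names follow the JSON sample):

```python
def metric_red_flags(m):
    """Return warnings based on the rule-of-thumb thresholds above."""
    flags = []
    if m["avgResponseMs"] > 50:
        flags.append("slow responses: SQLite locked or DB too large?")
    if m["activeSSE"] > 10:
        flags.append("too many SSE subscribers: leaked connections?")
    if m["dbSizeBytes"] > 10 * 1024 * 1024:
        flags.append("large DB: prune the activity log?")
    if m["memoryRSS"] > 200 * 1024 * 1024:
        flags.append("high memory: leak in message queues?")
    return flags

# The healthy daemon from the sample output above
metrics = {"avgResponseMs": 3.2, "activeSSE": 2,
           "dbSizeBytes": 253952, "memoryRSS": 44040192}
print(metric_red_flags(metrics))  # no flags
```

Wire it to `pd metrics --json` on a cron and you'll hear about a sick daemon before it pages you.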
Checking the Code Hash
# Is the daemon running stale code?
$ pd version
port-daddy v3.3.0
Code hash: a1b2c3d4
Node: v20.11.0
# If you've updated Port Daddy but the hash is old:
$ pd stop && pd start
Daemon restarted with fresh code
Session Tracking for Multi-Agent Debugging
When multiple agents are working simultaneously and something goes wrong, session tracking helps you reconstruct what happened.
List Active Sessions
$ pd sessions
session-a1b2 "Building checkout UI" active 3 files claimed 12 notes
session-c3d4 "Payment API integration" active 5 files claimed 8 notes
session-e5f6 "Database migrations" completed 0 files 4 notes
Read Session Notes for Context
$ pd notes --session session-a1b2
[10:14] Started working on checkout form
[10:22] Installed @stripe/react-stripe-js
[10:35] CheckoutForm component complete
[10:41] Blocked: need payment API types from session-c3d4
[10:55] Unblocked: types available, integrating
[11:02] ERROR: API returning 500 on /api/payments
Now you can see exactly when the frontend agent hit the error, and that it was after the API types became available -- suggesting a bug in the API, not a coordination issue.
Check File Claims for Conflicts
$ pd sessions --files
session-a1b2 claims:
src/components/checkout/CheckoutForm.tsx
src/components/checkout/PaymentStatus.tsx
src/hooks/usePayment.ts
session-c3d4 claims:
src/api/payments/route.ts
src/api/payments/webhook.ts
src/types/Payment.ts
src/middleware/auth.ts
src/lib/stripe.ts
No overlapping files -- good. If two sessions claimed the same file, that's where your merge conflict is coming from.
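Conflict detection is just a set intersection over the claimed paths. A sketch, using a trimmed version of the claims above:

```python
def overlapping_claims(sessions):
    """Map each file claimed by more than one session to its claimants.

    sessions: dict of session id -> list of claimed file paths.
    """
    claimants = {}
    for sid, files in sessions.items():
        for path in files:
            claimants.setdefault(path, []).append(sid)
    return {p: sids for p, sids in claimants.items() if len(sids) > 1}

sessions = {
    "session-a1b2": ["src/hooks/usePayment.ts",
                     "src/components/checkout/CheckoutForm.tsx"],
    "session-c3d4": ["src/types/Payment.ts", "src/lib/stripe.ts"],
}
print(overlapping_claims(sessions))  # empty dict: no conflicts
```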
View Dead Agent Context with Salvage
$ pd salvage
Dead agents with recoverable context:
agent-worker (died 15m ago)
Purpose: "Processing payment webhooks"
Session: session-g7h8 (4 notes)
Last note: "Stripe webhook handler 80% complete"
Files claimed: src/workers/stripe-webhook.ts
Run: pd salvage claim agent-worker
Troubleshooting Checklist
Quick reference for the most common problems:
"I can't start my service -- port in use"
# 1. Check who has the port
$ pd find :3100
# 2. Check if it's actually running
$ pd health myapp:api
# 3. If unhealthy, release the ghost claim
$ pd release myapp:api
# 4. Reclaim and start
$ PORT=$(pd claim myapp:api -q) npm run dev -- --port $PORT
"Health check says healthy but my app is broken"
# Health checks hit your healthPath, which may return 200
# even when the app is partially broken
# 1. Check what endpoint is being hit
$ pd health myapp:api --verbose
GET http://localhost:3100/health -> 200 OK (12ms)
Response: {"status":"ok"}
# 2. Your /health endpoint is too simple
# Update .portdaddyrc to check dependencies:
{
"services": {
"api": {
"healthPath": "/health?deep=true"
}
}
}
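On the application side, a "deep" health handler should check real dependencies rather than unconditionally returning 200. A framework-agnostic sketch; the probe callables are placeholders for your own db/redis pings:

```python
def deep_health(checks):
    """Aggregate dependency probes into a health response.

    checks: dict of name -> zero-arg callable returning True when healthy.
    Returns (http_status, body) so a 200 really means everything is up.
    """
    results = {}
    for name, probe in checks.items():
        try:
            results[name] = "connected" if probe() else "down"
        except Exception:
            results[name] = "down"  # a crashing probe counts as unhealthy
    ok = all(v == "connected" for v in results.values())
    return (200 if ok else 503), {"status": "ok" if ok else "degraded", **results}

# Stub probes standing in for real db/redis pings
status, body = deep_health({"db": lambda: True, "redis": lambda: True})
print(status, body)
```

Returning 503 on a failed dependency is what lets pd health surface "partially broken" instead of a misleading 200.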
"Random ports keep appearing"
# Something is claiming ports outside your control
# 1. Check activity log for unexpected claims
$ pd log --type claim --since 1h
# 2. Look for patterns
$ pd find *
# If you see services you didn't create, an agent or
# script may be auto-claiming ports
# 3. Check for registered agents
$ pd agents
# Shows all active agents with their purpose
# 4. Clean up and investigate
$ pd cleanup --dry-run
"Lock is stuck and nothing can proceed"
# 1. List all locks
$ pd locks
# 2. Check if the owner is still alive
$ pd agents
# 3. If owner is dead, force-release
$ pd unlock db-migrations --force
# 4. Check lock history for repeat offenders
$ pd log --type lock,unlock --since 2h
"Daemon won't start or is unresponsive"
# 1. Check if it's running
$ pd status
# 2. Check if something else is on port 9876
$ lsof -i :9876
# 3. Kill any stale daemon process
$ pkill -f "port-daddy.*server"
# 4. Restart fresh
$ pd start
# 5. If database is corrupted, check integrity
$ sqlite3 port-registry.db "PRAGMA integrity_check"
ok
"Agents keep dying and entering salvage"
# 1. Check the resurrection queue
$ pd salvage
# 2. Look at their session notes for clues
$ pd notes --session <dead-agent-session>
# 3. Check heartbeat patterns
$ pd log --type heartbeat --filter agent-worker --since 1h
# 4. Common causes:
# - Agent hit context window limit
# - Agent crashed on a bad API response
# - Machine went to sleep
# - Network partition broke SSE connection
The Debugging Mindset
With Port Daddy, debugging port conflicts follows a consistent pattern:
- Identify -- pd find to see what's claimed
- Diagnose -- pd health to see what's actually running
- Investigate -- pd log to see what happened and when
- Resolve -- pd release or pd cleanup to fix ghost claims
- Prevent -- pd status and pd metrics to monitor going forward
The days of lsof | grep | awk | xargs kill are over. Port Daddy gives you semantic, timestamped, queryable records of every port operation. At 2am or 2pm, the answer is always one command away.