It's 2am. You're deploying a hotfix. The staging server won't start. Your terminal screams at you in red:
$ npm run dev
Error: listen EADDRINUSE: address already in use :::3100
at Server.setupListenHandle [as _listen2] (net.js:1318:16)
at listenInCluster (net.js:1366:12)
The old you reaches for lsof:
$ lsof -i :3100
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
node 48291 erich 23u IPv6 0x1a2b3c... 0t0 TCP *:3100 (LISTEN)
# What is PID 48291? When did it start? Which project?
# Is it safe to kill? Is anyone depending on it?
# You have no idea.
$ pkill -f node
# You just killed 7 unrelated Node processes.
# Your VS Code extensions are crashing.
# Your other terminal windows are blank.
# Your colleague's API server that was proxying through your machine is down.
There is a better way.
The 2am Nightmare (In Detail)
The EADDRINUSE error is just the symptom. The real problems are:
- You don't know what's listening -- lsof shows PIDs, not purpose
- You don't know why it's listening -- is it a forgotten dev server? A stale process? A different project?
- You don't know when it started -- was it 5 minutes ago or 5 days ago?
- You don't know who started it -- was it you? An agent? A CI runner?
- You can't safely kill it -- without knowing the above, kill is a dice roll
Port Daddy answers all five questions instantly.
The Port Daddy Solution
When every service claims its port through Port Daddy, you get a complete registry of what's running, why, and when it started. Instead of lsof forensics, you ask Port Daddy:
$ pd find :3100
myapp:api port=3100 claimed 2h ago healthy
One line. You know the project (myapp), the stack (api), how long it's been running, and whether it's healthy. No PIDs, no guessing, no collateral damage.
Let's walk through every debugging tool Port Daddy gives you.
Understanding What's Listening: pd find
The find command is your first stop. It searches the service registry by name pattern.
List All Services
$ pd find *
myapp:api port=3100 claimed 2h ago
myapp:web port=3101 claimed 2h ago
myapp:worker port=3102 claimed 45m ago
dashboard:next port=3200 claimed 6h ago
blog:gatsby port=3300 claimed 3d ago
Pattern Matching
# Find all services for a specific project
$ pd find myapp:*
myapp:api port=3100 claimed 2h ago
myapp:web port=3101 claimed 2h ago
myapp:worker port=3102 claimed 45m ago
# Find all API services across projects
$ pd find *:api
myapp:api port=3100 claimed 2h ago
dashboard:api port=3201 claimed 1d ago
# Find a specific port
$ pd find :3100
myapp:api port=3100 claimed 2h ago
JSON Output for Scripting
$ pd find myapp:* --json
[
{"id":"myapp:api","port":3100,"claimedAt":"2026-03-01T00:15:00Z"},
{"id":"myapp:web","port":3101,"claimedAt":"2026-03-01T00:15:02Z"},
{"id":"myapp:worker","port":3102,"claimedAt":"2026-03-01T01:30:00Z"}
]
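The --json output makes stale-claim reports scriptable. Here's a minimal sketch in Python, assuming only the JSON shape shown above; the one-hour threshold is illustrative:

```python
import json
from datetime import datetime, timedelta, timezone

# Sample data in the shape produced by `pd find myapp:* --json` above
raw = """[
 {"id":"myapp:api","port":3100,"claimedAt":"2026-03-01T00:15:00Z"},
 {"id":"myapp:web","port":3101,"claimedAt":"2026-03-01T00:15:02Z"},
 {"id":"myapp:worker","port":3102,"claimedAt":"2026-03-01T01:30:00Z"}
]"""

def stale_services(services, now, max_age=timedelta(hours=1)):
    """Return ids of services whose claim is older than max_age."""
    stale = []
    for svc in services:
        claimed = datetime.fromisoformat(svc["claimedAt"].replace("Z", "+00:00"))
        if now - claimed > max_age:
            stale.append(svc["id"])
    return stale

now = datetime(2026, 3, 1, 2, 0, tzinfo=timezone.utc)
print(stale_services(json.loads(raw), now))  # api and web are over an hour old
```

Pipe the real output in (`pd find myapp:* --json | python stale.py`) and you have a stale-claim report without touching lsof.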
Full System Status: pd status
For a bird's eye view of everything Port Daddy is managing:
$ pd status
Daemon running (pid 45821) on http://localhost:9876
Uptime: 14h 23m
Services: 5 claimed
Locks: 1 held
Agents: 3 active, 1 stale
Sessions: 2 active
Channels: 4 with subscribers
This tells you at a glance: are things healthy, or is something stuck? Five claimed services is in line with the projects above. But if you see 47 services claimed when you only have 3 projects -- something is leaking.
Deep Diagnostics: pd health
The health command goes beyond "is the port claimed?" and actually checks whether the service is responding:
$ pd health
myapp:api :3100 healthy (200 OK, 12ms)
myapp:web :3101 healthy (200 OK, 45ms)
myapp:worker :3102 UNHEALTHY (connection refused)
dashboard:next :3200 healthy (200 OK, 8ms)
blog:gatsby :3300 UNHEALTHY (timeout after 5000ms)
Now you see the problem immediately: myapp:worker has a claimed port but nothing is listening. And blog:gatsby isn't answering at all -- it timed out after 5 seconds (probably frozen).
Single Service Health Check
$ pd health myapp:worker
myapp:worker :3102 UNHEALTHY
Status: connection refused
Port claimed: 45m ago
Last healthy: 12m ago
Suggestion: Process likely crashed. Release with: pd release myapp:worker
Health Check with Custom Paths
If your services have custom health endpoints defined in .portdaddyrc:
# .portdaddyrc health paths are used automatically
$ pd health myapp:api
myapp:api :3100 healthy
Endpoint: /health
Response: {"status":"ok","db":"connected","redis":"connected"}
Latency: 12ms
The Activity Log: Forensic Debugging
Port Daddy logs every operation. When something goes wrong, the activity log is your flight recorder.
Recent Activity
$ pd log
2026-03-01T02:14:33Z CLAIM myapp:worker port=3102
2026-03-01T02:14:31Z CLAIM myapp:web port=3101
2026-03-01T02:14:30Z CLAIM myapp:api port=3100
2026-03-01T01:58:12Z RELEASE dashboard:worker port=3202
2026-03-01T01:45:00Z LOCK db-migrations owner=agent-db
2026-03-01T01:44:58Z UNLOCK db-migrations owner=agent-db
Filter by Type
# Only show claims and releases
$ pd log --type claim,release
# Only show lock activity
$ pd log --type lock,unlock
# Only show errors
$ pd log --type error
Time Range Queries
# What happened in the last hour?
$ pd log --since 1h
# What happened between midnight and 2am?
$ pd log --since "2026-03-01T00:00:00Z" --until "2026-03-01T02:00:00Z"
# What happened to myapp specifically?
$ pd log --filter myapp:*
Activity Summary
$ pd log --summary
Last 24 hours:
Claims: 23
Releases: 18
Locks: 7
Unlocks: 7
Errors: 2
Most active: myapp:api (14 operations)
Longest held: blog:gatsby (3d 4h)
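When the built-in filters aren't enough, the log's fixed "timestamp TYPE subject detail" shape is easy to post-process. A sketch, assuming only the line format shown above:

```python
from collections import Counter

# Lines in the format printed by `pd log` above
log = """\
2026-03-01T02:14:33Z CLAIM myapp:worker port=3102
2026-03-01T02:14:31Z CLAIM myapp:web port=3101
2026-03-01T02:14:30Z CLAIM myapp:api port=3100
2026-03-01T01:58:12Z RELEASE dashboard:worker port=3202
2026-03-01T01:45:00Z LOCK db-migrations owner=agent-db
2026-03-01T01:44:58Z UNLOCK db-migrations owner=agent-db"""

def summarize(lines):
    """Count operations by type -- a tiny homegrown `pd log --summary`."""
    return Counter(line.split()[1] for line in lines)

print(summarize(log.splitlines()))
```

The same one-liner split gives you the subject (field 3) for per-service tallies.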
Real Scenario: 2am Port Debugging
Let's walk through a realistic debugging session. Your staging deploy just failed.
Step 1: What's the error?
$ npm run dev
Error: listen EADDRINUSE: address already in use :::3100
Step 2: Who has port 3100?
$ pd find :3100
myapp:api port=3100 claimed 6h ago
You claimed it 6 hours ago and forgot to release it before leaving for dinner.
Step 3: Is it actually running?
$ pd health myapp:api
myapp:api :3100 UNHEALTHY (connection refused)
The port is claimed but nothing is listening -- a ghost claim. The process crashed but the claim persisted.
Step 4: What happened?
$ pd log --filter myapp:api --since 6h
2026-02-28T20:14:30Z CLAIM myapp:api port=3100
2026-02-28T20:14:35Z HEALTH myapp:api healthy (200 OK)
2026-02-28T22:45:12Z HEALTH myapp:api UNHEALTHY (connection refused)
The API was healthy at 8:14pm, then died around 10:45pm. Probably your laptop went to sleep.
Step 5: Release and reclaim
$ pd release myapp:api
Released myapp:api (port 3100)
$ PORT=$(pd claim myapp:api -q)
$ npm run dev -- --port $PORT
Server running on http://localhost:3100
Total debugging time: 30 seconds. No lsof. No pkill. No collateral damage.
Cleanup: Removing Stale Services
Over time, ghost claims accumulate -- ports claimed by processes that crashed, laptops that went to sleep, or agents that died mid-task. The cleanup command handles this.
Preview What Would Be Cleaned
$ pd cleanup --dry-run
Would release 3 stale services:
blog:gatsby port=3300 claimed 3d ago UNHEALTHY
dashboard:worker port=3202 claimed 1d ago UNHEALTHY
myapp:worker port=3102 claimed 45m ago UNHEALTHY
Run Cleanup
$ pd cleanup
Released 3 stale services:
blog:gatsby port=3300
dashboard:worker port=3202
myapp:worker port=3102
Aggressive Cleanup
# Release everything older than 1 hour that isn't healthy
$ pd cleanup --max-age 1h
# Release ALL services (nuclear option)
$ pd release *
Released 5 services
The --dry-run flag is your friend. Always preview before cleaning.
Distributed Locks: Understanding Lock Contention
When agents coordinate with locks, things can get stuck. Here's how to debug lock issues.
List All Locks
$ pd locks
db-migrations owner=agent-db held 2m expires in 8m
deploy-staging owner=agent-deploy held 45s expires in 4m15s
Inspect a Stuck Lock
$ pd locks
db-migrations owner=agent-db held 47m expires in -37m (EXPIRED)
That lock has been held for 47 minutes and its TTL expired 37 minutes ago. The owning agent probably died without releasing it.
Force-Release a Lock
# Check who owns it first
$ pd locks --json
[{"name":"db-migrations","owner":"agent-db","heldSince":"2026-03-01T01:30:00Z","ttl":600}]
# Force-release it
$ pd unlock db-migrations --force
Lock db-migrations force-released (was held by agent-db)
Preventing Lock Starvation
If an agent keeps renewing a lock and other agents are starved:
# Check lock history in activity log
$ pd log --type lock,unlock --filter db-migrations --since 1h
2026-03-01T01:30:00Z LOCK db-migrations owner=agent-db
2026-03-01T01:35:00Z EXTEND db-migrations owner=agent-db +10m
2026-03-01T01:40:00Z EXTEND db-migrations owner=agent-db +10m
2026-03-01T01:45:00Z EXTEND db-migrations owner=agent-db +10m
...
The agent has been extending its lock every 5 minutes for an hour. It's likely stuck in a loop. Force-release the lock and investigate the agent.
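Spotting a stuck extender can be automated: count consecutive EXTEND events per owner and flag anything past a threshold. A sketch over parsed log events; the threshold of 3 is an illustrative choice, not a Port Daddy default:

```python
def stuck_owners(events, max_extends=3):
    """Flag owners whose consecutive lock EXTENDs exceed max_extends.

    events: list of (type, owner) tuples parsed from `pd log` lines.
    """
    streak, flagged = {}, set()
    for etype, owner in events:
        if etype == "EXTEND":
            streak[owner] = streak.get(owner, 0) + 1
            if streak[owner] > max_extends:
                flagged.add(owner)
        else:
            streak[owner] = 0  # LOCK/UNLOCK resets the streak
    return flagged

# One LOCK followed by an hour of 5-minute extensions, as in the log above
events = [("LOCK", "agent-db")] + [("EXTEND", "agent-db")] * 12
print(stuck_owners(events))
```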
Deep Dive: SQLite Database
For truly deep forensics, you can query Port Daddy's SQLite database directly. This is the nuclear option -- useful when the CLI doesn't expose enough detail.
Finding the Database
# The database lives in your project root
$ sqlite3 port-registry.db
# Or check where it is
$ pd config --json | grep dbPath
Common Forensic Queries
# What tables exist?
sqlite> .tables
services locks agents
sessions session_notes activity_log
resurrection_queue webhooks messages
# All services sorted by age
sqlite> SELECT id, port, createdAt
FROM services ORDER BY createdAt ASC;
blog:gatsby|3300|2026-02-26T14:00:00Z
dashboard:next|3200|2026-02-28T18:00:00Z
myapp:api|3100|2026-03-01T00:15:00Z
# Services claimed more than 24 hours ago
sqlite> SELECT id, port, createdAt
FROM services
WHERE createdAt < datetime('now', '-24 hours');
# Which agent registered most recently?
sqlite> SELECT id, purpose, lastHeartbeat
FROM agents
ORDER BY lastHeartbeat DESC LIMIT 5;
# Dead agents (no heartbeat in 20+ minutes)
sqlite> SELECT id, purpose, lastHeartbeat
FROM agents
WHERE lastHeartbeat < datetime('now', '-20 minutes');
Correlating Events
# What happened around the time myapp:worker died?
sqlite> SELECT timestamp, type, details
FROM activity_log
WHERE timestamp BETWEEN '2026-03-01T01:40:00Z'
AND '2026-03-01T01:50:00Z'
ORDER BY timestamp;
2026-03-01T01:42:15Z|HEALTH|myapp:worker UNHEALTHY
2026-03-01T01:42:15Z|HEALTH|myapp:api healthy
2026-03-01T01:42:16Z|HEARTBEAT|agent-worker missed
2026-03-01T01:44:58Z|LOCK|db-migrations acquired by agent-db
Now you can see that the worker went unhealthy at 1:42am, missed its heartbeat, and the database agent acquired a lock shortly after -- possibly triggering a migration that the worker couldn't handle.
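The same correlation works from a script via Python's built-in sqlite3 module. This sketch runs the window query against an in-memory stand-in populated with the rows shown above (table and column names as in the queries above):

```python
import sqlite3

# In-memory stand-in for port-registry.db, seeded with the rows above
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE activity_log (timestamp TEXT, type TEXT, details TEXT)")
db.executemany("INSERT INTO activity_log VALUES (?, ?, ?)", [
    ("2026-03-01T01:42:15Z", "HEALTH", "myapp:worker UNHEALTHY"),
    ("2026-03-01T01:42:15Z", "HEALTH", "myapp:api healthy"),
    ("2026-03-01T01:42:16Z", "HEARTBEAT", "agent-worker missed"),
    ("2026-03-01T01:44:58Z", "LOCK", "db-migrations acquired by agent-db"),
])

# ISO-8601 strings sort lexicographically, so BETWEEN works on TEXT columns
rows = db.execute(
    """SELECT timestamp, type, details FROM activity_log
       WHERE timestamp BETWEEN '2026-03-01T01:40:00Z' AND '2026-03-01T01:50:00Z'
       ORDER BY timestamp"""
).fetchall()
for ts, etype, details in rows:
    print(ts, etype, details)
```

Point `sqlite3.connect()` at the real database path from `pd config` and the query is identical.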
Metrics and Performance Debugging
When Port Daddy itself seems slow or unresponsive, check the metrics endpoint:
$ pd metrics
Daemon Metrics
Uptime: 14h 23m
Total requests: 1,247
Avg response: 3.2ms
Peak response: 145ms
Active SSE: 2 connections
DB size: 248 KB
Memory: 42 MB RSS
Identifying Performance Issues
$ pd metrics --json
{
"uptime": 51780,
"requests": 1247,
"avgResponseMs": 3.2,
"peakResponseMs": 145,
"activeSSE": 2,
"dbSizeBytes": 253952,
"memoryRSS": 44040192,
"codeHash": "a1b2c3d4..."
}
Watch for these red flags:
- Avg response > 50ms -- SQLite may be locked or the database is too large
- Active SSE > 10 -- Too many subscribers; check for leaked connections
- DB size > 10 MB -- Activity log may need pruning
- Memory > 200 MB -- Possible memory leak in message queues
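Those rules of thumb are easy to codify against the --json metrics. A sketch using the thresholds listed above (field names follow the JSON sample):

```python
def metric_red_flags(m):
    """Return warnings based on the rule-of-thumb thresholds above."""
    flags = []
    if m["avgResponseMs"] > 50:
        flags.append("slow responses: SQLite locked or DB too large?")
    if m["activeSSE"] > 10:
        flags.append("too many SSE subscribers: leaked connections?")
    if m["dbSizeBytes"] > 10 * 1024 * 1024:
        flags.append("large DB: prune the activity log?")
    if m["memoryRSS"] > 200 * 1024 * 1024:
        flags.append("high memory: leak in message queues?")
    return flags

# The healthy daemon from the sample output above
metrics = {"avgResponseMs": 3.2, "activeSSE": 2,
           "dbSizeBytes": 253952, "memoryRSS": 44040192}
print(metric_red_flags(metrics))  # no flags
```

Wire it to `pd metrics --json` on a cron and you'll hear about a sick daemon before it pages you.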
Checking the Code Hash
# Is the daemon running stale code?
$ pd version
port-daddy v3.3.0
Code hash: a1b2c3d4
Node: v20.11.0
# If you've updated Port Daddy but the hash is old:
$ pd stop && pd start
Daemon restarted with fresh code
Session Tracking for Multi-Agent Debugging
When multiple agents are working simultaneously and something goes wrong, session tracking helps you reconstruct what happened.
List Active Sessions
$ pd sessions
session-a1b2 "Building checkout UI" active 3 files claimed 12 notes
session-c3d4 "Payment API integration" active 5 files claimed 8 notes
session-e5f6 "Database migrations" completed 0 files 4 notes
Read Session Notes for Context
$ pd notes --session session-a1b2
[10:14] Started working on checkout form
[10:22] Installed @stripe/react-stripe-js
[10:35] CheckoutForm component complete
[10:41] Blocked: need payment API types from session-c3d4
[10:55] Unblocked: types available, integrating
[11:02] ERROR: API returning 500 on /api/payments
Now you can see exactly when the frontend agent hit the error, and that it was after the API types became available -- suggesting a bug in the API, not a coordination issue.
Check File Claims for Conflicts
$ pd sessions --files
session-a1b2 claims:
src/components/checkout/CheckoutForm.tsx
src/components/checkout/PaymentStatus.tsx
src/hooks/usePayment.ts
session-c3d4 claims:
src/api/payments/route.ts
src/api/payments/webhook.ts
src/types/Payment.ts
src/middleware/auth.ts
src/lib/stripe.ts
No overlapping files -- good. If two sessions claimed the same file, that's where your merge conflict is coming from.
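Conflict detection is just a set intersection over the claimed paths. A sketch, using a trimmed version of the claims above:

```python
def overlapping_claims(sessions):
    """Map each file claimed by more than one session to its claimants.

    sessions: dict of session id -> list of claimed file paths.
    """
    claimants = {}
    for sid, files in sessions.items():
        for path in files:
            claimants.setdefault(path, []).append(sid)
    return {p: sids for p, sids in claimants.items() if len(sids) > 1}

sessions = {
    "session-a1b2": ["src/hooks/usePayment.ts",
                     "src/components/checkout/CheckoutForm.tsx"],
    "session-c3d4": ["src/types/Payment.ts", "src/lib/stripe.ts"],
}
print(overlapping_claims(sessions))  # empty dict: no conflicts
```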
View Dead Agent Context with Salvage
$ pd salvage
Dead agents with recoverable context:
agent-worker (died 15m ago)
Purpose: "Processing payment webhooks"
Session: session-g7h8 (4 notes)
Last note: "Stripe webhook handler 80% complete"
Files claimed: src/workers/stripe-webhook.ts
Run: pd salvage claim agent-worker
Troubleshooting Checklist
Quick reference for the most common problems:
"I can't start my service -- port in use"
# 1. Check who has the port
$ pd find :3100
# 2. Check if it's actually running
$ pd health myapp:api
# 3. If unhealthy, release the ghost claim
$ pd release myapp:api
# 4. Reclaim and start
$ PORT=$(pd claim myapp:api -q) npm run dev -- --port $PORT
"Health check says healthy but my app is broken"
# Health checks hit your healthPath, which may return 200
# even when the app is partially broken
# 1. Check what endpoint is being hit
$ pd health myapp:api --verbose
GET http://localhost:3100/health -> 200 OK (12ms)
Response: {"status":"ok"}
# 2. Your /health endpoint is too simple
# Update .portdaddyrc to check dependencies:
{
"services": {
"api": {
"healthPath": "/health?deep=true"
}
}
}
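On the application side, a "deep" health handler should check real dependencies rather than unconditionally returning 200. A framework-agnostic sketch; the probe callables are placeholders for your own db/redis pings:

```python
def deep_health(checks):
    """Aggregate dependency probes into a health response.

    checks: dict of name -> zero-arg callable returning True when healthy.
    Returns (http_status, body) so a 200 really means everything is up.
    """
    results = {}
    for name, probe in checks.items():
        try:
            results[name] = "connected" if probe() else "down"
        except Exception:
            results[name] = "down"  # a crashing probe counts as unhealthy
    ok = all(v == "connected" for v in results.values())
    return (200 if ok else 503), {"status": "ok" if ok else "degraded", **results}

# Stub probes standing in for real db/redis pings
status, body = deep_health({"db": lambda: True, "redis": lambda: True})
print(status, body)
```

Returning 503 on a failed dependency is what lets pd health surface "partially broken" instead of a misleading 200.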
"Random ports keep appearing"
# Something is claiming ports outside your control
# 1. Check activity log for unexpected claims
$ pd log --type claim --since 1h
# 2. Look for patterns
$ pd find *
# If you see services you didn't create, an agent or
# script may be auto-claiming ports
# 3. Check for registered agents
$ pd agents
# Shows all active agents with their purpose
# 4. Clean up and investigate
$ pd cleanup --dry-run
"Lock is stuck and nothing can proceed"
# 1. List all locks
$ pd locks
# 2. Check if the owner is still alive
$ pd agents
# 3. If owner is dead, force-release
$ pd unlock db-migrations --force
# 4. Check lock history for repeat offenders
$ pd log --type lock,unlock --since 2h
"Daemon won't start or is unresponsive"
# 1. Check if it's running
$ pd status
# 2. Check if something else is on port 9876
$ lsof -i :9876
# 3. Kill any stale daemon process
$ pkill -f "port-daddy.*server"
# 4. Restart fresh
$ pd start
# 5. If database is corrupted, check integrity
$ sqlite3 port-registry.db "PRAGMA integrity_check"
ok
"Agents keep dying and entering salvage"
# 1. Check the resurrection queue
$ pd salvage
# 2. Look at their session notes for clues
$ pd notes --session <dead-agent-session>
# 3. Check heartbeat patterns
$ pd log --type heartbeat --filter agent-worker --since 1h
# 4. Common causes:
# - Agent hit context window limit
# - Agent crashed on a bad API response
# - Machine went to sleep
# - Network partition broke SSE connection
The Debugging Mindset
With Port Daddy, debugging port conflicts follows a consistent pattern:
- Identify -- pd find to see what's claimed
- Diagnose -- pd health to see what's actually running
- Investigate -- pd log to see what happened and when
- Resolve -- pd release or pd cleanup to fix ghost claims
- Prevent -- pd status and pd metrics to monitor going forward
The days of lsof | grep | awk | xargs kill are over. Port Daddy gives you semantic, timestamped, queryable records of every port operation. At 2am or 2pm, the answer is always one command away.