Commit Graph

24 Commits

Author SHA1 Message Date
gavrielc
44f0b3d99c fix: improve agent output schema, tool descriptions, and shutdown robustness
- Rename status→outputType, responded/silent→message/log for clarity
- Remove scheduled task special-casing: userMessage now sent for all contexts
- Update schema, tool, and CLAUDE.md descriptions to be clear and
  non-contradictory about communication mechanisms
- Use full tool name mcp__nanoclaw__send_message in docs
- Change schedule_task target_group to accept JID instead of folder name
- Only show target_group_jid parameter to main group agents
- Add defense-in-depth sanitization and error callback to exec() in shutdown
- Use "user or group" consistently (supports both 1:1 and group chats)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 20:22:45 +02:00
gavrielc
ae177156ec feat: per-group queue, SQLite state, graceful shutdown (#111)
* fix: wire up queue processMessagesFn before recovery to prevent silent message loss

recoverPendingMessages() was called after startMessageLoop(), which meant:
1. Recovery could race with the message loop's first iteration
2. processMessagesFn was set inside startMessageLoop, so recovery
   enqueues would fire runForGroup with processMessagesFn still null,
   silently skipping message processing

Move setProcessMessagesFn and recoverPendingMessages before startMessageLoop
so the queue is fully wired before any messages are enqueued.

https://claude.ai/code/session_01PCY8zNjDa2N29jvBAV5vfL

* feat: structured agent output to fix infinite retry on silent responses (#113)

Use Agent SDK's outputFormat with json_schema to get typed responses
from the agent. The agent now returns { status: 'responded' | 'silent',
userMessage?, internalLog? } instead of a plain string. This fixes a
critical bug where a null/empty agent response caused infinite 5-second
retry loops by conflating "nothing to say" with "error".

- Agent runner: add AGENT_RESPONSE_SCHEMA and parse structured_output
- Host: advance lastAgentTimestamp on both responded AND silent status
- GroupQueue: add exponential backoff (5s-80s) with max 5 retries for
  actual errors, replacing unbounded fixed-interval retries
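
A minimal sketch of a backoff schedule that fits the 5s-80s range and the 5-retry cap described above. The doubling factor is an assumption (the message only says "exponential"), and the names `BASE_DELAY_MS`, `MAX_RETRIES`, and `retryDelayMs` are hypothetical, not taken from GroupQueue.

```typescript
// Hypothetical names; only the 5s floor, 80s ceiling, and 5-retry cap
// come from the commit message. Doubling per attempt is an assumption.
const BASE_DELAY_MS = 5000;
const MAX_RETRIES = 5;

/**
 * Returns the delay before retry number `attempt` (1-based),
 * or null once the retry budget is exhausted.
 */
function retryDelayMs(attempt: number): number | null {
  if (!Number.isInteger(attempt) || attempt < 1 || attempt > MAX_RETRIES) {
    return null;
  }
  // Doubling from 5s yields 5s, 10s, 20s, 40s, 80s for attempts 1..5.
  return BASE_DELAY_MS * 2 ** (attempt - 1);
}
```

Returning null instead of a delay gives the caller an explicit signal to stop retrying, which is the property that replaces the earlier unbounded fixed-interval behavior.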

https://claude.ai/code/session_014SLc8MxP9BYhEhDCLox9U8

Co-authored-by: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
2026-02-06 18:54:26 +02:00
gavrielc
03df69e9b5 fix: address review feedback for per-group queue reliability
- Fix startup recovery running before WhatsApp connects, which could
  permanently lose agent responses by advancing lastAgentTimestamp
  before sock is initialized
- Add 5s retry on container failure so messages aren't silently dropped
  until a new message arrives for the group
- Use `container stop` in shutdown instead of raw SIGTERM to CLI wrapper,
  ensuring proper container cleanup
- Replace unnecessary dynamic imports with static imports in processTaskIpc
- Guard JSON.parse of DB-stored last_agent_timestamp against corruption
- Validate MAX_CONCURRENT_CONTAINERS (default 5, min 1, NaN-safe)
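
One way the MAX_CONCURRENT_CONTAINERS validation could look, as a sketch. The default of 5, the minimum of 1, and NaN-safety are from the bullet above; the helper name and the choice to clamp low values (rather than reset them to the default) are assumptions.

```typescript
// Hypothetical helper; the real validation may differ in how it treats
// values below the minimum.
const DEFAULT_MAX_CONCURRENT_CONTAINERS = 5;

function parseMaxConcurrentContainers(raw: string | undefined): number {
  const parsed = Number.parseInt(raw ?? "", 10);
  // Missing or non-numeric input parses to NaN; fall back to the default.
  if (Number.isNaN(parsed)) {
    return DEFAULT_MAX_CONCURRENT_CONTAINERS;
  }
  // Clamp zero and negative values up to the minimum of 1.
  return Math.max(1, parsed);
}
```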

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 16:45:00 +02:00
gavrielc
eac9a6acfd feat: per-group queue, SQLite state, graceful shutdown
Add per-group container locking with global concurrency limit to prevent
concurrent containers for the same group (#89) and cap total containers.
Fix message batching bug where lastAgentTimestamp advanced to the trigger
message instead of the latest message in the batch, causing redundant re-processing.
Move router state, sessions, and registered groups from JSON files to
SQLite with automatic one-time migration. Add SIGTERM/SIGINT handlers
with graceful shutdown (SIGTERM -> grace period -> SIGKILL). Add startup
recovery for messages missed during crash. Remove dead code: utils.ts,
Session type, isScheduledTask flag, ContainerConfig.env, getTaskRunLogs,
GroupQueue.isActive.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 07:38:07 +02:00
gavrielc
db216a459e fix: proper container lifecycle management to prevent stopped container accumulation
- Name containers (nanoclaw-{group}-{timestamp}) for trackability
- Replace SIGKILL timeout with graceful `container stop` so --rm fires
- Add startup sweep to clean up stopped nanoclaw containers from previous runs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 07:10:26 +02:00
Ejae-dev
117980175e refactor: deduplicate logger into shared module (#39)
three files created identical pino logger instances with the same config.
extract into src/logger.ts and import from each consumer.

net -9 lines, no behavior change.

Co-authored-by: ejae <ejae_dev@ejaes-Mac-mini.home>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 00:40:58 +02:00
gavrielc
21c66df2b1 Add prettier
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:14:17 +02:00
gavrielc
05a29d562f Security improvements: per-group session isolation, remove built-in Gmail
- Isolate Claude sessions per-group (data/sessions/{group}/.claude/)
  to prevent cross-group access to conversation history
- Remove Gmail MCP from built-in (now available via /add-gmail skill)
- Add SECURITY.md documenting the security model
- Move docs to docs/ folder (SPEC.md, REQUIREMENTS.md, SECURITY.md)
- Update documentation to reflect changes

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 00:07:59 +02:00
gavrielc
d000f33928 Add container output size limiting to prevent memory issues (#18)
* Fix potential memory DoS via unbounded container output

Add CONTAINER_MAX_OUTPUT_SIZE (default 10MB) to limit accumulated
stdout/stderr from container processes. Without this limit, a malicious
or buggy container could emit huge output leading to host memory
exhaustion.

Changes:
- Add configurable CONTAINER_MAX_OUTPUT_SIZE in config.ts
- Implement size-limited output buffering in runContainerAgent
- Log warnings when truncation occurs
- Include truncation status in container logs
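
The size-limited buffering listed above can be sketched roughly as follows. The 10MB default is from this message; the `BoundedOutput` class and its members are hypothetical and are not the code in runContainerAgent. The sketch counts string length (UTF-16 code units), whereas a real implementation might count bytes.

```typescript
// Hypothetical sketch; only the 10MB default comes from the commit message.
const CONTAINER_MAX_OUTPUT_SIZE = 10 * 1024 * 1024;

class BoundedOutput {
  private text = "";
  private readonly maxSize: number;
  truncated = false;

  constructor(maxSize: number = CONTAINER_MAX_OUTPUT_SIZE) {
    this.maxSize = maxSize;
  }

  /** Appends a chunk, keeping at most maxSize characters in total. */
  append(chunk: string): void {
    const remaining = this.maxSize - this.text.length;
    if (chunk.length > remaining) {
      // Keep only the part that still fits and record that output was cut.
      this.text += chunk.slice(0, Math.max(0, remaining));
      this.truncated = true;
      return;
    }
    this.text += chunk;
  }

  toString(): string {
    return this.text;
  }
}
```

A caller would feed each stdout or stderr chunk to `append` and check `truncated` afterwards to decide whether to log a warning.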

https://claude.ai/code/session_01TjVDwwaGwbcFDdmrFF2y8B

* Update package-lock.json

https://claude.ai/code/session_01TjVDwwaGwbcFDdmrFF2y8B

---------

Co-authored-by: Claude <noreply@anthropic.com>
2026-02-01 23:09:50 +02:00
gavrielc
48822ff67d Add mount security allowlist for external directory access (#14)
* Add secure mount allowlist validation

Addresses arbitrary host mount vulnerability by validating additional
mounts against an external allowlist stored at ~/.config/nanoclaw/.
This location is never mounted into containers, so agents running inside a container cannot modify it.

Security measures:
- Allowlist cached in memory (edits require process restart)
- Real path resolution (blocks symlink and .. traversal attacks)
- Blocked patterns for sensitive paths (.ssh, .gnupg, .aws, etc.)
- Non-main groups forced to read-only when nonMainReadOnly is true
- Container path validation prevents /workspace/extra escape
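
Two of the checks above, sketched as pure string functions. Everything here is illustrative: the function names, the short blocked list, and the exact rejection rules are assumptions, and the sketch expects a host path that has already been resolved (for example with Node's fs.realpathSync), which it does not do itself.

```typescript
// Hypothetical sketch; names and exact rules are assumptions, not NanoClaw's code.
const EXTRA_ROOT = "/workspace/extra";
const BLOCKED_SEGMENTS = [".ssh", ".gnupg", ".aws"];

/** True if any segment of an already-resolved host path is a blocked name. */
function hasBlockedSegment(resolvedHostPath: string): boolean {
  return resolvedHostPath
    .split("/")
    .some((segment) => BLOCKED_SEGMENTS.includes(segment));
}

/**
 * Joins a requested relative path onto EXTRA_ROOT, collapsing "." and "..",
 * and returns null if the result would leave EXTRA_ROOT.
 */
function resolveContainerPath(requested: string): string | null {
  if (requested.startsWith("/")) return null; // absolute paths are rejected
  const stack: string[] = [];
  for (const segment of requested.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") {
      if (stack.length === 0) return null; // would climb above EXTRA_ROOT
      stack.pop();
      continue;
    }
    stack.push(segment);
  }
  return stack.length === 0 ? null : `${EXTRA_ROOT}/${stack.join("/")}`;
}
```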

https://claude.ai/code/session_01BPqdNy4EAHHJcdtZ27TXkh

* Add mount allowlist setup to /setup skill

Interactive walkthrough that asks users:
- Whether they want agents to access external directories
- Which directories to allow (with paths)
- Read-write vs read-only for each
- Whether non-main groups should be restricted to read-only

Creates ~/.config/nanoclaw/mount-allowlist.json based on answers.

https://claude.ai/code/session_01BPqdNy4EAHHJcdtZ27TXkh

---------

Co-authored-by: Claude <noreply@anthropic.com>
2026-02-01 22:55:08 +02:00
gavrielc
016a1a0e31 Add group metadata sync for easier group activation
- Sync group names from WhatsApp via groupFetchAllParticipating()
- Store group names in chats table (jid -> name mapping)
- Daily sync with 24h cache, on-demand refresh via IPC
- Write available_groups.json snapshot for agent (main group only)
- Agent can request refresh_groups via IPC if group not found
- Update documentation in main CLAUDE.md and debug skill

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 22:25:29 +02:00
gavrielc
6745a1c54b Apply fixes from closed PRs: sentinel markers, JID lookup, schedule validation
- PR #10: Add sentinel markers for robust JSON parsing between container
  and host. Fallback to last-line parsing for backwards compatibility.

- PR #5: Look up target JID from registeredGroups instead of trusting
  IPC payload, fixing cross-group scheduled tasks getting wrong chat_jid.

- PR #8: Add lightweight schedule validation in container MCP that
  returns errors to agents (cron syntax, positive interval, valid ISO
  timestamp). Also defensive validation on host side.
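
The sentinel-marker parsing from PR #10, with its last-line fallback, might look something like this sketch. The marker strings and the `extractJson` name are hypothetical; the message does not say what the real markers are.

```typescript
// Hypothetical marker strings, chosen only for illustration.
const START_MARKER = "---OUTPUT_START---";
const END_MARKER = "---OUTPUT_END---";

function extractJson(stdout: string): unknown {
  const start = stdout.indexOf(START_MARKER);
  if (start !== -1) {
    const bodyStart = start + START_MARKER.length;
    const end = stdout.indexOf(END_MARKER, bodyStart);
    if (end !== -1) {
      // Preferred path: parse only what sits between the two markers.
      return JSON.parse(stdout.slice(bodyStart, end).trim());
    }
  }
  // Fallback for output without markers: parse the last non-empty line.
  const lines = stdout
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line !== "");
  const last = lines[lines.length - 1];
  if (last === undefined) {
    throw new Error("no output to parse");
  }
  return JSON.parse(last);
}
```

With markers in place, stray log lines printed before or after the payload no longer break parsing, while containers that predate the markers still work through the fallback.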

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 20:49:57 +02:00
gavrielc
ade9f2d323 Merge pull request #3 from gavrielc/claude/secure-ipc-access-Ni9l4
Secure IPC with per-group namespaces to prevent privilege escalation
2026-02-01 20:40:27 +02:00
gavrielc
069bc76016 Merge pull request #7 from gavrielc/claude/fix-home-directory-fallback-FF5Tr
Fix hardcoded home directory fallback in container runner
2026-02-01 20:40:02 +02:00
Claude
a8155e2bbc Fix hardcoded home directory fallback in container runner
Replace environment-specific fallback '/Users/gavriel' with os.homedir()
and proper error handling. The new getHomeDir() helper function:
- First checks process.env.HOME
- Falls back to os.homedir() for cross-platform support
- Throws a clear error if home directory cannot be determined
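
A sketch of the lookup order listed above. To keep the block free of imports, the environment and the os.homedir function are passed in as parameters; that signature and the name `resolveHomeDir` are assumptions and differ from the getHomeDir() helper named in this commit.

```typescript
// Hypothetical variant of the helper with injected dependencies.
function resolveHomeDir(
  env: Record<string, string | undefined>,
  osHomedir: () => string,
): string {
  // 1. Prefer an explicit HOME from the environment.
  const fromEnv = env["HOME"];
  if (fromEnv) return fromEnv;
  // 2. Fall back to the platform lookup.
  const fromOs = osHomedir();
  if (fromOs) return fromOs;
  // 3. Fail loudly rather than guess a path.
  throw new Error(
    "Unable to determine home directory: HOME is unset and os.homedir() returned nothing",
  );
}
```

In a Node.js process this would be called as `resolveHomeDir(process.env, os.homedir)`.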

https://claude.ai/code/session_011Cs2FWxXMvAdAh4w9A6AZC
2026-02-01 17:56:15 +00:00
Claude
6a94aec5da Secure IPC with per-group namespaces to prevent privilege escalation
Each container now gets its own IPC directory (/data/ipc/{groupFolder}/)
instead of a shared global directory. Identity is determined by which
directory a request came from, not by self-reported data in IPC files.

Authorization enforced:
- send_message: only to chatJids belonging to the source group
- schedule_task: only for the source group (main can target any)
- pause/resume/cancel_task: only for tasks owned by source group
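
The three authorization rules above can be sketched as a single check. The types, field names, and the `isAuthorized` function are hypothetical; the key idea taken from the message is that `sourceGroup` comes from the IPC directory a request was found in, never from the request payload.

```typescript
// Hypothetical types and names; only the three rules come from the commit message.
type IpcRequest =
  | { type: "send_message"; chatJid: string }
  | { type: "schedule_task"; targetGroup: string }
  | { type: "pause_task" | "resume_task" | "cancel_task"; taskOwnerGroup: string };

interface AuthContext {
  sourceGroup: string; // derived from the IPC directory name, not the payload
  isMain: boolean;
  chatJidsByGroup: Record<string, string[]>;
}

function isAuthorized(ctx: AuthContext, req: IpcRequest): boolean {
  switch (req.type) {
    case "send_message":
      // Only chat JIDs that belong to the source group are allowed.
      return (ctx.chatJidsByGroup[ctx.sourceGroup] ?? []).includes(req.chatJid);
    case "schedule_task":
      // Main may target any group; others only themselves.
      return ctx.isMain || req.targetGroup === ctx.sourceGroup;
    default:
      // pause/resume/cancel: only tasks owned by the source group.
      return req.taskOwnerGroup === ctx.sourceGroup;
  }
}
```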

https://claude.ai/code/session_018nmxNEbtgJH7cKDyBSQGAw
2026-02-01 17:44:25 +00:00
Claude
49e7875e67 Fix security: only expose auth vars to containers, not full .env
Previously, the entire .env file was copied and mounted into containers,
exposing all environment variables to the agent. Now only the specific
authentication variables needed by Claude Code (CLAUDE_CODE_OAUTH_TOKEN
and ANTHROPIC_API_KEY) are extracted and mounted.

https://claude.ai/code/session_01Y6Az5oUPkYmJhA1N9MUd67
2026-02-01 17:42:29 +00:00
Gavriel
2dedd18491 Fix scheduled tasks and improve task scheduling UX
- Fix Apple Container mount issue: move groups/CLAUDE.md to groups/global/
  directory (Apple Container only supports directory mounts, not file mounts)
- Fix scheduled tasks for main group: properly detect isMain based on
  group_folder instead of always setting it to false
- Add isScheduledTask flag so agent knows when running as scheduled task
- Improve schedule_task tool description with clear format examples for
  cron, interval, and once schedule types
- Update global CLAUDE.md with instructions for scheduled tasks to use
  mcp__nanoclaw__send_message when needed

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 17:24:12 +02:00
gavrielc
f25e0f9a10 Remove redundant comments throughout codebase
Keep only comments that explain non-obvious behavior or add context
not apparent from reading the code.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 16:00:44 +02:00
gavrielc
732c624e6b Fix security issues: IPC auth, message logging, container logs
- Add authorization checks to IPC task operations (pause/resume/cancel)
  to prevent cross-group task manipulation
- Only store message content for registered groups; unregistered chats
  only get metadata stored for group discovery
- Container logs now only include full input/output in debug mode;
  default logging omits sensitive message content

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 15:51:53 +02:00
Gavriel
8ca4c95517 Fix session persistence and auto-start container system
- Fix session mount path: ~/.claude/ now mounts to /home/node/.claude/
  (container runs as 'node' user with HOME=/home/node, not root)
- Fix ~/.gmail-mcp/ mount path similarly
- Use absolute paths for GROUPS_DIR and DATA_DIR (required for container mounts)
- Auto-start Apple Container system on NanoClaw startup
- Update debug skill with session troubleshooting guide
- Update spec.md with startup sequence and troubleshooting

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 11:31:52 +02:00
Gavriel
67e0295d82 Fix container execution and add debug tooling
Container fixes:
- Run as non-root 'node' user (required for --dangerously-skip-permissions)
- Add allowDangerouslySkipPermissions: true to SDK options
- Mount .env file to work around Apple Container -i env var bug
- Use --mount for readonly, -v for read-write (Apple Container quirk)
- Bump SDK to 0.2.29, zod to v4
- Install Claude Code CLI globally in container

Logging improvements:
- Write per-run logs to groups/{folder}/logs/container-*.log
- Add debug-level logging for mounts and container args

Documentation:
- Add /debug skill with comprehensive troubleshooting guide
- Update /setup skill with API key configuration step
- Update SPEC.md with container details, mount syntax, security notes

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 10:35:08 +02:00
gavrielc
0ccdaaac48 Mount project root for main channel
- Main gets /workspace/project with full project access
- Main can query SQLite database and edit configs
- Updated main CLAUDE.md with container paths
- Added docs for configuring additional mounts per group

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-31 23:01:45 +02:00
gavrielc
09c0e8142e Add containerized agent execution with Apple Container
- Agents run in isolated Linux VMs via Apple Container
- All groups get Bash access (safe - sandboxed in container)
- Browser automation via agent-browser + Chromium
- Per-group configurable additional directory mounts
- File-based IPC for messages and scheduled tasks
- Container image with Node.js 22, Chromium, agent-browser

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-31 22:55:57 +02:00