Hermes Agent

Everything I can do for you. June 2026.

Your Mac — Background Control

I drive macOS without stealing your cursor or keyboard. Works on any window — visible, hidden, or on another Space.

| Capability | Details |
| --- | --- |
| Click, type, scroll | Any app. By element index or keyboard shortcut. Background mode: never interrupts you. |
| See what's on screen | Screenshots plus the accessibility tree. Every button, text field, menu item, label. |
| Open and close apps | Safari, Notes, Telegram, Finder, Terminal, or anything else. Close windows when done. |
| Fill forms, navigate menus | Click by element index, type text, press Cmd+key combos. All in background. |
| Type long text | Pasted from the clipboard via Cmd+V, never typed character by character. |
| Scroll and drag | Scroll wheels, drag between elements, precise pixel-level control. |

Cannot type passwords, click payment buttons, or approve system permission dialogs without you.

Web & Browser

| Capability | Details |
| --- | --- |
| Browse any website | Navigate, scroll, click, fill forms. Full browser engine. |
| Read page content | Text snapshots of everything visible: headings, paragraphs, buttons, links. |
| Run JavaScript in pages | Inspect DOM, extract data, check state, read console output. |
| Check browser console | See JS errors, failed API calls, console.log output. |
| Visual screenshots | Annotated overlays with numbered elements for precise clicking. |
| Navigate back, refresh | Standard browser controls. |

Cannot bypass CAPTCHAs. Reddit, Google, and DuckDuckGo block automated browsers. This is a confirmed pattern, so when it happens I pivot to alternative methods.

Files & Terminal

| Capability | Details |
| --- | --- |
| Read any text file | With line numbers and pagination. Large files handled efficiently. |
| Write and create files | Overwrite or create new. Auto-creates parent directories. |
| Edit files precisely | Find-and-replace with fuzzy matching. Targeted patches. Syntax checks after edit. |
| Search files | Ripgrep-backed. Search by content or file name. Regex support. |
| Run terminal commands | Full shell access: npm, git, builds, deploys, scripts, package installs. |
| Execute Python scripts | Live Python with access to my tools API. Process data, loop, retry. |
| Background processes | Servers, watchers, long-running tasks. Completion notifications. |
| Deploy to Cloudflare Pages | Wrangler CLI: create project, deploy, verify, custom domains. |
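
To make the "find-and-replace with fuzzy matching" row concrete, here is a minimal sketch of the idea using Python's standard `difflib`. It is an illustration only, not my actual edit tool: the function name `fuzzy_replace_line`, the line-level granularity, and the 0.8 similarity cutoff are all assumptions chosen for the example.

```python
import difflib
from typing import List


def fuzzy_replace_line(
    lines: List[str], target: str, replacement: str, cutoff: float = 0.8
) -> List[str]:
    """Replace the line most similar to `target` (hypothetical helper).

    The match tolerates small differences such as spacing, so the caller
    does not need to reproduce the original line byte for byte.
    """
    matches = difflib.get_close_matches(target, lines, n=1, cutoff=cutoff)
    if not matches:
        raise ValueError(f"no line is similar enough to {target!r}")
    index = lines.index(matches[0])
    # Return a new list so the original content stays untouched.
    return lines[:index] + [replacement] + lines[index + 1:]


source = [
    "def greet(name):",
    "    print('hello', name)",
    "    return None",
]
# The target is missing a space after the comma, yet still matches line 2.
patched = fuzzy_replace_line(source, "    print('hello',name)", "    print('hi', name)")
```

Raising on a failed match, rather than silently skipping, keeps a patch from being reported as applied when nothing changed.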

Communication & Media

| Capability | Details |
| --- | --- |
| Rich Markdown | Bold, italic, strikethrough, spoilers, inline code, code blocks, headers, links. |
| Tables | Real Markdown tables with pipe syntax. Degrade gracefully on plain-text clients. |
| Task lists | Checkable lists with - [ ] and - [x] syntax. |
| Voice memos | Text-to-speech as native Telegram voice bubbles. Edge, OpenAI, ElevenLabs. |
| Image generation | FAL.ai FLUX model. Text-to-image and image-to-image editing. |
| Idea cards | Beautiful rendered card images with headings and body. Themes, gradients, glass styles. |
| Send files | Images as photos, .ogg as voice, .mp4 as video. Any file as Telegram attachment. |
| Math and formulas | Inline $...$ and block $$...$$ LaTeX rendering. |

Research & Knowledge

| Capability | Details |
| --- | --- |
| Web search | Multiple sources. Current and historical information. |
| Deep research | Multi-source synthesis across web, academic databases, and code repositories. |
| Persistent memory | Permanent storage across all sessions. Preferences, facts, lessons, conventions. |
| Session history | Search and read back through all past conversations. FTS5-backed. |
| 200+ specialized skills | Marketing, design, coding, automation, video editing, SEO, finance, healthcare. |
| Skill creation | Save successful workflows as reusable skills for future tasks. |
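
The "FTS5-backed" session history can be pictured with a small SQLite sketch. The table name `messages`, its two columns, and the helper names are hypothetical and do not describe my real storage schema; the example also assumes the local Python `sqlite3` build ships with the FTS5 extension, which most do.

```python
import sqlite3
from typing import Iterable, List, Tuple


def build_index(messages: Iterable[Tuple[str, str]]) -> sqlite3.Connection:
    """Create an in-memory FTS5 index over (session_id, body) pairs."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per message, both columns indexed.
    conn.execute("CREATE VIRTUAL TABLE messages USING fts5(session_id, body)")
    conn.executemany(
        "INSERT INTO messages (session_id, body) VALUES (?, ?)", messages
    )
    return conn


def search(conn: sqlite3.Connection, query: str, limit: int = 5) -> List[Tuple[str, str]]:
    """Return the best-ranked messages matching an FTS5 query string."""
    rows = conn.execute(
        "SELECT session_id, body FROM messages "
        "WHERE messages MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()


conn = build_index([
    ("s1", "deploy the site to Cloudflare Pages"),
    ("s2", "turn off the Hue lights in the office"),
])
hits = search(conn, "cloudflare")  # the default tokenizer ignores case
```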

Council Mode

| Capability | Details |
| --- | --- |
| Spawn sub-agents | 3-4 adversarial voices: Skeptic, Pragmatist, Critic. Fresh context each round. |
| Multi-round refinement | 5+ rounds until convergence. Each round shreds assumptions deeper. |
| Parallel execution | All voices run simultaneously. Results synthesized into verdict. |
| Anti-anchoring | Sub-agents get only the question + file. No conversation history bias. |
| Verdict format | Consensus, strongest dissent, premise check, recommendation. |
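
One round of the parallel, anti-anchored pattern above might look roughly like the sketch below. Everything here is illustrative: `run_round`, `Opinion`, and `fake_ask` are hypothetical names, and `fake_ask` is a stand-in for whatever call actually reaches a sub-agent. The point it shows is that each voice receives only the question and the file text, with no conversation history, and that all voices run concurrently.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List

VOICES = ["Skeptic", "Pragmatist", "Critic"]


@dataclass
class Opinion:
    voice: str
    text: str


def run_round(
    question: str, file_text: str, ask: Callable[[str, str, str], str]
) -> List[Opinion]:
    """Query every voice in parallel, passing only the question and file."""
    with ThreadPoolExecutor(max_workers=len(VOICES)) as pool:
        # Submit all voices before collecting any result so they overlap.
        futures = {
            voice: pool.submit(ask, voice, question, file_text) for voice in VOICES
        }
        return [Opinion(voice, future.result()) for voice, future in futures.items()]


def fake_ask(voice: str, question: str, file_text: str) -> str:
    """Stand-in for a real sub-agent call (assumption for this example)."""
    return f"{voice}: reviewed {len(file_text)} chars for '{question}'"


opinions = run_round("Ship it?", "abc", fake_ask)
```

A later synthesis step would read the `Opinion` list and produce the verdict; repeating `run_round` with fresh inputs gives the multi-round behaviour.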

Scheduling & Automation

| Capability | Details |
| --- | --- |
| Cron jobs | Recurring tasks: every 30m, hourly, daily, weekly. Custom cron expressions. |
| Autonomous execution | Runs without you present. Fresh session per tick. |
| Script-based jobs | Shell and Python scripts on schedule. Can skip LLM entirely for data collection. |
| Notification delivery | Results to Telegram, local files, or back to this chat. |
| Job chaining | One job's output becomes another's input. Build pipelines. |
| Watchdog pattern | Script runs, checks condition, stays silent when nothing to report. |
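
The watchdog pattern in the last row can be sketched as a script that prints only when its check fails. The disk-usage check, the 90% threshold, and the function names are assumptions picked for illustration, as is the convention that empty output means "send no notification".

```python
import shutil
from typing import Callable, Optional


def check_disk(path: str = ".", max_used_fraction: float = 0.9) -> Optional[str]:
    """Return an alert if the disk holding `path` is too full, else None."""
    usage = shutil.disk_usage(path)
    used_fraction = usage.used / usage.total
    if used_fraction > max_used_fraction:
        return f"Disk at {path} is {used_fraction:.0%} full"
    return None


def run_watchdog(check: Callable[[], Optional[str]]) -> bool:
    """Run one check and print only when there is something to report.

    Returns True if a message was emitted, False if the run stayed silent.
    """
    message = check()
    if message:
        print(message)
        return True
    return False


if __name__ == "__main__":
    # A scheduler would invoke this script on each tick.
    run_watchdog(check_disk)
```

Keeping the check separate from the reporting step makes it easy to swap in other conditions, such as a failed HTTP probe or a changed file, without touching the silence logic.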

Apple Ecosystem

| Capability | Details |
| --- | --- |
| Apple Notes | Create, search, edit notes via memo CLI. |
| Apple Reminders | Add, list, complete reminders via remindctl CLI. |
| Find My | Track Apple devices and AirTags via Find My app. |
| iMessage | Send and receive iMessages and SMS via imsg CLI. |

Apps & Services

| Capability | Details |
| --- | --- |
| Email | Send, receive, search, manage via Himalaya CLI. Full mailbox operations. |
| Google Workspace | Gmail, Calendar, Drive, Docs, Sheets, Slides. Full operations. |
| Notion | Pages, databases, markdown import/export, Workers integration. |
| Airtable | Records CRUD, filters, upserts via REST API. |
| Linear | Issues, projects, teams via GraphQL. |
| Spotify | Play, search, queue, manage playlists and devices. |
| Philips Hue | Control lights, rooms, scenes, colors via OpenHue CLI. |

Documents & Media

| Capability | Details |
| --- | --- |
| Presentations | Create, read, edit .pptx decks with slides, notes, templates. |
| Word documents | Create, read, edit .docx files. |
| Spreadsheets | Create, read, edit .xlsx files with formulas and formatting. |
| PDF processing | Create, edit, OCR, extract text from PDFs and scans. |
| YouTube transcripts | Extract and convert to summaries, threads, blog posts. |
| Music generation | HeartMuLa: Suno-style song generation from lyrics and tags. |
| Manim videos | 3Blue1Brown-style math and algorithm animations. |
| ASCII art | pyfiglet, cowsay, image-to-ASCII conversion. |

Development & Creative

| Capability | Details |
| --- | --- |
| Website building | HTML/CSS/JS, Vite + React, ReactBits components, Cloudflare deploy. |
| iOS apps | Build, compile, run. Icon generation for Xcode asset catalogs. |
| Game servers | Host modded Minecraft servers (CurseForge, Modrinth). |
| Pokemon | Play via headless emulator with RAM reads. |
| Maps | Geocoding, POIs, routes, timezones via OpenStreetMap/OSRM. |
| SEO | Technical audits, on-page optimization, keyword research. |
| School projects | Amateur-level HTML websites for GYBY assignments. |
| Red teaming | LLM jailbreak techniques, for academic and security research only. |

What I Cannot Do

| Limitation | Why |
| --- | --- |
| Delete or edit Telegram messages | Client-side action requiring bot API admin permissions. |
| Pin Telegram messages | Not available in personal DM chat context. |
| Bypass CAPTCHAs | Browser automation detection. Architectural limitation. |
| Access passwords or credit cards | Safety rule. Never. Regardless of context. |
| Click payment or permission dialogs | Must ask you first. Safety guard. |
| Run on your phone | Mac only. Desktop-bound for now. |
| Send messages as you on social platforms | I assist with content; you post it. |