Files
dotfiles/pi/files/agent/extensions/pi-web-access/CHANGELOG.md
2026-02-19 22:23:48 +00:00

18 KiB

Pi Web Access - Changelog

All notable changes to this project will be documented in this file.

[0.10.2] - 2026-02-18

Added

  • Interactive search curation. Press Ctrl+Shift+S during or after a multi-query search to open a browser-based review UI. Results stream in live via SSE. Pick which queries to keep, add new searches on the fly, switch providers — then submit to send only the curated results to the agent.
  • Auto-condense pipeline. When the countdown expires without manual curation, a single LLM call (Claude Haiku by default) condenses all search results into a deduplicated briefing organized by topic. Preprocessing enriches the prompt with URL overlap, answer similarity, and source quality analysis. Configure via "autoFilter" in ~/.pi/web-search.json. Full uncondensed results stored and retrievable via get_search_content.
  • Configurable keyboard shortcuts. Both shortcuts (curate: Ctrl+Shift+S, activity monitor: Ctrl+Shift+W) can be remapped via "shortcuts" in ~/.pi/web-search.json. Changes take effect on restart.
  • /websearch command — opens the curator directly from pi without an agent round-trip. Accepts optional comma-separated queries or opens empty.
  • Task-aware condensation. Optional context parameter on web_search — a brief description of the user's task. The condenser uses it to focus the briefing on what matters.
  • Provider selection — global dropdown in the curator UI to switch between Perplexity and Gemini. Persists to ~/.pi/web-search.json.
  • Live condense status in countdown. Shows "condensing..." while the LLM is working, then "N searches condensed" once complete.
  • Markdown rendering in curator result cards via marked.js.
  • Query-level result cards with expandable answers and source lists. Check/uncheck to include or exclude.
  • SSE streaming with keepalive, socket health checks, and buffered delivery.
  • Idle-based timer (60s default, adjustable). Timeout sends all results as safe default.
  • Keyboard shortcuts: Enter (submit), Escape (skip), A (toggle all).
  • Dark/light theme via prefers-color-scheme with teal accent palette.

Changed

  • Curate enabled by default. Multi-query searches show a 10-second review window; single queries send immediately. Pass curate: false to opt out.
  • Curate shortcut opens browser immediately, even mid-search. Remaining results stream in live via SSE.
  • Tool descriptions encourage multi-query research. The queries param explains how to vary phrasing and scope across 2-4 queries, with good/bad examples.
  • Curated results instruct the LLM. Tool output prefixed with an instruction telling the LLM to use curated results as-is.
  • Expanded view shows full answer text per query with source titles and domains.
  • Non-curated web_search calls now respect the saved provider preference.
  • Config helpers generalized from loadSavedProvider/saveProvider to loadConfig/saveConfig.

Fixed

  • Curated onSubmit passed the original full query list instead of the filtered list, inflating queryCount.
  • Collapsed curated status mixed source URL counts with query counts.

New files

  • curator-server.ts — ephemeral HTTP server with SSE streaming, state machine, heartbeat watchdog, and token auth.
  • curator-page.ts — HTML/CSS/JS for the curator UI with markdown rendering and overlay transitions.
  • search-filter.ts — auto-condense pipeline: preprocessing, LLM condensation via pi's model registry, and post-processing (citation verification, source list completion).

[0.7.3] - 2026-02-05

Added

  • Jina Reader fallback for JS-rendered pages. When Readability returns insufficient content (cookie notices, consent walls, SPA shells), the extraction chain now tries Jina Reader (r.jina.ai) before falling back to Gemini. Jina handles JavaScript rendering server-side and returns clean markdown. No API key required.
  • JS-render detection heuristic (isLikelyJSRendered) produces more specific error messages when pages appear to load content dynamically.
  • Actionable guidance when all extraction methods fail, listing steps to configure Gemini API or use web_search instead.

Changed

  • HTTP fetch headers now mimic Chrome (realistic User-Agent, Sec-Fetch-*, Accept-Language) instead of the default Node.js user agent. Reduces blocks from bot-detection systems.
  • Short Readability output (< 500 chars) is now treated as a content failure, triggering the fallback chain. Previously, a 266-char cookie notice was returned as "successful" content.
  • Extraction fallback order is now: HTTP+Readability → RSC → Jina Reader → Gemini URL Context → Gemini Web → error with guidance.

Fixed

  • parseTimestamp now rejects negative values in colon-separated format (-1:30, 1:-30). Previously only the numeric path (-90) rejected negatives, while the colon path computed and returned negative seconds.

[0.7.2] - 2026-02-03

Added

  • model parameter on fetch_content to override the Gemini model per-request (e.g. model: "gemini-2.5-flash")
  • Collapsed TUI results now show a 200-char text preview instead of just the status line
  • LICENSE file (MIT)

Changed

  • Default Gemini model updated from gemini-2.5-flash to gemini-3-flash-preview across all API, search, URL context, YouTube, and video paths. Gemini Web gracefully falls back to gemini-2.5-flash when the model header isn't available.
  • README rewritten: added tagline, badges, "Why" section, Quick Start, corrected "How It Works" routing order, fixed inaccurate env var precedence claim, added missing /v/ YouTube format, restored /search command docs, collapsible Files table

Fixed

  • PERPLEXITY_API_KEY env var now takes precedence over config file value, matching GEMINI_API_KEY behavior and README documentation (was reversed)
  • package.json now includes repository, homepage, bugs, and description fields (repo link was missing from pi packages site)

[0.7.0] - 2026-02-03

Added

  • Multi-provider web search: web_search now supports Perplexity, Gemini API (with Google Search grounding), and Gemini Web (cookie auth) as search providers. New provider parameter (auto, perplexity, gemini) controls selection. In auto mode (default): Perplexity → Gemini API → Gemini Web. Backwards-compatible — existing Perplexity users see no change.
  • Gemini API grounded search: Structured citations via groundingMetadata with source URIs and text-to-source mappings. Google proxy URLs are resolved via HEAD redirects. Configured via GEMINI_API_KEY or geminiApiKey in config.
  • Gemini Web search: Zero-config web search for users signed into Google in Chrome. Prompt instructs Gemini to cite sources; URLs extracted from markdown response.
  • Gemini extraction fallback: When fetch_content fails (HTTP 403/429, Readability fails, network errors), automatically retries via Gemini URL Context API then Gemini Web extraction. Each has an independent 60s timeout. Handles SPAs, JS-heavy pages, and anti-bot protections.
  • Local video file analysis: fetch_content accepts file paths to video files (MP4, MOV, WebM, AVI, etc.). Detected by path prefix (/, ./, ../, file://), validated by extension and 50MB limit. Two-tier fallback: Gemini API (resumable upload via Files API with proper MIME types, poll-until-active and cleanup) → Gemini Web (free, cookie auth).
  • Video prompt parameter: fetch_content gains optional prompt parameter for asking specific questions about video content. Threads through YouTube and local video extraction. Without prompt, uses default extraction (transcript + visual descriptions).
  • Video thumbnails: YouTube results include the video thumbnail (fetched from img.youtube.com). Local video results include a frame extracted via ffmpeg (at ~1 second). Returned as image content parts alongside text — the agent sees the thumbnail as vision context.
  • Configurable frame extraction: frames parameter (1-12) on fetch_content for pulling visual frames from YouTube or local video. Works in five modes: frames alone (sample across entire video), single timestamp (one frame), single+frames (N frames at 5s intervals), range (default 6 frames), range+frames (N frames across the range). Endpoint-inclusive distribution with 5-second minimum spacing.
  • Video duration in responses: Frame extraction results include the video duration for context.
  • searchProvider config option in ~/.pi/web-search.json for global provider default
  • video config section: enabled, preferredModel, maxSizeMB

Changed

  • PerplexityResponse renamed to SearchResponse (shared interface for all search providers)
  • Extracted HTTP pipeline from extractContent into extractViaHttp for cleaner Gemini fallback orchestration
  • getApiKey(), API_BASE, DEFAULT_MODEL exported from gemini-api.ts for use by search and URL Context modules
  • isPerplexityAvailable() added to perplexity.ts as non-throwing API key check
  • Content-type routing in extract.ts: only text/html and application/xhtml+xml go through Readability; all other text types (text/markdown, application/json, text/csv, etc.) returned directly. Fixes the OpenAI cookbook .md URL that returned "Untitled (30 chars)".
  • Title extraction for non-HTML content: extractTextTitle() pulls from markdown #/## headings, falls back to URL filename
  • Combined yt-dlp --print duration -g call fetches stream URL and duration in a single invocation, reused across all frame extraction paths via streamInfo passthrough
  • Shared helpers in utils.ts (formatSeconds, error mapping) eliminate circular imports and duplication across youtube-extract.ts and video-extract.ts

Fixed

  • fetch_content TUI rendered undefined/undefined URLs during progress updates (renderResult didn't handle isPartial, now shows a progress bar like web_search does)
  • RSC extractor produced malformed markdown for <pre><code> blocks (backticks inside fenced code blocks) -- extremely common on Next.js documentation pages
  • Multi-URL fetch failures rendered in green "success" color even when 0 URLs succeeded (now red)
  • web_search queries parameter described as "parallel" in schema but execution is sequential (changed to "batch"; urls correctly remains "parallel")
  • Proper error propagation for frame extraction: missing binaries (yt-dlp, ffmpeg, ffprobe), private/age-restricted/region-blocked videos, expired stream URLs (403), timestamp-exceeds-duration, and timeouts all produce specific user-facing messages instead of silent nulls
  • isTimeoutError now detects execFileSync timeouts via the killed flag (SIGTERM from timeout was previously unrecognized)
  • Float video durations (e.g. 15913.7s from yt-dlp) no longer produce out-of-range timestamps — durations are floored before computing frame positions
  • parseTimestamp consistently floors results across both bare-number ("90.5" → 90) and colon ("1:30.5" → 90) paths — previously the colon path returned floats
  • YouTube thumbnail assignment no longer sets null on the optional thumbnail field when fetch fails (was a type mismatch; now only assigned on success)

New files

  • gemini-search.ts -- search routing + Gemini Web/API search providers with grounding
  • gemini-url-context.ts -- URL Context API extraction + Gemini Web extraction fallback
  • video-extract.ts -- local video file detection, Gemini Web/API analysis with Files API upload
  • utils.ts -- shared formatting and error helpers for frame extraction

[0.6.0] - 2026-02-02

Added

  • YouTube video understanding in fetch_content via three-tier fallback chain:
    • Gemini Web (primary): reads Chrome session cookies from macOS Keychain + SQLite, authenticates to gemini.google.com, sends YouTube URL via StreamGenerate endpoint. Full visual + audio understanding with timestamps. Zero config needed if signed into Google in Chrome.
    • Gemini API (secondary): direct REST calls with GEMINI_API_KEY. YouTube URLs passed as file_data.file_uri. Configure via GEMINI_API_KEY env var or geminiApiKey in ~/.pi/web-search.json.
    • Perplexity (fallback): uses existing searchWithPerplexity for a topic summary when neither Gemini path is available. Output labeled as "Summary (via Perplexity)" so the agent knows it's not a full transcript.
  • YouTube URL detection for all common formats: /watch?v=, youtu.be/, /shorts/, /live/, /embed/, /v/, m.youtube.com
  • Configurable via ~/.pi/web-search.json under youtube key (enabled, preferredModel)
  • Actionable error messages when extraction fails (directs user to sign into Chrome or set API key)
  • YouTube URLs no longer fall through to HTTP/Readability (which returns garbage); returns error instead

New files

  • chrome-cookies.ts -- macOS Chrome cookie extraction using Node builtins (node:crypto, node:sqlite, child_process)
  • gemini-web.ts -- Gemini Web client ported from surf's gemini-client.cjs (cookie auth, StreamGenerate, model fallback)
  • gemini-api.ts -- Gemini REST API client (generateContent, file upload/processing/cleanup for Phase 2)
  • youtube-extract.ts -- YouTube extraction orchestrator with three-tier fallback and activity logging

[0.5.1] - 2026-02-02

Added

  • Bundled librarian skill -- structured research workflow for open-source libraries with GitHub permalinks, combining fetch_content (cloning), web_search (recent info), and git operations (blame, log, show)

Fixed

  • Session fork event handler was registered as session_branch (non-existent event) instead of session_fork, meaning forks never triggered cleanup (abort pending fetches, clear clone cache, restore session data)
  • API fallback title for tree URLs with a path (e.g. /tree/main/src) now includes the path (owner/repo - src), consistent with clone-based results
  • Removed unnecessary export on getDefaultBranch (only used internally by fetchViaApi)

[0.5.0] - 2026-02-01

Added

  • GitHub repository clone extraction for fetch_content -- detects GitHub code URLs, clones repos to /tmp/pi-github-repos/, and returns actual file contents plus local path for further exploration with read and bash
  • Lightweight API fallback for oversized repos (>350MB) and commit SHA URLs via gh api
  • Clone cache with concurrent request deduplication (second request awaits first's clone)
  • forceClone parameter on fetch_content to override the size threshold
  • Configurable via ~/.pi/web-search.json under githubClone key (enabled, maxRepoSizeMB, cloneTimeoutSeconds, clonePath)
  • Falls back to git clone when gh CLI is unavailable (public repos only)
  • README: GitHub clone documentation with config, flow diagram, and limitations

Changed

  • Refactored extractContent/fetchAllContent signatures from positional timeoutMs to ExtractOptions object
  • Blob/tree fetch titles now include file path (e.g. owner/repo - src/index.ts) for better disambiguation in multi-URL results and TUI

Fixed

  • README: Activity monitor keybinding corrected from Ctrl+Shift+O to Ctrl+Shift+W

[0.4.5] - 2026-02-01

Changed

  • Added package keywords for npm discoverability

[0.4.4] - 2026-02-01

Fixed

  • Adapt execute signatures to pi v0.51.0: reorder signal, onUpdate, ctx parameters across all three tools

[0.4.3] - 2026-01-27

Fixed

  • Google API compatibility: Use StringEnum for recencyFilter to avoid unsupported anyOf/const JSON Schema patterns

[0.4.2] - 2026-01-27

Fixed

  • Single-URL fetches now store content for retrieval via get_search_content (previously only multi-URL)
  • Corrected get_search_content usage syntax in fetch_content help messages

Changed

  • Increased inline content limit from 10K to 30K chars (larger content truncated but fully retrievable)

Added

  • Banner image for README

[0.4.1] - 2026-01-26

Changed

  • Added pi manifest to package.json for pi v0.50.0 package system compliance
  • Added pi-package keyword for npm discoverability

[0.4.0] - 2026-01-19

Added

  • PDF extraction via unpdf - fetches PDFs from URLs and saves as markdown to ~/Downloads/
    • Extracts text, metadata (title, author), page count
    • Supports PDFs up to 20MB (vs 5MB for HTML)
    • Handles arxiv URLs with smart title fallback

Fixed

  • Plain text URL detection now uses hostname check instead of substring match

[0.3.0] - 2026-01-19

Added

  • RSC (React Server Components) content extraction for Next.js App Router pages
    • Parses flight data from <script>self.__next_f.push([...])</script> tags
    • Reconstructs markdown with headings, tables, code blocks, links
    • Handles chunk references and nested components
    • Falls back to RSC extraction when Readability fails
  • Content-type validation rejects binary files (images, PDFs, audio, video, zip)
  • 5MB response size limit (checked via Content-Length header) to prevent memory issues

Fixed

  • fetch_content now handles plain text URLs (raw.githubusercontent.com, gist.githubusercontent.com, any text/plain response) instead of failing with "Could not extract readable content"

[0.2.0] - 2026-01-11

Added

  • Activity monitor widget (Ctrl+Shift+O) showing live request/response activity
    • Displays last 10 API calls and URL fetches with status codes and timing
    • Shows rate limit usage and reset countdown
    • Live updates as requests complete
    • Auto-clears on session switch

Changed

  • Refactored activity tracking into dedicated activity.ts module

[0.1.0] - 2026-01-06

Initial release. Designed for pi v0.37.3.

Added

  • web_search tool - Search via Perplexity AI with synthesized answers and citations
    • Single or batch queries (parallel execution)
    • Recency filter (day/week/month/year)
    • Domain filter (include or exclude)
    • Optional async content fetching with agent notification
  • fetch_content tool - Fetch and extract readable content from URLs
    • Single URL returns content directly
    • Multiple URLs store for retrieval via get_search_content
    • Concurrent fetching (3 max) with 30s timeout
  • get_search_content tool - Retrieve stored search results or fetched content
    • Access by response ID, URL, query, or index
  • /search command - Interactive browser for stored results
  • TUI rendering with progress bars, URL lists, and expandable previews
  • Session-aware storage with 1-hour TTL
  • Rate limiting (10 req/min for Perplexity API)
  • Config file support (~/.pi/web-search.json)
  • Content extraction via Readability + Turndown (max 10k chars)
  • Proper session isolation - pending fetches abort on session switch
  • URL validation before fetch attempts
  • Defensive JSON parsing for API responses