#ai
17 notes
- Jun 26, 2026
baidu/Unlimited-OCR (paper) parses dozens of pages in one forward pass by fixing the real bottleneck in end-to-end OCR.
- The problem: an OCR decoder types out the page token-by-token, and its KV cache grows unbounded with output length — so memory climbs and speed decays the longer the doc, forcing page-by-page loops that wipe memory each step.
- The trick (Reference Sliding Window Attention): split context into two zones with different retention. Vision/prompt tokens stay fully visible and pinned forever (the "source book"); each token attends to only the last 128 of its own outputs (the "last few words you wrote"). KV cache becomes a fixed-size queue, so memory and TPS stay flat regardless of output length.
- The non-obvious part: discarding history improves accuracy (+6% on OmniDocBench, beating Qwen2.5-VL-72B at 3B/0.5B-active) — full attention can diverge on long dense output, and pinned vision tokens never blur.
- Cheap to build: freeze DeepSeek-OCR's encoder, fine-tune only the decoder with the swapped attention.
- Generalizes past OCR: separate what you reference from what you remember.
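The two-zone cache described above can be sketched in a few lines. This is a toy illustration only, not the paper's implementation: the class name `ReferenceSlidingCache` and the 1000 reference tokens are assumptions made for the example, and a real decoder would hold per-layer key/value tensors rather than strings. Only the 128-token window comes from the note.

```python
from collections import deque


class ReferenceSlidingCache:
    """Toy KV cache: pinned reference entries plus a bounded queue of recent outputs."""

    def __init__(self, reference_tokens, window=128):
        # Vision/prompt entries are stored once and never evicted.
        self.reference = list(reference_tokens)
        # deque(maxlen=...) drops the oldest entry automatically once it is full.
        self.recent = deque(maxlen=window)

    def append(self, token):
        self.recent.append(token)

    def visible(self):
        """Entries the next decoding step may attend to."""
        return self.reference + list(self.recent)

    def __len__(self):
        return len(self.reference) + len(self.recent)


# Hypothetical sizes: 1000 pinned reference tokens, 5000 generated tokens.
cache = ReferenceSlidingCache([f"img{i}" for i in range(1000)], window=128)
sizes = []
for step in range(5000):
    cache.append(f"out{step}")
    sizes.append(len(cache))
```

The cache size stops growing once the window fills, at 1000 + 128 entries here, which is why memory and throughput stay flat however long the output runs.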
- Jun 23, 2026
Many asymmetric embedding models need task prefixes on inputs, and skipping them quietly degrades relevance. Each model has its own scheme: nomic (search_query:/search_document:), E5 (query:/passage:), BGE (a query instruction sentence, bare docs); these are not interchangeable. OpenAI text-embedding-3-* and all-MiniLM-L6-v2 need none. Whether you add the prefix depends on the serving layer, not the model: raw endpoints (llama.cpp /v1/embeddings, HF TEI, Ollama) send bare text so it's on you, while sentence-transformers (prompt_name="query") and vendor SDKs inject it for you. Adding search_query:/search_document: to a nomic-v1.5 call lifted cosine similarity on a real query/doc pair from 0.54 to 0.60 at zero cost.
- Jun 15, 2026
Moshi is the best SSH client I've found for iPhone — connects to your laptop's tmux sessions over Mosh, making it rock-solid on mobile networks with roaming support. Perfect for an agent view on the go; I use it to monitor and interact with running pi/Claude sessions from my iPhone without losing the session on network switches.
- Jun 8, 2026
fabric is a crowd-sourced collection of prompts bundled with a CLI for using them in shell pipelines.
- Jun 8, 2026
karpathy/autoresearch is a concept/framework that lets you use an LLM to run an optimization loop against a given criterion. davebcn87/pi-autoresearch takes this further and builds a generic long-running optimization loop on top of the pi agent. gemini
- May 29, 2026
Dynamic workflows in Claude Code. Claude dynamically writes orchestration scripts that spawn tens to hundreds of parallel subagents. It plans, decomposes into subtasks, fans out agents, and checks results before returning a coordinated answer. Adversarial agents try to refute findings, iterating until convergence. Enable via the ultracode setting or ask Claude to "create a workflow". Notable: Jarred Sumner used it to port Bun from Zig to Rust (~750K lines, 11 days). Available on Max, Team, and Enterprise plans. Consumes significantly more tokens than typical sessions.
- May 9, 2026
Use HTML instead of Markdown to plan and review effectively. See "Claude Code: The Unreasonable Effectiveness of HTML" (X).
- May 9, 2026
colbymchenry/codegraph: Pre-indexed code knowledge graph for Claude Code — fewer tokens, fewer tool calls, 100% local
- Apr 22, 2026
A good collection of practices on automated AI code reviews by Ankit Jain
- The Scalability Crisis: Manual post-PR review is no longer viable. AI agents have nearly doubled code output, causing human review time to spike by 91%, creating a bottleneck that traditional workflows cannot solve.
- The Upstream Pivot: Human value must shift from reviewing implementation to defining intent. Instead of checking syntax, humans spend their energy writing rigorous specs and acceptance criteria before the code is written, which the machine then uses to self-verify.
- The Swiss-Cheese Defense: Rather than one "perfect" human gate, the model uses a stack of imperfect automated layers. By layering signals like agent competition, deterministic guardrails, and adversarial "red-team" agents, the system catches errors where their individual failure modes don't overlap.
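A quick worked calculation shows why stacking imperfect layers helps. The miss rates below are invented for illustration and do not come from the article, and the multiplication only holds under the assumption that layers fail independently, which is the "failure modes don't overlap" condition.

```python
from math import prod

# Hypothetical miss rates: the fraction of defects each layer fails to catch.
miss_rates = {
    "agent_competition": 0.30,
    "deterministic_guardrails": 0.20,
    "red_team_agent": 0.40,
}


def combined_miss_rate(rates):
    """Probability a defect slips past every layer, assuming independent failures."""
    return prod(rates)


escaped = combined_miss_rate(miss_rates.values())
print(f"escape rate: {escaped:.3f}")
```

With these made-up numbers, three layers that each miss 20 to 40 percent of defects let only about 2.4 percent through together. Correlated failures would make the real figure worse.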
- Mar 24, 2026
pageindex generates a semantic, tree-like JSON index of a lengthy document, enabling reasoning-based RAG without a vector DB.
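A rough sketch of the idea: retrieval becomes a walk down a tree of section summaries instead of a nearest-neighbour search. None of this is pageindex's actual schema or API. The `Node` fields, the `locate` function, and the sample document are hypothetical, and the keyword-overlap score stands in for the LLM reasoning step that a real system would use to choose a branch.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    title: str
    summary: str
    pages: tuple  # (first_page, last_page)
    children: list = field(default_factory=list)


def overlap(query, node):
    """Stand-in relevance score: lowercase words shared by the query and the node text."""
    q = set(query.lower().split())
    text = set((node.title + " " + node.summary).lower().split())
    return len(q & text)


def locate(query, node):
    """Walk from the root to a leaf, picking the best-scoring child at each level."""
    path = [node.title]
    while node.children:
        node = max(node.children, key=lambda child: overlap(query, child))
        path.append(node.title)
    return path, node.pages


# Hypothetical index for a made-up document.
doc = Node("Annual report", "company results for the year", (1, 80), [
    Node("Financials", "revenue costs and cash flow tables", (10, 45), [
        Node("Revenue", "revenue by segment and region", (10, 22)),
        Node("Cash flow", "operating and investing cash flow", (23, 45)),
    ]),
    Node("Risk factors", "regulatory and market risk discussion", (46, 80)),
])

path, pages = locate("revenue by region", doc)
```

Here the walk ends at the Revenue section, so only pages 10 to 22 would be passed to the model as context.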
- Feb 15, 2026
For generating embeddings locally, nomic-embed-text is a long-context text encoder that surpasses OpenAI text-embedding-ada-002 and text-embedding-3-small on short- and long-context tasks. It balances speed, 8k context, and accuracy for English-centric apps. BGE-M3, Qwen3-Embedding, and E5-Small are alternatives.
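One practical detail when running these models behind a raw local endpoint: nomic-embed-text and the E5 family expect a task prefix on each input, and the caller has to add it. A minimal sketch of such a helper follows. The names `PREFIXES`, `with_prefix`, and `cosine` and the sample strings are made up for the example, and the actual embedding call is left out because it depends on the server you use.

```python
import math

# Prefix schemes for asymmetric models; an empty string means no prefix is needed.
PREFIXES = {
    "nomic": {"query": "search_query: ", "document": "search_document: "},
    "e5": {"query": "query: ", "document": "passage: "},
    "none": {"query": "", "document": ""},
}


def with_prefix(text, role, family):
    """Return the text as it should be sent to a raw embeddings endpoint."""
    return PREFIXES[family][role] + text


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# Hypothetical query/document pair, prepared for a nomic model.
query_input = with_prefix("how do I rotate api keys", "query", "nomic")
doc_input = with_prefix("Rotate keys from the settings page.", "document", "nomic")
```

Embed `query_input` and `doc_input` with your endpoint of choice and compare the vectors with `cosine`. Running the same pair with the "none" scheme is an easy way to check whether the prefix matters for your setup.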
- Feb 15, 2026
yichuan-w/LEANN is a RAG framework focused on efficient storage, with built-in chunking strategies, embedding model management, and an MCP server. gemini
- Feb 15, 2026
K-dense, known for using skills to enable deep research, has published 140+ skills for scientific research, including literature review and data analysis.
- Feb 6, 2026
Opus 4.6 launch.
- context compaction (beta) and a 1M context window enable longer agentic tasks without losing context.
- they claim it has found 500 zero-day flaws in open-source projects (yet to see the proof, though)
- agent teams: multiple agents coordinate with a leader agent. https://code.claude.com/docs/en/agent-teams
- Jan 31, 2026
Notes on "How AI assistance impacts the formation of coding skills" Article HN
- AI speeds up coding but reduces deep understanding and mastery
- Juniors (1-3 years experience) showed speed improvements with AI, but 4+ year developers showed no difference
- Modern software work is more about requirements, specs, documentation, and communication than raw coding skill
- Small sample size (n<8) and study design limitations make results questionable
- Takeaways:
- Use AI for high-scoring interaction patterns: Ask conceptual questions and request explanations rather than just code generation
- Adopt AI for documentation and specs: Multiple developers report dramatic improvements in tickets, PRs, and documentation quality
- Be deliberate about learning: If using AI, actively practice explaining concepts and avoid pure copy-paste workflows
- Use AI to reduce grunt work: Let it handle boilerplate, test writing, and repetitive tasks while focusing on architecture and requirements
- The research confirms what many suspected: AI coding assistants create a real trade-off between speed and skill development, but the practical significance is hotly contested. The critical question isn't whether AI reduces learning (it does), but whether deep coding skill remains as valuable as expressing requirements clearly—and whether we're comfortable with a generation of developers who can't function without AI assistance.
- Dec 24, 2025
"LangGraph is an orchestration framework for building stateful multi-agent applications using LLMs. It provides low-level primitives such as nodes and edges, along with built-in features that give developers granular control over agent workflows, memory management and state persistence. This means developers can start with a simple pre-built graph and scale to complex, evolving agent architectures. With support for streaming, advanced context management and resilience patterns like model fallbacks and tool error handling, LangGraph enables you to build robust, production-grade agentic applications. Its graph-based approach ensures predictable, customizable workflows and simplifies debugging and scaling."
- Dec 22, 2025
Notes from Thoughtworks - Technology Radar vol 33
- text-to-sql solutions aren't working as expected
- pnpm, LangGraph, and Pydantic recommended for adoption