#rag

9 notes

Jul 24, 2026
Embedding pipelines do not feed whole documents straight into the model. They first use the embedding model's tokenizer to measure and split text into token-bounded, overlapping chunks, then embed each chunk and store the vectors. The tokenizer must match the model, because token counts from a different tokenizer will not line up with the model's context limit. Tools like marcelroed/gigatoken accelerate this preparation at corpus scale but do not create embeddings themselves.
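A minimal sketch of the token-bounded, overlapping split described above. The function name `chunk_by_tokens`, its parameters, and the whitespace tokenizer are all illustrative assumptions. In a real pipeline you would pass the embedding model's own tokenizer in place of the stand-in, since a whitespace split will not match any model's token counts.

```python
from typing import Callable, List


def chunk_by_tokens(
    text: str,
    tokenize: Callable[[str], List[str]],
    detokenize: Callable[[List[str]], str],
    max_tokens: int = 8,
    overlap: int = 2,
) -> List[str]:
    """Split text into chunks of at most max_tokens tokens.

    Neighboring chunks share `overlap` tokens so that content cut at a
    boundary still appears whole in at least one chunk.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must satisfy 0 <= overlap < max_tokens")

    tokens = tokenize(text)
    step = max_tokens - overlap  # how far each window advances
    chunks: List[str] = []
    for start in range(0, len(tokens), step):
        chunks.append(detokenize(tokens[start:start + max_tokens]))
        # Stop once a window has reached the end of the token list,
        # otherwise the tail would be emitted again as a tiny chunk.
        if start + max_tokens >= len(tokens):
            break
    return chunks


# Stand-in tokenizer (assumption): whitespace split, for illustration only.
def whitespace_tokenize(text: str) -> List[str]:
    return text.split()


def whitespace_detokenize(tokens: List[str]) -> str:
    return " ".join(tokens)
```

Each returned chunk would then be embedded and stored on its own. The overlap trades some duplicate storage for better recall near chunk boundaries.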

ai performance
Jun 23, 2026
Many asymmetric embedding models need task prefixes on inputs, and skipping them quietly degrades relevance. Each model has its own scheme: nomic (search_query:/search_document:), E5 (query:/passage:), BGE (a query instruction sentence, bare docs) — not interchangeable. OpenAI text-embedding-3-* and all-MiniLM-L6-v2 need none. Whether you add the prefix depends on the serving layer, not the model: raw endpoints (llama.cpp /v1/embeddings, HF TEI, Ollama) send bare text so it's on you, while sentence-transformers (prompt_name="query") and vendor SDKs inject it for you. Adding search_query:/search_document: to a nomic-v1.5 call lifted cosine similarity on a real query/doc pair from 0.54 to 0.60 at zero cost.
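When the serving layer sends bare text, the prefixing step can be as small as the sketch below. The `PREFIXES` table and `with_task_prefix` helper are hypothetical names. The nomic and E5 strings follow the schemes named above, and the BGE instruction sentence is left out because it varies by model version. Check the model card before relying on any of these.

```python
from typing import Dict

# Prefix schemes keyed by model family (illustrative, not exhaustive).
PREFIXES: Dict[str, Dict[str, str]] = {
    "nomic": {"query": "search_query: ", "document": "search_document: "},
    "e5": {"query": "query: ", "document": "passage: "},
    # Models that expect bare text, e.g. all-MiniLM-L6-v2.
    "none": {"query": "", "document": ""},
}


def with_task_prefix(text: str, scheme: str, role: str) -> str:
    """Return text with the task prefix for the given scheme and role.

    Raises ValueError for an unknown scheme or role, so a typo cannot
    silently fall back to bare text.
    """
    try:
        prefix = PREFIXES[scheme][role]
    except KeyError:
        raise ValueError(f"unknown scheme/role: {scheme!r}/{role!r}") from None
    # Skip text that already carries the prefix, to avoid doubling it when
    # a caller higher up the stack has added it as well.
    if prefix and text.startswith(prefix):
        return text
    return prefix + text
```

Apply this only on raw endpoints. If the client library already injects the prompt, adding it again here would double the prefix on the wire.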

ai
Mar 24, 2026
pageindex generates a semantic, tree-like JSON index of a lengthy document, enabling reasoning-based RAG without a vector database.
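To make the idea concrete, here is a toy sketch of navigating a tree-shaped index instead of doing a vector search. Nothing here is pageindex's actual schema or API: the node fields (`title`, `summary`, `children`), `TOY_INDEX`, and both functions are assumptions, and the keyword-overlap chooser only stands in for the LLM reasoning step that would pick a branch.

```python
from typing import Any, Callable, Dict, List

Node = Dict[str, Any]

# Hypothetical tree index: each node summarizes a section of the document.
TOY_INDEX: Node = {
    "title": "Annual report",
    "summary": "company overview, finances, and risks",
    "children": [
        {
            "title": "Financials",
            "summary": "revenue, costs, and profit",
            "children": [
                {"title": "Revenue by region",
                 "summary": "revenue split across regions", "children": []},
                {"title": "Operating costs",
                 "summary": "costs for staff and infrastructure", "children": []},
            ],
        },
        {"title": "Risk factors",
         "summary": "legal and market risks", "children": []},
    ],
}


def keyword_chooser(query: str) -> Callable[[List[Node]], Node]:
    """Build a chooser that picks the child sharing the most words with query.

    Stand-in for an LLM call that reads the summaries and reasons about
    which branch to open next.
    """
    words = set(query.lower().split())

    def choose(children: List[Node]) -> Node:
        def score(child: Node) -> int:
            text = f"{child['title']} {child['summary']}".lower()
            return len(words & set(text.replace(",", " ").split()))
        return max(children, key=score)

    return choose


def find_section(root: Node, choose: Callable[[List[Node]], Node]) -> List[str]:
    """Walk from the root to a leaf, returning the titles along the path."""
    path = [root["title"]]
    node = root
    while node["children"]:
        node = choose(node["children"])
        path.append(node["title"])
    return path
```

With the toy index, `find_section(TOY_INDEX, keyword_chooser("revenue by region"))` walks root, then Financials, then the revenue leaf. Retrieval becomes a short sequence of branch decisions, not a nearest-neighbor lookup.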

ai databases
Feb 15, 2026
For generating embeddings locally, nomic-embed-text is a large-context text encoder that surpasses OpenAI text-embedding-ada-002 and text-embedding-3-small on short- and long-context tasks. It balances speed, an 8k context window, and accuracy for English-centric apps. BGE-M3, Qwen3-Embedding, and E5-Small are alternatives.

ai
Feb 15, 2026
alibaba/zvec is an embedded vector database from Alibaba that supports both sparse and dense vectors, along with structured data. It can be considered the SQLite of vector databases.

databases
Feb 15, 2026
yichuan-w/LEANN is a RAG-focused framework built around efficient storage, with built-in chunking strategies, embedding model management, and an MCP server. gemini

ai
Feb 5, 2026
Vortex, a newer file format supported by DuckDB, promises faster random access, zero-copy metadata, and better compression. Support across the ecosystem is still limited, but it is worth considering over Parquet for LLM/RAG workloads.

data databases
Jan 11, 2026
Puzer published a GitHub recommender that uses semantic embeddings of a user's GitHub stars and runs entirely client side. I found some great recommendations that I plan to use:
- pamburus/hl: A fast and powerful log viewer and processor that converts JSON logs or logfmt logs into a clear human-readable format. (⭐2657)
- samwho/spacer: CLI tool to insert spacers when command output stops (⭐1663)
- darrenburns/posting: The modern API client that lives in your terminal. (⭐11134)
- plutov/oq: Terminal OpenAPI Spec viewer (⭐943)
- wey-gu/py-pglite: PGlite wrapper in Python for testing. Test your app with Postgres just as lite as SQLite. (⭐577)

tools databases
Dec 19, 2025
VectorChord is a faster pgvector alternative.

databases
