#databases

13 notes

Jul 18, 2026
Lobste.rs moved to SQLite, and the HN thread had sharper lessons than the post itself:
- WAL checkpoints only run during "reader gaps." Keep reads/writes continuously overlapping and the WAL file grows unbounded, a DoS vector kgeist reproduced with short-lived reads/writes, not just long-lived readers as the docs imply.
- Writes are fully serialized, one fsync each, yet app-layer batching still gets 100k+ writes/sec.
- Plain rsync on a live .db file can silently corrupt it since it doesn't understand SQLite transactions. Use sqlite3_rsync instead.
- The real weak point is schema migrations: treat SQLite like a KV store, or reach for CQRS, to keep the churn low.
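A minimal sketch of the batching and checkpoint points above, using Python's built-in `sqlite3` module. The `events` table and the helper names are made up for illustration; this is not what Lobste.rs or the HN commenters ran, and throughput will depend on hardware and `synchronous` settings.

```python
import sqlite3


def open_wal_db(path):
    """Open a SQLite database in WAL mode (the `events` table is hypothetical)."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, payload TEXT)"
    )
    conn.commit()
    return conn


def write_batch(conn, payloads):
    """Insert many rows in ONE transaction, so the commit cost is paid once per batch."""
    with conn:  # commits on success, rolls back on exception
        conn.executemany(
            "INSERT INTO events (payload) VALUES (?)", [(p,) for p in payloads]
        )


def checkpoint(conn):
    """Ask SQLite to move WAL contents into the main file and truncate the WAL.

    Returns the (busy, wal_frames, checkpointed_frames) row from the pragma;
    busy == 1 means the checkpoint was blocked from completing.
    """
    return conn.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchone()
```

Calling `checkpoint()` on a schedule and alerting when `busy` stays at 1 is one way to notice a WAL that is not getting its reader gaps.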
sqlite production
Jun 28, 2026
PostgreSQL UNLOGGED tables skip the write-ahead log, making writes considerably faster than ordinary tables. Trade-off: they are truncated after a crash or unclean shutdown and are not replicated to standbys. hynek/psycache exploits this as a PostgreSQL-native cache replacing Redis — crash truncation is acceptable since it's a cache anyway, and you reuse existing Postgres connections. Interesting idea: port this to Django as a custom BaseCache backend — swap DatabaseCache's regular table for CREATE UNLOGGED TABLE, reuse Django's existing DB connections, and skip Redis entirely. ~200 lines.
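A rough sketch of the SQL such a cache backend might issue. The table name, column layout, and helper names are my assumptions, not psycache's schema or Django's `DatabaseCache` schema; the `%s` placeholders assume a psycopg-style DB-API driver.

```python
import re

# Only plain identifiers are accepted, since the table name is interpolated into SQL.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _check(table):
    if not _IDENT.match(table):
        raise ValueError(f"unsafe table name: {table!r}")
    return table


def cache_table_ddl(table="cache_entries"):
    """DDL for an UNLOGGED cache table (hypothetical layout)."""
    return (
        f"CREATE UNLOGGED TABLE IF NOT EXISTS {_check(table)} ("
        "key text PRIMARY KEY, "
        "value bytea NOT NULL, "
        "expires_at timestamptz NOT NULL)"
    )


def cache_set_sql(table="cache_entries"):
    """Upsert taking parameters (key, value, ttl_seconds)."""
    return (
        f"INSERT INTO {_check(table)} (key, value, expires_at) "
        "VALUES (%s, %s, now() + make_interval(secs => %s)) "
        "ON CONFLICT (key) DO UPDATE "
        "SET value = EXCLUDED.value, expires_at = EXCLUDED.expires_at"
    )


def cache_get_sql(table="cache_entries"):
    """Lookup taking parameter (key,); expired rows are treated as misses."""
    return f"SELECT value FROM {_check(table)} WHERE key = %s AND expires_at > now()"
```

A Django `BaseCache` subclass would wrap these statements in its `get`, `set`, and `delete` methods and run them on an existing connection; expired rows still need a periodic `DELETE` since nothing evicts them automatically.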

postgres
Mar 24, 2026
pageindex generates a semantic, tree-like JSON index of a lengthy document, enabling reasoning-based RAG without a vector database.

rag ai
Feb 19, 2026
slingdata-io/sling-cli is a promising tool to move/sync data between databases and files. It is especially helpful for local testing and CI/CD, and it can also do stage/SQL-based transformations.

data tools
Feb 15, 2026
alibaba/zvec by Alibaba is an embedded vector database that supports both sparse and dense vectors, along with structured data. It can be considered the SQLite of vector databases.

rag
Feb 13, 2026
zoocache is a semantic, dependency-based cache manager that supports in-memory, LMDB, or Redis backends. Integration with Django looks interesting.

django python
Feb 5, 2026
Vortex, a newer format supported by DuckDB, promises faster random access, zero-copy metadata, and better compression. Support across the board is still limited, but it is worth considering over Parquet for LLM/RAG-based uses.

data rag
Feb 5, 2026
Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann manages to be both a rigorous technical manual and something approaching a philosophical treatise on the nature of truth, consistency, and trust in distributed systems. [Claude][https://claude.ai/chat/0f58f2f6-bd56-41a1-a785-8267afa5a3d1]
- Foundation. He starts by asking "What do we actually want from our data systems?" and answers: reliability, scalability, and maintainability.
  - Reliability. The question isn't whether failures will happen (they will), but whether the system can survive them.
  - Scalability. It's not a binary question of whether a system is "scalable" or "not scalable". Ask instead: "What happens when a specific load parameter increases?"
  - Maintainability. The most underappreciated of the three; he argues the majority of cost isn't in initial development but in ongoing maintenance.
- The data model wars - the skill isn't in choosing the "best" database but in understanding which tradeoffs matter for your specific problem.
- Storage engines. Two major approaches to read & write data from disk, neither is universally better.
  - Log-structured storage (like LSM-trees) optimizes for writes: every write is appended and then merged/compacted later.
  - Page-oriented storage (like B-trees) optimizes for reads: data is stored in fixed-size blocks that are updated in place, more like a filing cabinet where each document has a designated slot.
- Instead of asking "how do I build this?" ask "what does it mean for this to work correctly?"
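A toy illustration of the storage-engine contrast above, not taken from the book. `AppendOnlyStore` is a made-up class that only shows the append-then-compact idea; a real LSM-tree adds a memtable, sorted segment files, and background merging, and a real B-tree manages on-disk pages rather than a Python dict.

```python
class AppendOnlyStore:
    """Log-structured toy: writes append, reads scan backwards, compaction dedupes."""

    def __init__(self):
        self.log = []  # (key, value) records, oldest first

    def put(self, key, value):
        # A write never touches old records; it only appends a new one.
        self.log.append((key, value))

    def get(self, key):
        # The newest record for a key wins, so scan from the end.
        for k, v in reversed(self.log):
            if k == key:
                return v
        return None

    def compact(self):
        # Keep only the latest value per key, discarding superseded records.
        latest = {}
        for k, v in self.log:
            latest[k] = v
        self.log = list(latest.items())


class InPlaceStore:
    """Page-oriented toy: each key has one slot that is overwritten in place."""

    def __init__(self):
        self.slots = {}

    def put(self, key, value):
        self.slots[key] = value  # overwrite the designated slot

    def get(self, key):
        return self.slots.get(key)
```

Writing the same key twice leaves two records in `AppendOnlyStore.log` until `compact()` runs, while `InPlaceStore` never holds more than one value per key; that is the write-cheap versus read-cheap trade in miniature.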
book architecture
Jan 12, 2026
etcd is a distributed key-value store: https://etcd.io/
Jan 11, 2026
Puzer published a GitHub recommender that builds semantic embeddings from a user's GitHub stars, entirely client-side. I found some great recommendations that I plan to use:
- pamburus/hl: A fast and powerful log viewer and processor that converts JSON logs or logfmt logs into a clear human-readable format. (⭐2657)
- samwho/spacer: CLI tool to insert spacers when command output stops (⭐1663)
- darrenburns/posting: The modern API client that lives in your terminal. (⭐11134)
- plutov/oq: Terminal OpenAPI Spec viewer (⭐943)
- wey-gu/py-pglite: PGlite wrapper in Python for testing. Test your app with Postgres just as lite as SQLite. (⭐577)
tools rag
Jan 7, 2026
Django Postgres Migration Tools is an add-on for safer and more scalable migrations in Django.

django
Dec 22, 2025
Notes from Thoughtworks - Technology Radar vol 33
- text-to-sql solutions aren't working as expected
- pnpm, LangGraph, and Pydantic recommended for adoption
architecture ai
Dec 19, 2025
VectorChord is a faster pgvector alternative.

rag
