AI: Local Models Memory

2026-03-28·354 words·2 mins

I thought running multiple AI agents locally meant loading the model into memory once per agent. It turns out the model weights are shared across requests; only the KV cache scales per request. Here’s what I learned and how consolidating to a single model improved everything.
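To make the difference concrete, here is a rough back-of-the-envelope sketch of the arithmetic. It assumes a standard transformer KV cache (one key and one value vector per layer, per KV head, per token) and ignores activations and runtime overhead. The model shape, weight size, and context length are hypothetical illustration values, not measurements from my setup.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass(frozen=True)
class ModelShape:
    """Hypothetical model description; every number here is an assumption."""
    n_layers: int
    n_kv_heads: int
    head_dim: int
    weight_bytes: int        # size of the loaded weights in memory
    kv_dtype_bytes: int = 2  # fp16 KV cache entries


def kv_cache_bytes(shape: ModelShape, context_tokens: int) -> int:
    """KV cache for ONE request: a key and a value vector (the factor 2)
    per layer, per KV head, per token."""
    per_token = (2 * shape.n_layers * shape.n_kv_heads
                 * shape.head_dim * shape.kv_dtype_bytes)
    return per_token * context_tokens


def total_shared(shape: ModelShape, n_agents: int, context_tokens: int) -> int:
    """Weights loaded once; only the KV cache is multiplied per agent."""
    return shape.weight_bytes + n_agents * kv_cache_bytes(shape, context_tokens)


def total_naive(shape: ModelShape, n_agents: int, context_tokens: int) -> int:
    """What I originally assumed: a full copy of the weights per agent."""
    return n_agents * (shape.weight_bytes + kv_cache_bytes(shape, context_tokens))


# Hypothetical shape: 32 layers, 8 KV heads of dimension 128, 5 GiB of weights.
example = ModelShape(n_layers=32, n_kv_heads=8, head_dim=128, weight_bytes=5 * GIB)

print(f"shared weights: {total_shared(example, 4, 8192) / GIB:.1f} GiB")  # 9.0 GiB
print(f"one copy each:  {total_naive(example, 4, 8192) / GIB:.1f} GiB")   # 24.0 GiB
```

With these made-up numbers, each 8192-token request costs 1 GiB of KV cache, so four agents on one shared model need about 9 GiB instead of 24 GiB. Real figures depend on the model architecture, quantization, and how the runtime allocates its cache.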